Worksheet: Naïve Bayes

Part 1 - Categorical Naïve Bayes

In the first part, we construct a simple naïve Bayes classifier. Suppose we have two binary random variables $X_1\in\{0,1\}$ and $X_2\in\{0,1\}$ and a binary class label $C\in\{0,1\}$. Their joint distribution is given in Table 1. The classifier should predict the most likely class $C=c$ for a given combination $X_1=x_1$, $X_2=x_2$. Use Table 1 to answer the following questions.
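Table 1: Joint Probability Table $P(X_1, X_2, C)$

| | $C=0$, $X_1=0$ | $C=0$, $X_1=1$ | $C=1$, $X_1=0$ | $C=1$, $X_1=1$ |
|---|---|---|---|---|
| $X_2=0$ | 0.12 | 0.03 | 0.18 | 0.27 |
| $X_2=1$ | 0.28 | 0.07 | 0.02 | 0.03 |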

Question 1

Calculate the conditional probabilities $P(C \mid X_1=1, X_2=0)$ directly from Table 1 and enter them in the table below.

| | $X_1=1, X_2=0$ |
|---|---|
| $C=0$ | |
| $C=1$ | |
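As a sketch of the direct computation, conditioning on $X_1=1, X_2=0$ amounts to taking that slice of Table 1 and renormalizing it so it sums to 1. The dict encoding keyed by `(x1, x2, c)` and the helper name `posterior_from_joint` are our own, hypothetical choices.

```python
# Table 1 encoded as a dict keyed by (x1, x2, c); the encoding is our own choice.
joint = {
    (0, 0, 0): 0.12, (1, 0, 0): 0.03, (0, 0, 1): 0.18, (1, 0, 1): 0.27,
    (0, 1, 0): 0.28, (1, 1, 0): 0.07, (0, 1, 1): 0.02, (1, 1, 1): 0.03,
}


def posterior_from_joint(x1, x2):
    """Exact P(C=c | X1=x1, X2=x2): renormalize one slice of the joint table."""
    evidence = sum(joint[(x1, x2, c)] for c in (0, 1))  # P(X1=x1, X2=x2)
    return {c: joint[(x1, x2, c)] / evidence for c in (0, 1)}


print(posterior_from_joint(1, 0))
```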

Question 2

Which of the following derivations of our naïve Bayes classifier are correct?
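For reference, one standard way to write the derivation, using the shorthand $x_i$ for the event $X_i=x_i$: apply Bayes' rule, then the naïve Bayes assumption that $X_1$ and $X_2$ are conditionally independent given $C$.

$$
P(C=c \mid x_1, x_2) = \frac{P(x_1, x_2 \mid C=c)\,P(C=c)}{\sum_{c'} P(x_1, x_2 \mid C=c')\,P(C=c')}
$$

$$
P(x_1, x_2 \mid C=c) \approx P(x_1 \mid C=c)\,P(x_2 \mid C=c) \quad \text{(conditional independence)}
$$

$$
\hat{c} = \arg\max_{c}\; P(C=c)\,P(x_1 \mid C=c)\,P(x_2 \mid C=c)
$$

The denominator in the first line does not depend on $c$, so it can be dropped when only the most likely class is needed.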




Question 3

Fill in the marginal probabilities $P(C)$ in the table below, based on Table 1.

| $C=0$ | $C=1$ |
|---|---|
| | |
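A minimal sketch of marginalization, assuming the same hypothetical `(x1, x2, c)` dict encoding of Table 1: $P(C=c)$ is the sum of the joint over every combination of $x_1$ and $x_2$.

```python
# Table 1 encoded as a dict keyed by (x1, x2, c).
joint = {
    (0, 0, 0): 0.12, (1, 0, 0): 0.03, (0, 0, 1): 0.18, (1, 0, 1): 0.27,
    (0, 1, 0): 0.28, (1, 1, 0): 0.07, (0, 1, 1): 0.02, (1, 1, 1): 0.03,
}

# Marginalize out X1 and X2: P(C=c) = sum over x1, x2 of P(x1, x2, c).
prior = {c: sum(p for (_, _, cc), p in joint.items() if cc == c) for c in (0, 1)}
print(prior)
```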

Question 4

Fill in the conditional probabilities $P(X_1=1 \mid C)$ in the table below, based on Table 1.

| | $C=0$ | $C=1$ |
|---|---|---|
| $X_1=1$ | | |
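A sketch of the class-conditional, assuming the same hypothetical dict encoding: sum out $X_2$ to get $P(X_1=x_1, C=c)$, then divide by $P(C=c)$. The helper names `p_class` and `p_x1_given_c` are illustrative.

```python
# Table 1 encoded as a dict keyed by (x1, x2, c).
joint = {
    (0, 0, 0): 0.12, (1, 0, 0): 0.03, (0, 0, 1): 0.18, (1, 0, 1): 0.27,
    (0, 1, 0): 0.28, (1, 1, 0): 0.07, (0, 1, 1): 0.02, (1, 1, 1): 0.03,
}


def p_class(c):
    # P(C=c): sum the joint over both features.
    return sum(p for (_, _, cc), p in joint.items() if cc == c)


def p_x1_given_c(x1, c):
    # P(X1=x1 | C=c) = sum over x2 of P(x1, x2, c), divided by P(C=c).
    return sum(joint[(x1, x2, c)] for x2 in (0, 1)) / p_class(c)


print({c: p_x1_given_c(1, c) for c in (0, 1)})
```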

Question 5

Fill in the conditional probabilities $P(X_2=0 \mid C)$ in the table below, based on Table 1.

| | $C=0$ | $C=1$ |
|---|---|---|
| $X_2=0$ | | |
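The same idea for the second feature, as a sketch under the same hypothetical dict encoding: this time $X_1$ is summed out before dividing by $P(C=c)$.

```python
# Table 1 encoded as a dict keyed by (x1, x2, c).
joint = {
    (0, 0, 0): 0.12, (1, 0, 0): 0.03, (0, 0, 1): 0.18, (1, 0, 1): 0.27,
    (0, 1, 0): 0.28, (1, 1, 0): 0.07, (0, 1, 1): 0.02, (1, 1, 1): 0.03,
}


def p_class(c):
    # P(C=c): sum the joint over both features.
    return sum(p for (_, _, cc), p in joint.items() if cc == c)


def p_x2_given_c(x2, c):
    # P(X2=x2 | C=c) = sum over x1 of P(x1, x2, c), divided by P(C=c).
    return sum(joint[(x1, x2, c)] for x1 in (0, 1)) / p_class(c)


print({c: p_x2_given_c(0, c) for c in (0, 1)})
```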

Question 6

Calculate the conditional probabilities $P(C \mid X_1=1, X_2=0)$ according to naïve Bayes, i.e. from the quantities computed in Questions 3 to 5, and enter them in the table below.

| | $X_1=1, X_2=0$ |
|---|---|
| $C=0$ | |
| $C=1$ | |
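A sketch of the naïve Bayes posterior, again assuming the hypothetical `(x1, x2, c)` dict: each class is scored by $P(C=c)\,P(X_1=x_1 \mid C=c)\,P(X_2=x_2 \mid C=c)$ and the scores are then normalized. Note that the factorized likelihood replaces the exact joint slice used in the direct calculation.

```python
# Table 1 encoded as a dict keyed by (x1, x2, c).
joint = {
    (0, 0, 0): 0.12, (1, 0, 0): 0.03, (0, 0, 1): 0.18, (1, 0, 1): 0.27,
    (0, 1, 0): 0.28, (1, 1, 0): 0.07, (0, 1, 1): 0.02, (1, 1, 1): 0.03,
}
CLASSES = (0, 1)


def p_class(c):
    return sum(p for (_, _, cc), p in joint.items() if cc == c)


def p_x1_given_c(x1, c):
    return sum(joint[(x1, x2, c)] for x2 in (0, 1)) / p_class(c)


def p_x2_given_c(x2, c):
    return sum(joint[(x1, x2, c)] for x1 in (0, 1)) / p_class(c)


def naive_bayes_posterior(x1, x2):
    # Score each class with P(C=c) * P(X1=x1 | C=c) * P(X2=x2 | C=c) ...
    scores = {c: p_class(c) * p_x1_given_c(x1, c) * p_x2_given_c(x2, c) for c in CLASSES}
    # ... then normalize so the posteriors sum to 1.
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}


print(naive_bayes_posterior(1, 0))
```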

Question 7

What is the most likely class for $X_1=1$ and $X_2=0$ according to our naïve Bayes classifier?
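A sketch of the decision rule, assuming the same hypothetical dict encoding of Table 1: pick the class with the largest unnormalized score. Here the score is summed in log space, which is our own choice rather than something the worksheet requires.

```python
import math

# Table 1 encoded as a dict keyed by (x1, x2, c).
joint = {
    (0, 0, 0): 0.12, (1, 0, 0): 0.03, (0, 0, 1): 0.18, (1, 0, 1): 0.27,
    (0, 1, 0): 0.28, (1, 1, 0): 0.07, (0, 1, 1): 0.02, (1, 1, 1): 0.03,
}


def predict(x1, x2):
    """Return the class with the highest naive Bayes score (log space, unnormalized)."""
    best_class, best_score = None, -math.inf
    for c in (0, 1):
        p_c = sum(p for (_, _, cc), p in joint.items() if cc == c)
        p_x1 = sum(joint[(x1, v, c)] for v in (0, 1)) / p_c
        p_x2 = sum(joint[(u, x2, c)] for u in (0, 1)) / p_c
        # The evidence P(x1, x2) is shared by all classes, so it is left out.
        score = math.log(p_c) + math.log(p_x1) + math.log(p_x2)
        if score > best_score:
            best_class, best_score = c, score
    return best_class


print(predict(1, 0))
```

Because the normalizing constant is identical for every class, the argmax over unnormalized scores equals the argmax over posteriors. Log space mainly matters when many small likelihoods are multiplied; with two features it is only a habit.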


Question 8

Does the conditional independence assumption of naïve Bayes hold in this example, i.e. is $P(X_1, X_2 \mid C) = P(X_1 \mid C)\,P(X_2 \mid C)$ for every entry of Table 1?
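One way to check this numerically, as a sketch: compare $P(x_1, x_2 \mid C=c)$ with $P(x_1 \mid C=c)\,P(x_2 \mid C=c)$ in all eight cells of Table 1. The dict encoding, the helper name `independence_holds`, and the floating-point tolerance `tol` are our own assumptions.

```python
# Table 1 encoded as a dict keyed by (x1, x2, c).
joint = {
    (0, 0, 0): 0.12, (1, 0, 0): 0.03, (0, 0, 1): 0.18, (1, 0, 1): 0.27,
    (0, 1, 0): 0.28, (1, 1, 0): 0.07, (0, 1, 1): 0.02, (1, 1, 1): 0.03,
}


def independence_holds(tol=1e-9):
    """True if P(x1, x2 | c) == P(x1 | c) * P(x2 | c) in every cell, up to tol."""
    for c in (0, 1):
        p_c = sum(p for (_, _, cc), p in joint.items() if cc == c)
        for x1 in (0, 1):
            for x2 in (0, 1):
                lhs = joint[(x1, x2, c)] / p_c  # P(x1, x2 | c)
                p_x1 = sum(joint[(x1, v, c)] for v in (0, 1)) / p_c
                p_x2 = sum(joint[(u, x2, c)] for u in (0, 1)) / p_c
                if abs(lhs - p_x1 * p_x2) > tol:
                    return False
    return True


print(independence_holds())
```

When the two sides agree in every cell, the naïve Bayes posterior is exactly the true posterior, so the answers to Questions 1 and 6 must then coincide.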


Part 2 - Gaussian Naïve Bayes

In the second part, we construct a naïve Bayes classifier for continuous features. The setting is the same as in Part 1, except that $X_1\in\mathbb{R}$ and $X_2\in\mathbb{R}$ are now continuous random variables. Use the dataset in Table 2 to answer the following questions.
Table 2: Continuous Dataset

| $X_1$ | $X_2$ | $C$ |
|---|---|---|
| 0.2 | 0.5 | 0 |
| 0.4 | 0.1 | 0 |
| 0.6 | 0.9 | 0 |
| 0.8 | 0.6 | 1 |
| 0.7 | 0.4 | 1 |
| 0.6 | 0.8 | 1 |

Question 9

Fill in the marginal probabilities $P(C)$ in the table below, based on Table 2.

| $C=0$ | $C=1$ |
|---|---|
| | |
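A sketch of estimating the class priors from data, assuming Table 2 is stored as a list of `(x1, x2, c)` tuples (our own encoding): each prior is the fraction of rows carrying that label.

```python
from collections import Counter

# Table 2 as (x1, x2, c) tuples.
data = [
    (0.2, 0.5, 0), (0.4, 0.1, 0), (0.6, 0.9, 0),
    (0.8, 0.6, 1), (0.7, 0.4, 1), (0.6, 0.8, 1),
]

# Relative frequency of each class label.
counts = Counter(c for _, _, c in data)
prior = {c: counts[c] / len(data) for c in sorted(counts)}
print(prior)
```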

Question 10

Assume that $X_1 \mid C=c \sim \mathcal{N}(\mu^c_1,\sigma^c_1)$, i.e. given $C=c$, $X_1$ follows a Gaussian distribution with mean $\mu^c_1$ and standard deviation $\sigma^c_1$. Estimate the mean and standard deviation for each class from Table 2 and enter them in the table below.

| | $c=0$ | $c=1$ |
|---|---|---|
| $\hat{\mu}^c_1$ | | |
| $\hat{\sigma}^c_1$ | | |
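The question does not say whether $\hat{\sigma}$ should divide by $n$ (maximum likelihood) or $n-1$ (sample standard deviation), and with three points per class the two differ noticeably. This sketch, assuming the same hypothetical tuple encoding of Table 2 and an illustrative helper `estimate`, reports both using Python's standard `statistics` module.

```python
import statistics

# Table 2 as (x1, x2, c) tuples.
data = [
    (0.2, 0.5, 0), (0.4, 0.1, 0), (0.6, 0.9, 0),
    (0.8, 0.6, 1), (0.7, 0.4, 1), (0.6, 0.8, 1),
]


def estimate(feature_index, c, population=False):
    """Mean and standard deviation of one feature within class c."""
    values = [row[feature_index] for row in data if row[2] == c]
    mu = statistics.mean(values)
    # stdev uses the n - 1 (sample) variance; pstdev uses n (maximum likelihood).
    sigma = statistics.pstdev(values) if population else statistics.stdev(values)
    return mu, sigma


for c in (0, 1):
    print(f"c={c}: sample {estimate(0, c)}, MLE {estimate(0, c, population=True)}")
```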

Question 11

Now, do the same for $X_2|C=c\sim\mathcal{N}(\mu^c_2,\sigma^c_2)$.

| | $c=0$ | $c=1$ |
|---|---|---|
| $\hat{\mu}^c_2$ | | |
| $\hat{\sigma}^c_2$ | | |
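The same estimation applied to the second column, as a sketch with the same hypothetical tuple encoding; `population=True` switches from the $n-1$ to the $n$ convention.

```python
import statistics

# Table 2 as (x1, x2, c) tuples; feature index 1 is X2.
data = [
    (0.2, 0.5, 0), (0.4, 0.1, 0), (0.6, 0.9, 0),
    (0.8, 0.6, 1), (0.7, 0.4, 1), (0.6, 0.8, 1),
]


def estimate(feature_index, c, population=False):
    """Mean and standard deviation of one feature within class c."""
    values = [row[feature_index] for row in data if row[2] == c]
    mu = statistics.mean(values)
    # stdev uses the n - 1 (sample) variance; pstdev uses n (maximum likelihood).
    sigma = statistics.pstdev(values) if population else statistics.stdev(values)
    return mu, sigma


for c in (0, 1):
    print(f"c={c}: sample {estimate(1, c)}, MLE {estimate(1, c, population=True)}")
```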

Question 12

Fill in the class-conditional densities $f(X_1=0.9 \mid C)$ in the table below, using the distribution parameters you estimated in Question 10. (Hint: use software such as R or Python to evaluate the densities.)

| | $C=0$ | $C=1$ |
|---|---|---|
| $X_1=0.9$ | | |
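In Python, the standard library's `statistics.NormalDist` (Python 3.8+) evaluates a Gaussian density directly. This sketch assumes the sample ($n-1$) standard deviation and the same hypothetical tuple encoding of Table 2; replacing `stdev` with `pstdev` gives the maximum-likelihood variant, which yields different densities.

```python
from statistics import NormalDist, mean, stdev

# Table 2 as (x1, x2, c) tuples.
data = [
    (0.2, 0.5, 0), (0.4, 0.1, 0), (0.6, 0.9, 0),
    (0.8, 0.6, 1), (0.7, 0.4, 1), (0.6, 0.8, 1),
]


def class_density(feature_index, c, x):
    """f(X_i = x | C = c) under a Gaussian fitted to class c's values of feature i."""
    values = [row[feature_index] for row in data if row[2] == c]
    # Assumption: sample standard deviation (n - 1); use pstdev for the MLE instead.
    return NormalDist(mean(values), stdev(values)).pdf(x)


print({c: class_density(0, c, 0.9) for c in (0, 1)})
```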

Question 13

Fill in the class-conditional densities $f(X_2=0.2 \mid C)$ in the table below, using the distribution parameters you estimated in Question 11.

| | $C=0$ | $C=1$ |
|---|---|---|
| $X_2=0.2$ | | |
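The same density evaluation for the second feature, as a sketch under the same assumptions (hypothetical tuple encoding of Table 2, sample standard deviation via `statistics.stdev`).

```python
from statistics import NormalDist, mean, stdev

# Table 2 as (x1, x2, c) tuples; feature index 1 is X2.
data = [
    (0.2, 0.5, 0), (0.4, 0.1, 0), (0.6, 0.9, 0),
    (0.8, 0.6, 1), (0.7, 0.4, 1), (0.6, 0.8, 1),
]


def class_density(feature_index, c, x):
    """f(X_i = x | C = c) under a Gaussian fitted to class c's values of feature i."""
    values = [row[feature_index] for row in data if row[2] == c]
    # Assumption: sample standard deviation (n - 1); use pstdev for the MLE instead.
    return NormalDist(mean(values), stdev(values)).pdf(x)


print({c: class_density(1, c, 0.2) for c in (0, 1)})
```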

Question 14

What is the most likely class for the combination $X_1=0.9$ and $X_2=0.2$ according to our naïve Bayes classifier?
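Putting the pieces together, a minimal Gaussian naïve Bayes sketch: fit one univariate Gaussian per feature and class, then score each class by its prior times the product of the per-feature densities. The function names `fit` and `predict` are hypothetical, and the sample ($n-1$) standard deviation is an assumption; libraries such as scikit-learn's `GaussianNB` use their own variance estimate and smoothing, so their numbers may differ slightly.

```python
from statistics import NormalDist, mean, stdev

# Table 2 as (x1, x2, c) tuples.
data = [
    (0.2, 0.5, 0), (0.4, 0.1, 0), (0.6, 0.9, 0),
    (0.8, 0.6, 1), (0.7, 0.4, 1), (0.6, 0.8, 1),
]
CLASSES = (0, 1)


def fit(rows):
    """Return {class: (prior, [NormalDist per feature])}."""
    model = {}
    for c in CLASSES:
        members = [r for r in rows if r[2] == c]
        prior = len(members) / len(rows)
        # One univariate Gaussian per feature column (assumption: sample std, n - 1).
        columns = zip(*(r[:2] for r in members))
        model[c] = (prior, [NormalDist(mean(col), stdev(col)) for col in columns])
    return model


def predict(model, x):
    """Return (best class, unnormalized scores) for the feature vector x."""
    scores = {}
    for c, (prior, dists) in model.items():
        score = prior
        for dist, xi in zip(dists, x):
            score *= dist.pdf(xi)  # naive Bayes: multiply per-feature densities
        scores[c] = score
    return max(scores, key=scores.get), scores


label, scores = predict(fit(data), (0.9, 0.2))
print(label, scores)
```

To check the maximum-likelihood convention instead, swap `stdev` for `pstdev` in `fit` and compare the resulting scores.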