2017 September 20,
Carleton College, Prof. Joshua R. Davis
Complete these problems. Write them up carefully, in the order assigned, for handing in with the rest of your homework.
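In class we discussed a one-word Bayesian spam filter based on the equation
P(S | W) = P(W | S) P(S) / [P(W | S) P(S) + P(W | Sc) P(Sc)].
Here S is the event that a message is spam, Sc is its complement (the message is not spam), and W is the event that the message contains a particular suspicious word.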
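As a quick numerical illustration of the one-word equation, here is a minimal Python sketch. The function name posterior_spam and the probabilities passed to it are made up for illustration; they do not come from the class discussion. The sketch uses P(Sc) = 1 - P(S).

```python
def posterior_spam(p_w_given_s, p_w_given_not_s, p_s):
    """Return P(S | W) from P(W | S), P(W | Sc), and P(S) by Bayes' rule."""
    spam_term = p_w_given_s * p_s                  # P(W | S) P(S)
    not_spam_term = p_w_given_not_s * (1.0 - p_s)  # P(W | Sc) P(Sc)
    return spam_term / (spam_term + not_spam_term)


# Hypothetical numbers: the word appears in 20% of spam and 1% of non-spam,
# and half of all messages are spam.
print(posterior_spam(0.2, 0.01, 0.5))
```

The denominator is P(W), expanded by the law of total probability over S and Sc.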
Now let's design a two-word spam filter. Let W1 be the event that a message contains one suspicious word ("Rolex") and W2 the event that it contains a different suspicious word ("refinance"). To streamline the problem, we impose a simplifying "independence" assumption, that
P(W1 W2 | S) = P(W1 | S) P(W2 | S) and P(W1 W2 | Sc) = P(W1 | Sc) P(W2 | Sc).
Show that P(S | W1 W2) = x / [x + y], where
x = P(W1 | S) P(W2 | S) P(S) and y = P(W1 | Sc) P(W2 | Sc) P(Sc).
Finally, derive an analogous n-word spam filter based on P(S | W1 W2 ... Wn) and a suitable independence assumption.
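Once you have a candidate formula, a numerical check can help catch algebra slips. The sketch below is one possible check, not a substitute for the derivation. It assumes that the words are conditionally independent given S and given Sc, so that each conditional probability of seeing all the words is a product of per-word probabilities. The function name posterior_spam_n and all numbers are hypothetical.

```python
from math import prod


def posterior_spam_n(p_words_given_s, p_words_given_not_s, p_s):
    """Return P(S | W1 W2 ... Wn) under the assumed conditional independence.

    p_words_given_s[i] is P(Wi | S); p_words_given_not_s[i] is P(Wi | Sc).
    """
    if len(p_words_given_s) != len(p_words_given_not_s):
        raise ValueError("need one probability per word in each list")
    x = prod(p_words_given_s) * p_s                  # spam term
    y = prod(p_words_given_not_s) * (1.0 - p_s)      # non-spam term
    return x / (x + y)


# Two-word case with made-up numbers, compared against x / [x + y] directly.
x = 0.2 * 0.1 * 0.5
y = 0.01 * 0.02 * 0.5
print(posterior_spam_n([0.2, 0.1], [0.01, 0.02], 0.5), x / (x + y))
```

With a single word in each list, the function reduces to the one-word filter from class.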