2017 September 20

Math 265: Day 4

Carleton College, Prof. Joshua R. Davis

Due at the start of class on Day 8

Complete these problems. Write them up carefully, in the order assigned, for handing in with the rest of your homework.

In class we discussed a one-word Bayesian spam filter based on the equation

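P(S | W) = P(W | S) P(S) / [P(W | S) P(S) + P(W | Sc) P(Sc)].

Here S is the event that a message is spam, W is the event that it contains the suspicious word, and Sc is the complement of S.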
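To make the one-word formula concrete, here is a minimal Python sketch that evaluates it. The function name posterior_spam and the input probabilities are hypothetical, chosen only for illustration; they do not come from class or from real message data.

```python
def posterior_spam(p_w_given_s, p_w_given_not_s, p_s):
    """Return P(S | W) from P(W | S), P(W | Sc), and P(S)."""
    p_not_s = 1.0 - p_s                      # P(Sc) = 1 - P(S)
    numerator = p_w_given_s * p_s            # P(W | S) P(S)
    denominator = numerator + p_w_given_not_s * p_not_s  # P(W), by total probability
    return numerator / denominator


# Hypothetical inputs: the word appears in 20% of spam, 1% of non-spam,
# and 40% of all messages are spam.
print(posterior_spam(0.2, 0.01, 0.4))
```

Note that when the word is equally likely in spam and non-spam, the posterior equals the prior P(S): seeing the word tells us nothing.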

Now let's design a two-word spam filter. Let W1 be the event that a message contains one suspicious word ("Rolex") and W2 the event that it contains a different suspicious word ("refinance"). To streamline the problem, we impose a simplifying "independence" assumption, that

P(W1 W2 | S) = P(W1 | S) P(W2 | S) and P(W1 W2 | Sc) = P(W1 | Sc) P(W2 | Sc).

Show that P(S | W1 W2) = x / [x + y], where

x = P(W1 | S) P(W2 | S) P(S) and y = P(W1 | Sc) P(W2 | Sc) P(Sc).
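A numerical sanity check is not a proof, but it can catch algebra mistakes before you write up your argument. The sketch below builds a small joint distribution over (S, W1, W2) in which the two words are conditionally independent given S and given Sc, computes P(S | W1 W2) directly from the definition of conditional probability, and compares it with x / [x + y]. All probabilities and names (p_s, p_w1, p_w2, joint) are hypothetical, made up for illustration.

```python
from fractions import Fraction
from itertools import product

# Hypothetical inputs (not from real data). Dictionaries are keyed by
# is_spam: True gives P(Wi | S), False gives P(Wi | Sc).
p_s = Fraction(2, 5)
p_w1 = {True: Fraction(1, 5), False: Fraction(1, 100)}
p_w2 = {True: Fraction(3, 10), False: Fraction(1, 20)}


def joint(is_spam, w1, w2):
    """P of the outcome (is_spam, w1, w2), assuming conditional independence."""
    prior = p_s if is_spam else 1 - p_s
    f1 = p_w1[is_spam] if w1 else 1 - p_w1[is_spam]
    f2 = p_w2[is_spam] if w2 else 1 - p_w2[is_spam]
    return prior * f1 * f2


# The eight outcomes should have total probability 1.
total = sum(joint(s, a, b) for s, a, b in product([True, False], repeat=3))

# Direct computation: P(S | W1 W2) = P(S W1 W2) / P(W1 W2).
p_both_words = joint(True, True, True) + joint(False, True, True)
direct = joint(True, True, True) / p_both_words

# The formula to be shown in this problem.
x = p_w1[True] * p_w2[True] * p_s
y = p_w1[False] * p_w2[False] * (1 - p_s)
formula = x / (x + y)

print(total, direct, formula)
```

Exact fractions are used instead of floating-point numbers so that the two answers can be compared with equality rather than with a tolerance.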

Finally, derive an analogous n-word spam filter based on P(S | W1 W2 ... Wn) and a suitable independence assumption.