In this post I once again remind myself what EM is. It seems like a really cool idea, but it hasn’t totally stuck yet.

A simple example:

  • Suppose we have some labelled data $(x_i, y_i)$, where $x_i \in \mathbb{R}^d$ is a feature vector and $y_i \in \{0, 1\}$ is a class.
  • We might try logistic regression.
  • This means finding the $\theta$ which minimizes the following expression: $\sum_i -\log p_\theta(y_i \mid x_i)$, where $p_\theta(y = 1 \mid x) = \sigma(\theta^\top x)$.
  • Unfortunately no single such model fits the data.
  • There’s a twist: the actual model that the data was generated from is as follows (see the sketch after this list):
    • There is a hidden quantity $z_i \in \{1, \dots, K\}$.
    • There are vectors $\theta_1, \dots, \theta_K$.
    • The data is actually generated by first choosing $z_i$ (say with probabilities $\pi_1, \dots, \pi_K$),
    • and then, based on $z_i$, choosing $y_i$ as Bernoulli
    • with parameter $\sigma(\theta_{z_i}^\top x_i)$.
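Here is a minimal numpy sketch of that generative process. The component count, mixing weights, and parameter values are made up purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


# Made-up example: K = 2 hidden components, d = 3 features.
n, d, K = 1000, 3, 2
pi = np.array([0.5, 0.5])              # mixing probabilities for z
thetas = rng.normal(size=(K, d))       # one logistic-regression vector per component

X = rng.normal(size=(n, d))            # feature vectors x_i
z = rng.choice(K, size=n, p=pi)        # hidden component z_i
p = sigmoid(np.einsum("nd,nd->n", X, thetas[z]))  # sigma(theta_{z_i}^T x_i)
y = rng.binomial(1, p)                 # observed class y_i

# We only get to see (X, y); z stays hidden.
```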

Hmm. Now how could we fit this? We'll use the EM algorithm, which is great for this kind of hidden-variable setup.

Define the pseudo log likelihood, for any choice of distributions $q_i$ over the hidden $z$:

$$\tilde\ell(\theta, q) = \sum_i \sum_z q_i(z) \log \frac{p_\theta(y_i, z \mid x_i)}{q_i(z)}.$$

Rewriting this a bit:

$$\tilde\ell(\theta, q) = \sum_i \log p_\theta(y_i \mid x_i) - \sum_i \mathrm{KL}\big(q_i(z) \,\|\, p_\theta(z \mid x_i, y_i)\big).$$

In other words: pseudo log likelihood = actual log likelihood minus the KL divergence from each $q_i$ to the true posterior over $z_i$.

Since KL divergence is nonnegative, the pseudo log likelihood is $\leq$ the actual log likelihood, with equality when $q_i$ is actually the correct posterior $p_\theta(z \mid x_i, y_i)$.
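A quick numerical check of that bound, with toy numbers of my own (one data point, two hidden components):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# One data point, K = 2 components, made-up parameters.
x = np.array([1.0, -0.5, 2.0])
y = 1
pi = np.array([0.3, 0.7])
thetas = np.array([[0.5, 1.0, -1.0],
                   [-2.0, 0.1, 0.8]])

p_y1 = sigmoid(thetas @ x)                       # P(y=1 | x, z) per component
joint = pi * np.where(y == 1, p_y1, 1 - p_y1)    # P(y, z | x) per component
log_lik = np.log(joint.sum())                    # actual log likelihood, log P(y | x)
posterior = joint / joint.sum()                  # true posterior P(z | x, y)

def pseudo_log_lik(q):
    return np.sum(q * (np.log(joint) - np.log(q)))

print(pseudo_log_lik(np.array([0.5, 0.5])) <= log_lik)    # True: it's a lower bound
print(np.isclose(pseudo_log_lik(posterior), log_lik))     # True: tight at the posterior
```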

Another perspective: the same bound drops out of Jensen's inequality, since $\log$ is concave:

$$\log p_\theta(y_i \mid x_i) = \log \sum_z q_i(z) \frac{p_\theta(y_i, z \mid x_i)}{q_i(z)} \geq \sum_z q_i(z) \log \frac{p_\theta(y_i, z \mid x_i)}{q_i(z)}.$$

Why EM works: each iteration can't decrease the actual log likelihood. At the start of an iteration we set each $q_i$ to the current posterior, so the pseudo log likelihood touches the actual log likelihood at the current $\theta$. The M-step then pushes the pseudo log likelihood up, and since it is a lower bound everywhere, the actual log likelihood at the new $\theta$ can only go up too.
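Spelled out (writing $\ell(\theta) = \sum_i \log p_\theta(y_i \mid x_i)$ for the actual log likelihood, with $q$ fixed to the E-step posterior at $\theta^{\text{old}}$):

$$\ell(\theta^{\text{new}}) \;\geq\; \tilde\ell(\theta^{\text{new}}, q) \;\geq\; \tilde\ell(\theta^{\text{old}}, q) \;=\; \ell(\theta^{\text{old}}),$$

where the first inequality is the lower bound, the second holds because the M-step maximizes $\tilde\ell(\cdot, q)$ over $\theta$, and the final equality is the tightness of the bound at the posterior.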
For the M-step, we can take the pseudo log likelihood and drop the entropy term $-\sum_i \sum_z q_i(z) \log q_i(z)$, a constant that doesn't matter for the optimization over $\theta$. What's left is $Q(\theta) = \sum_i \sum_z q_i(z) \log p_\theta(y_i, z \mid x_i)$.

The algorithm is now:

  • E-step: take $q_i(z) = p_{\theta^{\text{old}}}(z \mid x_i, y_i)$, the posterior over the hidden $z_i$ under the current parameters.
  • M-step: take $\theta^{\text{new}} = \arg\max_\theta Q(\theta) = \arg\max_\theta \sum_i \sum_z q_i(z) \log p_\theta(y_i, z \mid x_i)$.
  • Repeat until the parameters (or the log likelihood) stop changing.

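Here is a rough numpy sketch of that loop for the hidden-class logistic regression model above. It's my own illustration: the M-step just takes a few gradient steps on the weighted logistic regression objective rather than solving it exactly, and there are no convergence checks or numerical safeguards.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def em_mixture_logreg(X, y, K=2, n_iters=50, m_steps=100, lr=0.5, seed=0):
    """EM for a mixture of K logistic regressions (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    thetas = rng.normal(scale=0.1, size=(K, d))
    pi = np.full(K, 1.0 / K)

    for _ in range(n_iters):
        # E-step: responsibilities q_i(z) = p(z | x_i, y_i) under the current parameters.
        p_y1 = sigmoid(X @ thetas.T)                       # (n, K): P(y=1 | x_i, z=k)
        lik = np.where(y[:, None] == 1, p_y1, 1 - p_y1)    # (n, K): P(y_i | x_i, z=k)
        joint = pi[None, :] * lik                          # (n, K): P(y_i, z=k | x_i)
        q = joint / joint.sum(axis=1, keepdims=True)       # (n, K): posterior over z_i

        # M-step: maximize sum_i sum_z q_i(z) log p_theta(y_i, z | x_i).
        pi = q.mean(axis=0)                                # closed form for the mixing weights
        for _ in range(m_steps):                           # gradient ascent on each theta_k
            p_y1 = sigmoid(X @ thetas.T)
            grad = ((y[:, None] - p_y1) * q).T @ X / n     # (K, d) weighted logistic gradient
            thetas += lr * grad

    return pi, thetas, q
```

With the synthetic `X, y` generated earlier this can be run as `pi_hat, thetas_hat, q_hat = em_mixture_logreg(X, y, K=2)`; as usual with EM there's no guarantee of reaching the global optimum.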

Gaussian mixture model:

This is the classic EM example. Each point is generated by choosing a hidden component $z_i$ with probabilities $\pi_1, \dots, \pi_K$ and then drawing $x_i \sim \mathcal{N}(\mu_{z_i}, \Sigma_{z_i})$, so the density is

$$p(x) = \sum_k \pi_k \, \mathcal{N}(x; \mu_k, \Sigma_k).$$

The same E-step/M-step recipe applies, and here the M-step has a closed form: weighted means, weighted covariances, and the average responsibilities as mixing weights.
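And a rough numpy/scipy sketch of EM for a GMM, again my own illustration with no numerical safeguards (no covariance regularization, no log-space computations):

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K=2, n_iters=100, seed=0):
    """EM for a Gaussian mixture model (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    mu = X[rng.choice(n, size=K, replace=False)]     # init means at random data points
    sigma = np.stack([np.eye(d) for _ in range(K)])  # init covariances at the identity
    pi = np.full(K, 1.0 / K)

    for _ in range(n_iters):
        # E-step: responsibilities q_i(k) = p(z_i = k | x_i).
        dens = np.stack([multivariate_normal.pdf(X, mu[k], sigma[k]) for k in range(K)], axis=1)
        joint = pi[None, :] * dens                   # (n, K): pi_k * N(x_i; mu_k, sigma_k)
        q = joint / joint.sum(axis=1, keepdims=True)

        # M-step: closed-form updates (weighted means, covariances, mixing weights).
        nk = q.sum(axis=0)                           # effective number of points per component
        pi = nk / n
        mu = (q.T @ X) / nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            sigma[k] = (q[:, k, None] * diff).T @ diff / nk[k]

    return pi, mu, sigma, q
```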

OK, I don't really have time to read the rest right now; I got to page 10/14.