In this post I once again remind myself what EM is. It seems like a really cool idea, but it hasn’t totally stuck yet.
A simple example:
- Suppose we have some labelled data $(x_1, y_1), \dots, (x_n, y_n)$, where each $x_i$ is a feature vector and each $y_i \in \{0, 1\}$ is a class.
- We might try logistic regression.
- This means finding a weight vector $w$ which minimizes the following expression:
  $$ \sum_{i=1}^n -\log p(y_i \mid x_i; w), \qquad \text{where } p(y = 1 \mid x; w) = \sigma(w^\top x) = \frac{1}{1 + e^{-w^\top x}}. $$
- Unfortunately no such model fits the data.
- There’s a twist: the actual model that the data was generated from is as follows (there’s a little simulation sketch right after this list):
  - There is a hidden quantity $z_i \in \{1, \dots, K\}$ attached to each example.
  - There are vectors $w_1, \dots, w_K$, one per possible value of $z_i$.
  - The data is actually generated by first choosing $z_i$ (with some mixing probabilities $\pi_1, \dots, \pi_K$),
  - and then, based on $z_i$, choosing $y_i$ as Bernoulli with parameter $\sigma(w_{z_i}^\top x_i)$.
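Here is a quick sketch of that generative story in code. The specifics (two components, standard-normal features, the particular weight vectors and mixing probabilities, and the `generate` name) are my own toy choices, just to make the setup concrete:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def generate(n=1000, d=2, seed=0):
    """Sample (X, y) from the hidden-z mixture of logistic regressions."""
    rng = np.random.default_rng(seed)
    W = np.array([[ 4.0, -1.0],            # w_1 (made-up)
                  [-3.0,  2.0]])           # w_2 (made-up)
    pi = np.array([0.6, 0.4])              # mixing probabilities for z
    X = rng.normal(size=(n, d))            # feature vectors
    z = rng.choice(2, size=n, p=pi)        # hidden component for each example
    p = sigmoid(np.sum(W[z] * X, axis=1))  # Bernoulli parameter sigma(w_z . x)
    y = rng.binomial(1, p)                 # observed class
    return X, y, z

X, y, z = generate()
```

A single logistic regression can’t fit data like this: for the same $x$, the two components push the Bernoulli parameter in opposite directions, so no single $w$ explains all the examples.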
Hmm. Now how could we fit this? We’ll use the EM algorithm, which is great for this kind of hidden-variable setup.
Define the pseudo log likelihood: pick some distributions $q_i$ over the hidden $z_i$ (one per example), write $\theta = (\pi_1, \dots, \pi_K, w_1, \dots, w_K)$ for all the parameters, and set
$$ \tilde{\ell}(q, \theta) = \sum_i \sum_{z} q_i(z) \log \frac{p(y_i, z \mid x_i; \theta)}{q_i(z)}. $$
Rewriting this a bit:
$$ \tilde{\ell}(q, \theta) = \sum_i \log p(y_i \mid x_i; \theta) - \sum_i \mathrm{KL}\big(q_i(z) \,\|\, p(z \mid x_i, y_i; \theta)\big). $$
In other words: the pseudo log likelihood is the actual log likelihood $\ell(\theta) = \sum_i \log p(y_i \mid x_i; \theta)$ minus a sum of KL divergences.
Clearly, the pseudo log likelihood is $\leq$ the actual log likelihood (KL divergences are nonnegative), with equality exactly when $q_i(z) = p(z \mid x_i, y_i; \theta)$ for every $i$.
Another perspective on why EM works: each iteration can only increase the actual log likelihood. The E-step picks $q$ to make the bound tight at the current parameters, and the M-step then pushes the bound (and with it the actual log likelihood) up.
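Spelling that out in the notation above (this chain of inequalities is the standard EM argument, filled in by me rather than taken from the original notes):
$$ \ell(\theta^{\text{new}}) \;\geq\; \tilde{\ell}(q, \theta^{\text{new}}) \;\geq\; \tilde{\ell}(q, \theta^{\text{old}}) \;=\; \ell(\theta^{\text{old}}), $$
where the first inequality is the bound above, the second holds because the M-step maximizes $\tilde{\ell}$ over $\theta$ with $q$ fixed, and the equality holds because the E-step chose $q_i(z) = p(z \mid x_i, y_i; \theta^{\text{old}})$, which makes the bound tight.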
We can take the pseudo log likelihood and, when optimizing over $\theta$ with $q$ held fixed, drop the $-\sum_i \sum_z q_i(z) \log q_i(z)$ term, since it’s a constant that doesn’t matter. What’s left is
$$ Q(\theta) = \sum_i \sum_z q_i(z) \log p(y_i, z \mid x_i; \theta). $$
The algorithm is now:
- E-step: take $q_i(z) = p(z \mid x_i, y_i; \theta^{\text{old}})$.
- M-step: set $\theta^{\text{new}} = \arg\max_\theta Q(\theta)$ with those $q_i$ fixed.
- Repeat until the log likelihood stops improving. (A code sketch of this loop is below.)
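Here’s a minimal sketch of that loop for the mixture-of-logistic-regressions model, assuming the $K=2$ setup from the `generate` snippet above. The closed-form update for $\pi$ and the gradient-ascent inner loop for the $w_k$ (which have no closed-form update) are my own choices, as are the function and variable names:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def em_mixture_logreg(X, y, K=2, n_iters=50, n_inner=100, lr=0.5, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(size=(K, d))      # one weight vector per hidden component
    pi = np.full(K, 1.0 / K)         # mixing weights

    for _ in range(n_iters):
        # E-step: responsibilities q_i(z=k) proportional to pi_k * p(y_i | x_i, z=k; w_k)
        p1 = sigmoid(X @ W.T)                          # (n, K): p(y=1 | x_i, z=k)
        lik = np.where(y[:, None] == 1, p1, 1.0 - p1)  # p(y_i | x_i, z=k)
        q = pi * lik
        q /= q.sum(axis=1, keepdims=True)

        # M-step: pi has a closed-form update; each w_k is improved by
        # gradient ascent on the q-weighted log likelihood Q(theta).
        pi = q.mean(axis=0)
        for _ in range(n_inner):
            p1 = sigmoid(X @ W.T)
            grad = ((y[:, None] - p1) * q).T @ X       # (K, d)
            W += lr * grad / n
    return W, pi, q

W_hat, pi_hat, q = em_mixture_logreg(X, y)   # X, y from generate() above
```

The recovered components may come out in either order (label switching), which is normal for mixture models.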
Gaussian mixture model:
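For a Gaussian mixture the same E-step/M-step recipe applies, and here every M-step update is closed form. A minimal sketch (the choice of $K$, the scipy dependency, the initialization scheme, and all names are my own, not from the text):

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K=2, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    pi = np.full(K, 1.0 / K)                              # mixing weights
    mu = X[rng.choice(n, size=K, replace=False)]          # init means at random data points
    Sigma = np.stack([np.eye(d) for _ in range(K)])       # init covariances at identity

    for _ in range(n_iters):
        # E-step: responsibilities q_ik proportional to pi_k * N(x_i; mu_k, Sigma_k)
        q = np.stack([pi[k] * multivariate_normal.pdf(X, mu[k], Sigma[k])
                      for k in range(K)], axis=1)
        q /= q.sum(axis=1, keepdims=True)

        # M-step: closed-form updates for pi, mu, Sigma
        Nk = q.sum(axis=0)
        pi = Nk / n
        mu = (q.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            Sigma[k] = (q[:, k, None] * diff).T @ diff / Nk[k]
    return pi, mu, Sigma, q
```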
Ok, I don’t really have time to read this right now. Read till page 10/14.