I’m tentatively planning to grind through MIT’s “inference” courses.
Specifically, these are
- algorithms for inference
- inference and information
These cover some things like
- graphical models
- information theory
- inference
- information geometry
- thinking about structure of random variables and independence and stuff
- probably other stuff too
These topics seem
- pretty intellectually interesting
- the material most relevant to ARC’s research agenda (where I’ll be interning over the summer)
I reserve the right to quit if I find some cool research projects / make up some cool research questions myself. However I don’t anticipate quitting currently.
I’ll record my journey of doing this here. I’ll also tentatively plan to make this into a dialogue because I love JJ+shatar+blobby, but right now I just need to write some stuff down.
Day 0 — Dec 12: Likelihood ratio test: If you have two hypotheses, and want to select one to minimize the probability that you select the wrong one, then you should compute Pr(hypothesis X is true given the data) and choose whichever hypothesis maximizes this chance. By Bayes, this is the same as comparing the likelihood ratio Pr(data | H1) / Pr(data | H0) against the threshold Pr(H0) / Pr(H1). This assumes that you have some priors on the chance of each hypothesis being correct, and that under each hypothesis you can compute Pr(data).
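Here's a minimal sketch of this rule in code, on a made-up coin example (the biases and the prior are assumptions I picked, not from the course):

```python
import math

def map_decision(flips, p0=0.5, p1=0.7, prior_h1=0.3):
    """Return the MAP hypothesis ('H0' or 'H1') for a sequence of 0/1 coin flips.

    Hypothetical setup: the coin is either fair (H0) or 70%-heads (H1),
    with prior Pr(H1) = 0.3. We compare log posteriors for stability.
    """
    heads = sum(flips)
    tails = len(flips) - heads
    # log Pr(data | H) + log Pr(H) for each hypothesis
    log_post_h0 = heads * math.log(p0) + tails * math.log(1 - p0) + math.log(1 - prior_h1)
    log_post_h1 = heads * math.log(p1) + tails * math.log(1 - p1) + math.log(prior_h1)
    return "H1" if log_post_h1 > log_post_h0 else "H0"

print(map_decision([1, 1, 1, 0, 1, 1]))  # mostly heads -> prints 'H1'
```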
Day 1 — Dec 13 Read the first couple Alg for Inf notes. First some definitions, then a neat theorem.
Directed Graphical Model
In a directed graphical model, we can factor the probability distribution into terms of the form $p(x_i \mid x_{\text{parents}(i)})$:
$$p(x_1, \dots, x_n) = \prod_{i=1}^{n} p(x_i \mid x_{\text{parents}(i)}).$$
But my digraph implies more independences than the factorization literally spells out. For instance, the chain $x_1 \to x_2 \to x_3$ implies $x_1 \perp x_3 \mid x_2$.
The general hope for this class is that if we have a sparse graph then we can do much much more efficient things than are possible in the dense case.
A general algorithm for listing off some conditional independences is:
- topo-sort the DAG
- $x_i$ is independent of non-parent things before it in the topo-order, conditional on $x_i$'s parents
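A quick sketch of this in code (the parents-dict encoding of the DAG is my own choice):

```python
def local_markov_statements(parents):
    """List statements 'x_v ⟂ (non-parent predecessors) | parents(v)' for a DAG.

    `parents` maps each node to the set of its parents. A topological order
    is computed by repeatedly peeling off nodes whose parents are all placed.
    """
    remaining = dict(parents)
    order = []
    while remaining:
        ready = [v for v, ps in remaining.items() if not (ps & remaining.keys())]
        assert ready, "graph has a cycle"
        order.extend(ready)
        for v in ready:
            del remaining[v]
    statements = []
    for i, v in enumerate(order):
        non_parents = set(order[:i]) - parents[v]
        if non_parents:
            statements.append((v, non_parents, parents[v]))
    return statements

# e.g. the vee: rain -> wet <- sprinkler
dag = {"rain": set(), "sprinkler": set(), "wet": {"rain", "sprinkler"}}
for v, indep, given in local_markov_statements(dag):
    print(f"{v} ⟂ {indep} | {given}")  # sprinkler ⟂ {'rain'} | set()
```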
Bayes Ball Algorithm There’s this pretty handy trick for checking whether a conditional independence relation is implied by a directed graphical model — it’s called the Bayes ball (a.k.a. bouncy ball) method.
The method answers the question: is $x_A \perp x_B \mid x_C$?
- Shade the variables in $C$.
- Put a ball at all vertices in $A$.
- Bounce the balls around the graph.
- If a ball ever hits a vertex of $B$, then the conditional independence relation is false. Otherwise it is true.
The bouncing is unfortunately a little bit counter-intuitive — it’s not just “when you hit a shaded circle, bounce backwards”. Instead there are three cases:
- Chain ($\to v \to$): bounce if $v$ is shaded, else pass.
- Up a tree / fork ($\leftarrow v \to$): bounce if shaded, else pass.
- Vee / collider ($\to v \leftarrow$): pass if shaded, else bounce.
The only way I can think of to remember this is to keep these three examples in mind.
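Here's how I'd code up the bouncing as a reachability search. The (node, direction) state encoding is my own framing of the standard rules, not the course's notation:

```python
from collections import deque

def d_separated(parents, A, B, C):
    """Is x_A ⟂ x_B | x_C implied by the DAG? (A, B, C assumed disjoint.)

    `parents` maps each node to the set of its parents. A ball is a
    (node, direction) pair: "down" means it arrived from a parent,
    "up" means it arrived from a child.
    """
    children = {v: set() for v in parents}
    for v, ps in parents.items():
        for p in ps:
            children[p].add(v)
    shaded = set(C)
    # start the balls at A as if arriving from a child, so they may go anywhere
    queue = deque((a, "up") for a in A)
    seen = set()
    while queue:
        v, direction = queue.popleft()
        if (v, direction) in seen:
            continue
        seen.add((v, direction))
        if v in B:
            return False  # a ball reached B: the independence fails
        if direction == "up" and v not in shaded:
            # chain/fork through an unshaded node: pass both up and down
            queue.extend((p, "up") for p in parents[v])
            queue.extend((c, "down") for c in children[v])
        elif direction == "down":
            if v not in shaded:
                # chain through an unshaded node: keep going down
                queue.extend((c, "down") for c in children[v])
            else:
                # vee: a shaded node bounces the ball back up to its parents
                queue.extend((p, "up") for p in parents[v])
    return True

# the vee x -> z <- y: independent marginally, dependent once z is shaded
vee = {"x": set(), "y": set(), "z": {"x", "y"}}
print(d_separated(vee, {"x"}, {"y"}, set()))  # True
print(d_separated(vee, {"x"}, {"y"}, {"z"}))  # False
```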
Undirected graphical model
Vertices for vars.
Cond. indep: $x_A \perp x_B \mid x_C$ whenever removing $C$ disconnects $A$ from $B$ in the graph.
This is great and intuitive.
Theorem (Hammersley-Clifford)
(1) If $p(x) > 0$ everywhere and $p$ satisfies the conditional independences implied by the undirected graph $G$, then $p$ factors over the cliques:
$$p(x) = \frac{1}{Z} \prod_{C \in \mathcal{C}(G)} \psi_C(x_C),$$
where $\mathcal{C}(G)$ is the set of cliques of $G$ and the $\psi_C$ are nonnegative potentials.
(2) If $p$ factors over the cliques of $G$ like this, then $p$ satisfies the conditional independences implied by $G$.
Proof (2) is a straightforward computation. (1) is quite tricky to prove — I’ll sketch pf below.
We just prove it for binary random vars. Probably could generalize.
If rvs are binary, say $x_i \in \{0,1\}$, then each configuration is the indicator of a subset: write $x^S$ for the configuration that is $1$ on $S$ and $0$ elsewhere, and define $f(S) = \log p(x^S)$ (legal since $p > 0$).
It turns out this function has some nice properties. For instance, by Möbius inversion we can always expand
$$f(S) = \sum_{T \subseteq S} g(T), \qquad \text{where } g(T) = \sum_{U \subseteq T} (-1)^{|T \setminus U|} f(U).$$
Now we’re going to show that $g(T) = 0$ whenever $T$ is not a clique of $G$. Exponentiating the expansion of $f$ then gives exactly a product of per-clique potentials, which is the desired factorization.
To understand why, let’s just consider a graph on vertices $\{1,2,3\}$ with no edge between $1$ and $2$, and take $T = \{1,2\}$:
$$g(\{1,2\}) = f(\{1,2\}) - f(\{1\}) - f(\{2\}) + f(\emptyset) = \log \frac{p(1,1,0)\,p(0,0,0)}{p(1,0,0)\,p(0,1,0)}.$$
By the magic of conditional independence ($x_1 \perp x_2 \mid x_3$, since $1$ and $2$ aren’t adjacent) we have:
$$\frac{p(1,1,0)}{p(0,1,0)} = \frac{p(x_1{=}1 \mid x_2{=}1, x_3{=}0)}{p(x_1{=}0 \mid x_2{=}1, x_3{=}0)} = \frac{p(x_1{=}1 \mid x_3{=}0)}{p(x_1{=}0 \mid x_3{=}0)} = \frac{p(1,0,0)}{p(0,0,0)}.$$
So $p(1,1,0)\,p(0,0,0) = p(1,0,0)\,p(0,1,0)$, i.e., $g(\{1,2\}) = 0$. The same cancellation kills $g(T)$ whenever $T$ contains two non-adjacent vertices.
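To convince myself the Möbius step works, a quick numeric sanity check (entirely my own, with made-up potentials; vertices 1, 2, 3 are indices 0, 1, 2):

```python
import math
from itertools import chain, combinations, product

# a positive p on {0,1}^3 that factors over the graph 1 - 3 - 2
# (no edge between vertices 1 and 2)
psi_13 = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 0.5}  # potential on edge {1,3}
psi_23 = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 0.7, (1, 1): 4.0}  # potential on edge {2,3}
weights = {x: psi_13[x[0], x[2]] * psi_23[x[1], x[2]] for x in product((0, 1), repeat=3)}
Z = sum(weights.values())
p = {x: w / Z for x, w in weights.items()}

def f(S):
    # log-probability of the configuration that is 1 on S and 0 elsewhere
    return math.log(p[tuple(1 if i in S else 0 for i in range(3))])

def g(T):
    subsets = chain.from_iterable(combinations(T, r) for r in range(len(T) + 1))
    return sum((-1) ** (len(T) - len(U)) * f(U) for U in subsets)

print(abs(g((0, 1))) < 1e-9)  # True: vertices 1 and 2 form no clique, so g vanishes
```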
Factor Graph
- bipartite
- one side corresponds to vars, the other side is ways to combine the vars.
If the LHS has the vars $x_1, \dots, x_n$, and the RHS has factor vertices $f_1, \dots, f_m$, then the pr dist must factor as
$$p(x) = \frac{1}{Z} \prod_{j=1}^{m} f_j(x_{\partial j}),$$
where $x_{\partial j}$ denotes the variables neighboring factor $f_j$ in the graph.
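A tiny sketch of evaluating such a factorization in code (the representation of factors is my own choice):

```python
import math

def unnormalized_p(x, factors):
    """x: tuple of variable values; factors: list of (variable indices, function)."""
    total = 1.0
    for indices, fn in factors:
        total *= fn(*(x[i] for i in indices))
    return total

# example: p(x1, x2, x3) proportional to f1(x1, x2) * f2(x2, x3)
factors = [((0, 1), lambda a, b: math.exp(-(a - b) ** 2)),
           ((1, 2), lambda b, c: 1.0 if b == c else 0.5)]
print(unnormalized_p((0.0, 1.0, 1.0), factors))  # exp(-1) * 1.0
```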
remark: sometimes the models can’t express all relevant properties of your prdist
e.g., directed models seem to be super lossy about preserving conditional independence relations!
Q: can you convert btwn models? A: sometimes.
Day 2 — Dec 17 Wrote up day 1.
Did some of the psets. Note to self — only do the interesting problems: my goal is to be exposed to the subject and understand the basic ideas. If something seems like a technicality that’s neither interesting nor intuitively useful, skip it.
some hw qs
Q1. Undirected graphical model for sampling an independent set (say, uniformly among independent sets)? A: the graph itself is the graphical model, with $x_i \in \{0,1\}$ per vertex and potential $\mathbb{1}[x_i x_j = 0]$ on each edge $(i,j)$.
Q2. Undirected graphical model for sampling a matching? A: if the orig graph is $G = (V, E)$, make a new graph on the edges where $e \sim e'$ iff $e$ and $e'$ share an endpoint (the line graph $L(G)$). A matching of $G$ is exactly an independent set of $L(G)$, which reduces this to Q1.
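Quick code sketch of the Q2 construction (the encoding is mine):

```python
from itertools import combinations

def line_graph_edges(edges):
    """L(G): one vertex per edge of G; two are adjacent iff the edges share an endpoint."""
    return {frozenset((e, f)) for e, f in combinations(edges, 2) if set(e) & set(f)}

def is_independent_in_line_graph(chosen, L):
    return all(frozenset((e, f)) not in L for e, f in combinations(chosen, 2))

def is_matching(chosen):
    return all(not (set(e) & set(f)) for e, f in combinations(chosen, 2))

G_edges = [(1, 2), (2, 3), (3, 4), (4, 1)]  # a 4-cycle
L = line_graph_edges(G_edges)
chosen = [(1, 2), (3, 4)]
print(is_independent_in_line_graph(chosen, L), is_matching(chosen))  # True True
```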
Day 3 — Dec 18 Talked more about converting between directed and undirected graphical models. Talked about conditions for perfect conversion. Talked about “minimal I-maps”, i.e., converting to a different graphical model which doesn’t imply any false independences, but would if you removed any edge.
main takeaways:
- digraph to undigraph conversion is perfect if moralization (connecting parents, sketched in code after this list) doesn’t add edges.
- undigraph has a perfect digraph model iff the graph is “chordal”, i.e., all cycles of len > 3 have a chord.
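Here's the moralization step in code (the parents-dict encoding of the DAG is my own):

```python
from itertools import combinations

def moralize(parents):
    """parents: {node: set of parents}. Returns the undirected edge set of the moral graph."""
    edges = set()
    for v, ps in parents.items():
        for p in ps:
            edges.add(frozenset((p, v)))          # keep original edges, undirected
        for p, q in combinations(sorted(ps), 2):  # "marry" each node's parents
            edges.add(frozenset((p, q)))
    return edges

# the vee x -> z <- y gains a moral edge {x, y}, so this conversion is lossy:
print(sorted(tuple(sorted(e)) for e in moralize({"x": set(), "y": set(), "z": {"x", "y"}})))
```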
Lec 6 — Gaussian Graphical Models Gaussian: $x = \mu + Av$ for $v$ a vector of iid standard Gaussians; equivalently, $a^\top x$ is Gaussian for all vectors $a$.
- covariance rep: PDF $= \frac{1}{Z} \exp\left(-\tfrac{1}{2}(x - \mu)^\top \Lambda^{-1} (x - \mu)\right)$, parameterized by mean $\mu$ and covariance $\Lambda$.
- information rep: could also write PDF $\propto \exp\left(-\tfrac{1}{2} x^\top J x + h^\top x\right)$, with $J = \Lambda^{-1}$ and $h = J\mu$.
Interesting fact: $J_{ij} = 0$ exactly when $x_i \perp x_j$ given all the other variables, so the sparsity pattern of $J$ (not $\Lambda$!) is the undirected graphical model.
Talked about how to marginalize and condition wrt covariance rep and information rep.
e.g., a linear dynamical system $x_{i+1} = A x_i + B v_i$ (with Gaussian noise $v_i$) is a Gaussian graphical model with chain structure.
Marginalization means computing things like $p_{x_1}(x_1) = \sum_{x_2, \dots, x_n} p(x)$ (an integral in the Gaussian case).
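To make the marginalize/condition story concrete, a small numpy sketch (matrix values are made up): marginalizing is trivial in the covariance rep (read off a block of $\Lambda$), and conditioning is trivial in the information rep (take a block of $J$ and shift $h$).

```python
import numpy as np

mu = np.array([0.0, 1.0])
Lam = np.array([[2.0, 0.5],
                [0.5, 1.0]])   # covariance rep: (mu, Lam)
J = np.linalg.inv(Lam)         # information rep: (h, J)
h = J @ mu

# marginal of x1 in the covariance rep: just read off the block
mu1, var1 = mu[0], Lam[0, 0]

# condition on x2 = 2.0 in the information rep: J' = J11, h' = h1 - J12 * x2
x2 = 2.0
J_cond = J[0, 0]
h_cond = h[0] - J[0, 1] * x2

print(mu1, var1)                    # marginal mean/variance of x1
print(h_cond / J_cond, 1 / J_cond)  # conditional mean/variance of x1 given x2
```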