In this post I will write an explanation of some work by ARC and other groups on provable guarantees for neural networks. My main goal in writing this is to personally understand what they are working on. But hopefully by the end I will have some comments and ideas for future work.
Background:
- "Compact Proofs of Model Performance via Mechanistic Interpretability"
- "Max of K Heuristic Estimator"
These papers consider a small transformer trained on the following function: given k input tokens drawn from a vocabulary of size n, output the maximum token.
In his writeup Gabe gives a description of what such a transformer looks like:
- input: a k×n one-hot encoded matrix
- res = input * embedding matrix + positional encoding
- attn_presoft = res * query * key^T * res^T
- attn_prob = masked softmax(attn_presoft)
- attn = attn_prob * res * value matrix * output matrix
- logits = (res + attn) * U
- output = softmax(logits, across the appropriate dim)
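The steps above can be sketched directly in NumPy. This is my own minimal reconstruction of a one-layer, one-head, attention-only transformer; the weight shapes, the causal mask, and the random test weights are illustrative assumptions, not taken from either paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def forward(tokens, E, P, Q, K, V, O, U):
    """One-layer attention-only transformer, following the bullet list above.

    tokens: length-k int array with values in [0, n).
    E: n x d embedding, P: positional encoding, Q/K/V/O: d x d, U: d x n unembedding.
    (All shapes are my assumptions for illustration.)
    """
    k = len(tokens)
    onehot = np.eye(E.shape[0])[tokens]           # k x n one-hot encoded input
    res = onehot @ E + P[:k]                      # residual stream: k x d
    attn_presoft = res @ Q @ K.T @ res.T          # k x k attention scores
    mask = np.triu(np.ones((k, k), dtype=bool), 1)
    attn_presoft = np.where(mask, -np.inf, attn_presoft)  # causal mask
    attn_prob = softmax(attn_presoft, axis=-1)    # masked softmax
    attn = attn_prob @ res @ V @ O                # attention output: k x d
    logits = (res + attn) @ U                     # k x n logits
    return softmax(logits, axis=-1)               # softmax across vocab dim
```

With trained weights, reading off the argmax of the last row of the output would give the model's guess at the max token.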
So I think the basic idea of Gabe's paper is as follows:
A priori you might think that in order to prove that the model is pretty good at outputting the max, you'd have to just evaluate the model on all n^k possible input sequences. But by applying a presumption of independence in the right place, Gabe is able to cut this cost down substantially.
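To make the baseline cost concrete, here is the brute-force check a naive proof would pay for, enumerating every one of the n^k sequences. This is my own illustration of the exponential baseline, not the paper's heuristic estimator.

```python
from itertools import product

def exact_accuracy(model, n, k):
    """Evaluate `model` on all n**k length-k sequences over vocab [0, n)
    and return the fraction where it outputs the true max.

    `model` here is any function from a token tuple to a predicted token
    (a stand-in for the trained transformer)."""
    correct = 0
    for seq in product(range(n), repeat=k):  # n**k iterations: the whole cost
        correct += (model(seq) == max(seq))
    return correct / n ** k
```

For example, a "model" that just copies the last token is right exactly when the last token happens to be the max: for n=3, k=2 that is 6 of the 9 sequences.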
I think the "Compact Proofs of Model Performance via Mechanistic Interpretability" paper actually does give some provable guarantees: they exhibit an efficient verification procedure together with an error bound.
But ARC feels that such provable guarantees aren't really scalable, whereas they think that defeasible arguments are more scalable.
open questions from Gabe
- surprise accounting
- analog of Wick propagation (not really sure what this means yet; I need to do some more reading)
Jacob talks about surprise accounting in his blog post. The total surprise of an explanation has two components:
- How surprising is the explanation (i.e., how many bits does it take to specify)
- How surprising is the data given the explanation.
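The two components add up to a total surprise, measured in bits. Here is a toy worked example of my own (not from Jacob's post): 100 coin flips come up heads 80 times, and we compare the explanation "it's a fair coin" (free to state, but the data stays surprising) against "p(heads) = 0.8" (assumed to cost 10 bits to specify, but it makes the data much less surprising).

```python
import math

def total_surprise(explanation_bits, data_probability):
    """Surprise accounting: bits to specify the explanation, plus the
    remaining surprise -log2 P(data | explanation)."""
    return explanation_bits + -math.log2(data_probability)

# 100 flips, 80 heads.
fair   = total_surprise(0,  0.5 ** 100)              # fair coin: 100 bits total
biased = total_surprise(10, 0.8 ** 80 * 0.2 ** 20)   # biased coin: ~82 bits total
```

The biased-coin explanation wins: paying 10 bits up front to state it saves far more than 10 bits of surprise in the data, which is the sense in which surprise accounting scores explanations.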