Shatar: Hey JJ what’s cooking?
JJ: At this point in time I personally am not cooking anything. However I intuit that your meaning is more metaphorical, similar to the phrase “what’s up”. Thanks much for your inquiry. Lately I’ve been studying boolean functions, specifically Chapter 2 of O’Donnell’s book, which discusses social choice theory.
Shatar: Oh no way! Wait I actually love that topic. You know the best voting rule?
JJ: The answer to your question depends on the metric you use to evaluate the voting rule.
Shatar: The metric is obviously coolness, which makes the most sense. Anyways, here’s probably the best voting rule:

You have a bunch of little groups of people, called “tribes”. Each of these groups is debating whether or not to go feed this cat that they saw sitting out in the rain. Within each tribe, here is what happens: either everyone unanimously votes in favor of helping the cat, in which case the tribe will go help the cat / elect the cat as president; or, if anyone in the group opposes the decision to help the cat, the group will just argue about philosophy and ignore the cat. The cat is elected as president / saved from the side of the road iff at least one tribe decides to help it.

JJ: Ah yes this is called the “tribes” voting rule.
Shatar: It’s so logical, right! Anyways, one reason why it’s cool is that it apparently gives each person essentially the smallest possible influence that a (roughly) balanced voting rule can.

Recall:

Definition. The influence of coordinate \(i\) on \(f\) is \[I_i[f] = \Pr_x[f(x)\neq f(x\cdot e_i)] = \Theta(1)\cdot ||\partial_i f||_2^{2},\] where \(x\cdot e_i\) denotes \(x\) with its \(i\)-th coordinate flipped.
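For example, for the dictator function \(f(x)=x_1\) we have \(I_1[f]=1\) and \(I_i[f]=0\) for \(i\neq 1\), while for \(\mathrm{Maj}_3\) every coordinate has influence \(1/2\) (a coordinate is pivotal exactly when the other two disagree).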

Remark. In the following theorem it actually matters what base the logs in the definition of the tribe sizes are! So I’ve written the base explicitly.

I feel like there have been a couple of times lately when log bases were important. E.g., \(\ln (1+x)\le x\) holds for the natural log but fails for \(\log_2\). Whatever.

Theorem. If you set \(w = \log_2 (n/\log_2 n)\) and \(s = n/w\), then TRIBES with \(s\) size-\(w\) tribes is a roughly balanced function that gives each coordinate influence only \[\mathcal{O}(1)\cdot \frac{\log n}{n}.\]

Proof.

First, we show it’s roughly balanced. The chance that no tribe helps the cat is \[(1-2^{-w})^{s} \approx \left(1 - \frac{\log_2 n}{n}\right)^{n/\log_2 n} \approx e^{-1},\] a constant bounded away from \(0\) and \(1\); adjusting \(s\) slightly (so that \(s\cdot 2^{-w}\approx \ln 2\)) makes this \(1/2 \pm o(1)\).

Next, we compute the influence of each variable. A variable is pivotal exactly when the other \(w-1\) members of its tribe vote yes and no other tribe is unanimous, so its influence is \[2^{-(w-1)}(1-2^{-w})^{s-1} = \Theta(1)\cdot 2^{-w} = \Theta(1)\cdot \frac{\log_2 n}{n}.\]
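Here’s a quick numerical sanity check of both calculations (just a sketch in Python; the function names are made up, nothing canonical):

```python
import math
import random

def tribes(x, w):
    """TRIBES: 1 iff some block ("tribe") of w consecutive bits is unanimously 1."""
    return int(any(all(x[i:i + w]) for i in range(0, len(x), w)))

def estimate_bias_and_influence(n, w, samples=50_000):
    """Quick Monte Carlo estimates of Pr[TRIBES = 1] and the influence of coordinate 0."""
    ones = pivotal = 0
    for _ in range(samples):
        x = [random.randint(0, 1) for _ in range(n)]
        fx = tribes(x, w)
        ones += fx
        y = x.copy()
        y[0] ^= 1                                  # flip coordinate 0
        pivotal += (fx != tribes(y, w))
    return ones / samples, pivotal / samples

n = 1024
w = round(math.log2(n / math.log2(n)))             # tribe width from the theorem
s = n // w                                         # number of tribes (leftover coordinates dropped)
bias, infl = estimate_bias_and_influence(s * w, w)
print(f"w={w}, s={s}, Pr[f=1]≈{bias:.3f}, I_0[f]≈{infl:.5f}, log2(n)/n≈{math.log2(n)/n:.5f}")
```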

JJ: That’s pretty interesting. Thanks for telling me about it!
Shatar: What’s your vote?


Now I just review several things from the first two weeks of class. Reference: Dor Minzer’s lecture notes.

Remark. Random restrictions are important. For \(J\subseteq[n]\) and \(y\in\{-1,1\}^{J}\), write \(f_{J\to y}\) for the restriction of \(f\) in which the coordinates in \(J\) are fixed to \(y\) and the coordinates in \(\overline{J}\) are left free. Here are some of their crucial properties:

For any \(S\subseteq \overline{J}\), \[\widehat{f_{J\to y}}(S) = \sum_{T\subseteq J} \hat{f}(S\sqcup T)\, \chi_T(y).\] That is, to get the Fourier coefficient of the restricted function at \(S\), you smear together the Fourier coefficients at \(S\sqcup T\) over all \(T\) contained in the set of indices that we fixed, each weighted by the character \(\chi_T\) evaluated at the value \(y\) we restrict to.

One consequence of this fact is that, for any \(S\subseteq \overline{J}\), \[\mathop{\mathrm{\mathbb{E}}}_y \widehat{f_{J\to y}}(S) = \hat{f}(S),\] since \(\mathop{\mathrm{\mathbb{E}}}_y \chi_T(y)=0\) unless \(T=\varnothing\). That is, in expectation the Fourier coefficient of the restriction is just the original Fourier coefficient.

Finally, we have the following equation for the “second moment”: \[\mathop{\mathrm{\mathbb{E}}}_y \widehat{f_{J\to y}}(S)^2 = \sum_{T\subseteq J} \hat{f}(S\sqcup T)^2,\] which follows from the first formula and the orthogonality of the characters \(\chi_T\) over \(y\).
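Here’s a quick brute-force check of the first formula on \(\mathrm{Maj}_3\) (just a sketch; the helper names are made up):

```python
import itertools
from math import prod

def fourier_coeffs(f, coords):
    """Brute-force Fourier coefficients of f over the listed coordinates.
    f maps a dict {coord: +-1} to a real; returns {frozenset S: f_hat(S)}."""
    points = [dict(zip(coords, vals)) for vals in itertools.product([-1, 1], repeat=len(coords))]
    subsets = [frozenset(S) for k in range(len(coords) + 1)
               for S in itertools.combinations(coords, k)]
    return {S: sum(f(x) * prod(x[i] for i in S) for x in points) / len(points) for S in subsets}

def maj3(x):
    """Majority of three +-1 bits."""
    return 1 if x[0] + x[1] + x[2] > 0 else -1

J, free = [2], [0, 1]                     # fix coordinate 2, leave 0 and 1 free
f_hat = fourier_coeffs(maj3, [0, 1, 2])

# Check: restricted_hat(S) = sum over T ⊆ J of f_hat(S ⊔ T) * chi_T(y).
for y in itertools.product([-1, 1], repeat=len(J)):
    fix = dict(zip(J, y))
    r_hat = fourier_coeffs(lambda x, fix=fix: maj3({**x, **fix}), free)
    for S, coeff in r_hat.items():
        predicted = sum(f_hat[S | frozenset(T)] * prod(fix[i] for i in T)
                        for k in range(len(J) + 1) for T in itertools.combinations(J, k))
        assert abs(coeff - predicted) < 1e-9
print("restriction formula verified for maj3 with J = {2}")
```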

The intuition is that random restrictions:

  • decrease the degree of the polynomial,
  • make some functions, like AND, collapse to constants,
  • in general, simplify your function.
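For example, restricting \(\mathrm{AND}_n\) by fixing the coordinates in \(J\) to uniformly random values collapses it to the constant False as soon as any fixed coordinate lands on False, which happens with probability \(1-2^{-|J|}\).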

Lemma. Given \(T\subseteq J \subseteq [n]\) and membership (query) access to a Boolean function \(f\), you can, with good probability, approximate \[\sum_{S:\, S\cap J = T} \hat{f}(S)^2.\]
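Here’s one way such an estimator might look (a sketch with made-up helper names; it uses the standard identity \(\sum_{S:\, S\cap J = T} \hat{f}(S)^2 = \mathop{\mathrm{\mathbb{E}}}_{y,y',z}\bigl[\chi_T(y)\chi_T(y')\,f(y,z)\,f(y',z)\bigr]\), where \(y,y'\) are independent uniform assignments to the coordinates in \(J\) and \(z\) is a uniform assignment to \(\overline{J}\)):

```python
import random
from math import prod

def estimate_bucket_weight(f, n, J, T, samples=100_000):
    """Estimate sum over {S : S ∩ J = T} of f_hat(S)^2 using only queries to f.
    Averages chi_T(y) * chi_T(y') * f(y,z) * f(y',z) over independent uniform
    assignments y, y' to the coordinates in J and z to the rest."""
    Jbar = [i for i in range(n) if i not in J]
    total = 0.0
    for _ in range(samples):
        y  = {i: random.choice([-1, 1]) for i in J}
        yp = {i: random.choice([-1, 1]) for i in J}
        z  = {i: random.choice([-1, 1]) for i in Jbar}
        chi = prod(y[i] for i in T) * prod(yp[i] for i in T)
        total += chi * f({**y, **z}) * f({**yp, **z})
    return total / samples

# Example: for maj3, the bucket {S : S ∩ {0} = {0}} has weight
# f_hat({0})^2 + f_hat({0,1,2})^2 = 1/4 + 1/4 = 1/2.
maj3 = lambda x: 1 if x[0] + x[1] + x[2] > 0 else -1
print(estimate_bucket_weight(maj3, 3, J=[0], T=[0]))   # ≈ 0.5
```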

Definition. Say that a function \(f\) is \(t\)-sparse if its Fourier spectrum \(\hat{f}\) is supported on at most \(t\) characters.

Say \(f,g\) are \(\varepsilon\)-close if \(||f-g||_2^{2}\le \varepsilon\).

Theorem. Given query access to a function \(f\) which is \(\varepsilon\)-close to being \(t\)-sparse, we can, with good probability, efficiently learn a function \(g\) that is close to \(f\).

Proof. It suffices to find all of the heavy Fourier coefficients, since the light ones contribute little to \(||f-g||_2^{2}\).

Key step: using the key lemma we can estimate, e.g., \[\sum_{S:\, S\cap \left\{ 1\right\}=\left\{ 1\right\}} \hat{f}(S)^2.\] If this “bucket” weight is big then, since \(f\) is close to \(t\)-sparse, some coefficient containing \(1\) must carry a decent fraction of it. So we recursively split buckets according to whether each coordinate is in \(S\) or not, only expanding buckets whose weight exceeds a threshold; the surviving leaves of this tree are exactly the heavy coefficients, and we find everything.
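And here’s a toy version of that tree search (just a sketch; for illustration the bucket weights are computed exactly by brute force instead of being estimated with queries as in the lemma):

```python
import itertools
from math import prod

def fourier(f, n):
    """Brute-force Fourier coefficients of f : {-1,1}^n -> R, keyed by frozenset."""
    pts = list(itertools.product([-1, 1], repeat=n))
    return {frozenset(S): sum(f(x) * prod(x[i] for i in S) for x in pts) / len(pts)
            for k in range(n + 1) for S in itertools.combinations(range(n), k)}

def heavy_coefficients(f, n, threshold):
    """Kushilevitz–Mansour-style search: return all S with f_hat(S)^2 >= threshold.
    A bucket is a pattern T over the first k coordinates; we only expand buckets whose
    total Fourier weight is >= threshold, so the tree stays small for sparse functions."""
    f_hat = fourier(f, n)   # illustration only; the real algorithm estimates bucket weights via queries
    def bucket_weight(k, T):
        return sum(c * c for S, c in f_hat.items() if S & frozenset(range(k)) == T)
    live = [frozenset()]                           # surviving patterns over the first k coordinates
    for k in range(n):
        live = [T | extra for T in live for extra in (frozenset(), frozenset({k}))
                if bucket_weight(k + 1, T | extra) >= threshold]
    return live                                    # each surviving pattern is now a full set S

# Example: majority of 3 bits has coefficients 1/2 on {0}, {1}, {2} and -1/2 on {0,1,2}.
maj3 = lambda x: 1 if x[0] + x[1] + x[2] > 0 else -1
print(sorted(map(sorted, heavy_coefficients(maj3, 3, threshold=0.2))))
```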