Recently (essentially ~80% of my waking moments for the past 3 days or something) I’ve been polishing up / completely rewriting a paper that I’m submitting to a conference this week. I think it’s quite appropriate that I write this post now rather than after I’ve finished editting the paper or after the paper is (hopefully) accepted at some conference.
So in this blog post I briefly share some learnings and contemplations about truth.
A good proof will give you an understanding of why something is true. A great proof however, should actually make you see that the proven statement is a tautology. Let me explain.
a breif classification of theorems into three classes
There are a lot of different states of being with respect to mathematical propositions.
inaccessible axioms
“The matrix multiplication constant \(\omega < 2.4\)” This is a statement that I believe because it is universally accepted in a community that I trust. I have some vague conception of how this could be possible: Strassen’s algorithm demonstrates that you can do MM in \(n^{\log_2(7)}< n^{2.81}\) and I can imagine that more complicated tensor decompositions allow one to improve over this, even if I only have the vaguest idea of what the “laser method” is.
This is a prototypical example of a result that is incredibly complicated to the point that I don’t really have much desire to understand it except on a very high level. Some other theorems that probably fall in this boat include 4-color theorem. Robertson Seymour Forbidden Graph minors theorem. These theorem’s are characterized by being frequently useful, but with hopelessly long proofs. Let’s call these theorems inaccessible axioms.
simple axioms
Take something like “The multiplicative group \(Z_{p^{k}}^{\times}\) is cyclic”. I have no qualms using this fact. I even have written down a proof of this fact. But it doesn’t feel obvious. It is a fundamental part of my paradigm for thinking about \(\mathbb{F}_p\). But usually I’m just taking it as an axiom, that I believe because I tried it for \(p<12\) and it held there, and because everyone else believes it, and because I feel like I probably at some point had checked a proof of this. The proof is really not so complicated, I’m in danger of understanding it right now! The proof (for the prime case not the prime power case) simply stems from a bijection showing that \(\sum_{d\mid n} \phi(d) = n\). The bijection is simply: write down reduced fractions \(x/n\) for each \(x\in [n]\) and then group them by their denominator. For example, here is a proof that \(\mathbb{Z}_7^*\) is cyclic: \[(1/6, 5/6), (1/3 ,2/3), (1/2), (1/1) \iff \phi(6) = 2, \phi(3)=2, \phi(2)=1, \phi(1)=1\] Why is this a proof? Well, The number of elements of order \(d\mid 6\) in \(\mathbb{Z}_7^{*}\) is at most \(\phi(d)\). Let \(x_d\) denote the number of elements of order \(d\) for each \(d\mid 6\). Then we have \(6 =x_1+x_2+x_3+x_6\). And the RHS is upper bounded by \(6\), so in order for equality to hold we simply must have \(x_6 = \phi(6)\). And that’s why I’m not so surprised to find that \(3,4\) are generators for \(\mathbb{Z}_7^*\). So, after writing this, I have realized: this proof consists of two simple lemmas.
- Lemma 1: \(\sum_{d\mid n} \phi(d) = n\). There’s a nice bijection that explains this (see above).
- Lemma 2: \(|\mathbb{Z}_p^*| \le \sum_{d\mid n}\#\text{elements of order } d\)
- Lemma 3: For each \(d\mid n\) there are at most \(\phi(d)\) elements of order \(d\).
- Lemma 0: I guess technically I’m invoking Lagrange’s theorem implicitly to assert that all elements of \(\mathbb{Z}_p^{*}\) have to have order \(d\mid |\mathbb{Z}_p^{*}|\).
Ok so actually I think I learned something just while writing the above text. This second type of theorem that I wanted to taxonomize I shall call simple axioms. These theorems are characterized by having fairly simple proofs that are easy to bring into cache. Once brought into cache they should hopefully seem obvious. But they usually live outside of cache.
tautological axioms
Finally there is what I will call “tautological axioms”. As an example, consider something like the Cauchy-Shwarz inequality. My friend Peter has expressed the opinion “Cauchy-Schwarz is anything equivalent to \(x^2\ge 0\)”. This highlights the fact that sometimes you can use Cauchy-Shwarz without even realizing it. The Cauchy-Scharz inequality really just says “if you have some budget that you can split amongst two numbers and then you’re going to square and combine the numbers, then the smallest possible result is if you split the budget evenly”. For instance \(3^2+7^2 \ge 5^2 + 5^2.\) This is the type of theorem that the “tagline” when I’m retrieving it from memory includes the proof. This would be like including the proof of a statement in the title (or at least description) of a blog post! Oh gosh I have to do that sometime now.
back to discussion of writing a research paper
So, back to the discussion of writing a research paper. I have historically always complained a bit about having to edit my research papers to be correct. “It’s the ideas that are important, people can survive a few typos. It’s good for there to be typos, it’s like eating dirt it keeps your immune system strong, keeps you on your toes”.
Ok, maybe I wasn’t that bad, but you get the point.
I think I’m starting to change my appraisal of the editting process of a paper in a positive and healthy direction, that is towards seeing it as a highly valuable activity and not being upset with myself that my work would contain errors that need to be fixed.
My usual approach for editting a paper is I go through and kind of simultaneously am in proof reading and fixing mistakes mode. However, this time through I’ve kind of switched to a slightly different model: first go through just in proof reading mode, and notice all the errors. Then go back later and fix the errors. I think this approach is good. My brain doesn’t have the RAM to run its error-finding software and its error-fixing software simultaneously. Or maybe these programs actually interfere with the operation of each other. Anyways.
We come to the question that inspired this blog post:
How much error can a proof contain before it ceases to be truth?
The answer may surprise you.
The answer is zero. Cero, Zéro, Null, Zero, Zero, Ноль, 零, ゼロ, صفر, शून्य, 제로, Sıfır, Nul, Noll
Now, let’s be clear. This doesn’t mean that your paper is garbage if it has an unclear sentence.
But it does mean that the responsibility I accept when writing a paper is to make it error-free. Think about the last time you read something and it had an error in it. It can make it 10x harder to understand the paper and destroy your confidence in it.
Oft-times this level of editting involves seemingly silly things. Getting the floors, ceilings and edge cases right. But I think there’s no point in calling these things silly. Sometimes these problems even are real problems. Usually fixing them helps elucidate why an argument is actually working. There is a very beautiful aesthetic to an argument where you can clearly see where all the hypotheses are used and how the argument fits together.
Sometimes it’s ok to not spell out details in the text. But it’s never ok to have not even spelled out the details yourself. If you don’t even care about the work enough to carefully read it, why should anyone else?
As a side note, I think Terry Tao gives some good advice on how to write papers.
How much do you have to proof-read / edit a paper? Well, Bradon Sanderson takes a year to edit a book, including four drafts, and has a huge team that helps him check for problems. That kind of makes editting my little 10-20 page manuscript seem more doable.
Alek’s Rule of Truth states that “A paper is not true until you have read it without finding any errors. Preferably this reading happens at least a week after writing it.”
Alek’s Rule of Clarity states that a paper is not clear unless someone who you haven’t already told all about the problem has read it and could understand it.
A final comment or two on writing papers which I hope my future self will read before the next time he writes a paper.
It’s really important to have good organization. Optimal lemma length is like one page. If a proof starts to get longer than that it’s really hard to keep it all in memory.
Splitting description into named paragraphs is a good idea. In the future I prefer avoiding the “3 lemmas nested in a theorem” style and going for the first prove 3 lemmas and then derive the theorem style. Within a proof you should start with a statement of the plan for the proof, and then at every paragraph it should be clear how you are advancing the plan.
So this has actually ended up being a rather long bblog post. And I’m sure it contains some mistakes. And it maybe not super well-organized. Luckily I’m not publishing it in a journal. In the blog medium I don’t have a promise to the reader to have achieved any of these things. In fact, maybe the opposite: I promise this blog post contains errors and was just writen stream-of-conciousness and then never editted.
But anyways maybe the key points are as follows:
- Editting a paper to be true is a vital part of doing research.
- Editting a paper results in clearer simpler ideas.
- Editting a paper makes it so that other people (including future you!) can read your paper.
- In a well organized paper it is clear how the logic flows (this serves to illuminate potential flaws in the logic).
- Check edge cases. Run the argument on simple examples. At each line ask “is this really true?” and actually check.
- Have a separate “error-finding” mode where you assume the paper is false and try to find the error.
- Editting a paper makes something beautiful and helps you appreciate the results more.
Ultimately, you are finished writing a paper when you finally realize “ah it really is true”.