disclaimer: sorry this blog post is a bit of a mess, and there are some mistakes.

We start with a couple simple ad-hoc proofs for some simple cases of Dirichlet’s theorem. They are very natural, but don’t generalize to the result we want, and don’t give you any type of density result.

Simple proofs that \(4\mathbb{N}+1, 4\mathbb{N}+3\) contain infinitely many primes

The following two proofs are taken from “Problems from the Book” by Titu Andreescu and Gabriel Dospinescu, but are standard.

Assume that \(p_1,\ldots, p_n\) is a complete list of the primes congruent to \(3\) modulo \(4\), excluding \(3\). Then \[ 4 p_1\cdots p_n + 3 \] must be divisible by some prime congruent to \(3\) mod \(4\). Of course \(p_1\cdots p_n\) is not divisible by \(3\) so \(4p_1\cdots p_n + 3\) isn’t either. Thus the product must be divisible by one of \(p_1,\ldots, p_n\). But this is impossible by Euclid’s algorithm.

Assume \(q_1,\ldots, q_n\) complete list of primes congruent to \(1\) modulo \(4\). Then, \[ (q_1 q_2 \cdots q_n)^2 +1 \] is not divisible by any \(P_3\) primes, and is divisible by \(2\) only once. Thus it should be divisible by \(q_i\), which is impossible by Euclid’s algorithm.

These proofs are very elementary, although the second proof already relies on a non-trivial fact about the classification of prime factors of numbers of the form \(n^2+1\). However, they are fairly ad-hoc. Furthermore, these proofs seem to have no hope of proving a density statement.

It turns out that the following proof generalizes more nicely, and also gives you density statements.

The Euler Product Formula states that for any real \(s>1\) the following series/products exist and are equal. \[ \sum_{n\ge 1} \frac{1}{n^s} = \prod_p \frac{1}{1-p^{-s}} \] It is implicit throughout this note that \(p\) runs over primes when written as an index in a sum or product. The veracity of the Euler Product Formula is immediate from the Fundamenal Theorem of Numbers – the fact that all numbers can be uniquely decomposed into prime factors – along with viewing \(\frac{1}{1-p^{-s}}\) as the sum \(\sum_{n\ge 0} p^{-ns}\), and noting that for absolutely convergent sums it is legal to rearrange terms arbitrarily. Of course we know that \[ \lim_{s\to 1} \sum_{n\ge 1} n^{-s} = \infty. \] This fact is incompatible with having finitely many primes: the product on the RHS can’t grow unboundedly if it is a product over finitely many terms.

Simple Euler Products

We have seen an intruiging proof of the infinitude of the primes. We now extend this to proving that there are infinitely many primes in \(4\mathbb{N}+1\) and \(4\mathbb{N}+3\).

The following proofs are taken from Andreescu and Dospinescu’s book again, but should probably be attributed to Dirichlet.

Let \(\lambda\) be the character with \(\lambda(2k) = 0, \lambda(4k+1)=1, \lambda(4k+3)=0\). Define \[L(s, \lambda) = \sum_{n\ge 1} \frac{\lambda(n)}{n^{s}}.\] For any \(s>1\) we have absolute convergence, and thus can re-arrange terms to achieve: \[ \ln L(s) = \sum_p \left(1-\frac{\lambda(p)}{p^{s}}\right).\] Using basic taylor series one can show \[ \left|\ln L(s) - \sum_p \lambda(p)/p^{s}\right| \le O(1). \] From here we derive that \[ \left|\sum_{p} \lambda(p)/p^{s}\right| = O(1). \] On the other hand, we know that \[ \lim_{s\to 1} \sum_p p^{-s} \to \infty. \] Thus, there must be a large amount of cancellation occuring in the sum \(\sum_{p} \lambda(p)/p^{s}\). In particular, we must have that there are infinitely many primes in both \(4\mathbb{N}+1\) and \(4\mathbb{N}+3\).

Dirichlet Characters

To generalize this idea we need Dirichlet Characters. Credit to this is due to this paper by Thai Pham, a previous MIT student. The following resource was also useful: Garrett, Paul, Primes in arithmetic progressions, 2011. http : //www.math.umn.edu/ ∼ garrett/m/mf ms/notes c/dirichlet.pdf as was wikipedia for complex analysis background.

For an abelian group \(G\) we define the dual \(\hat{G}\) as the set of MULTIPLICATIVE homomorphisms from \(G\) to \(\mathbb{C}\setminus\left\{ 0\right\}\). So we would have \[ \chi(a)\chi(b) = \chi(ab). \]

It turns out that the following thing is true:

Theorem. \(\hat{G}\) is isomorphic to \(G\).

Proof. This is clear for cyclic groups and all finite abelian groups can be written as a direct sum of cyclic groups.

We will only consider the case \(G=(\mathbb{Z}/mZ)^{*}\). In this case the elements of \(G\) are in correspondence with \(\phi(m)\)-th roots of unity.

BUT THEY ARE NOT THE ADDITIVE CHARACTERS. I DONT REALLY KNOW WHAT THEY ARE.

A useful property of the dual group is:

Lemma. We have:

  • \(\sum_{g\in G}\chi(g) = 0\) unless \(g=1\) in which case the sum is \(|G|\)
  • \(\sum_{\chi\in \hat{G}} \chi(g) = 0\) unless \(\chi=id\) in which case the sum is \(|\hat{G}|\).
  • \(\langle \chi, \phi \rangle = \sum_{g} \chi(g)\overline{\phi(g)} = \mathbb{1}_{\chi=\phi}.\)

These are standard and we omit the proofs.

Dirichlet Series

Now we proceed with the proof of the theorem. Throughout the section we fix \(m\). Our goal will be to show that for every \(a\perp m\) the set \(m\mathbb{N}+a\) contains infinitely many primes. The idea of the theorem is to assert the divergence of a certain sum. The following result will be helpful:

Theorem. Let \(D(s) = \sum_{n\ge 1} a_n/n^{s}\). Suppose that \(D(s_0)\) converges. Then \(D\) is analytic on \(\mathrm{Re}(s)>\mathrm{Re}(s_0)\).

Proof. This is basically just saying that \(s\mapsto \exp(s)\) is complex differentiable and that convergent sums play nicely with differentiation.

We will also need the Dirichlet \(L\)-series, defined as \[ L(s, \chi) = \sum_{n\ge 1} \frac{\chi(n)}{n^{s}}. \] The \(L\)-series obey the following property:

Lemma. Fix any non-trivial \(\chi\). Then, \(L(s,\chi)\) converges and is analytic in \(\mathrm{Re}(s)>0\).

Furthermore, if \(\mathrm{Re}(s)>1\) then \(L(s,\chi)\) converges absolutely, and satisfies the product formula \[ L(s, \chi) = \prod_{p\perp m} \frac{1}{1-\chi(p)p^{-s}}. \]

Proof. This follows by Abel’s summation formula and the fact that if you sum all the \(\phi(m)\)-th roots of unity then you get \(0\).

If \(\mathrm{Re}(s)>1\) then the sum is absolutely convergent, allowing us to arbitrarily re-arrange the terms.

A key character in the proof will be the Reimann Zeta function. This is defined as follows.

Lemma. Let \(\zeta(s)\) denote the analytic extension of \(\sum_{n\ge 1}n^{-s}\) for \(\mathrm{Re}(s)>1\) to a meromomorphic function on the whole complex plane. This extension exists and is holomorphic except for a pole at \(s=1\).

Proof. The typical way that analytic extensions work is by having a functional equation that extends the definition of your function. We will actually only require the Reimann zeta function on \(\mathrm{Re}(s)>0\), for which we can perform the extension using the following simple observation: \[\sum_{n\ge 1} n^{-s} - 2 (2n)^{-s} = \zeta(s)\left(1 - 1/2^{s-1}\right) = \sum_{n\ge 1} (-1)^{n-1}/n^{s}. \] The RHS converges for all \(s\) with \(\mathrm{Re}(s)>0\). Thus, for \(s\in \mathbb{C}\) with \(\mathrm{Re}(s)>0\) and \(s\neq 1\) we can define \[ \zeta(s) = \frac{1}{1-2^{1-s}} \sum_{n\ge 1} \frac{(-1)^{n-1}}{n^{s}}.\] edit: this makes no sense as an analytic continuation of the zeta function what I wrote down does not even converge nad is not well defined

please see this resource for how to actually analytically continue the reimann zeta function

Corollary. Let \(\chi_0\) denote the trivial character. Then \(L(s,\chi_0)\) extends to a meromorphic function in \(\mathrm{Re}(s)>0\) with the only pole at \(s=1\).

Proof. This follows immediately from the similar statement about \(\zeta(s)\).

Lemma. \[ \left| \ln L(s, \chi) - \sum_p \frac{\chi(p)}{p^{s}}\right| \le 1618.\]

Proof. For any \(\mathrm{Re}(s)>1\) we have \[ \log L(s, \chi) = -\sum_p \log(1 - \chi(p)p^{-s}). \] Then, using the same Taylor Series tricks that we did in the simple \(m=4\) case we conclude the lemma.

Now we use the orthogonality relations developed for the characters to extract the sum that we want:

Lemma. For any \(a\perp m\), and all \(\mathrm{Re}(s)>1\), \[ \sum_{p \equiv a \mod m} p^{-s} \ge \frac{1}{\phi(m)} \sum_\chi \overline{\chi(a)}\log L(s,\chi) - 100m.\]

Proof. We apply the orthogonality relations to the previous lemma.

Our goal is to show that the sum on the LHS of the above lemma diverges as \(s\to 1\). We already know that \(L(s, \chi_0)\to \infty\) where \(\chi_0\) is the trivial character. Thus, all that remains is to prove that for any \(\chi\neq \chi_0\), the positive contribution from \(L(s, \chi_0)\) is not cancelled out by \(L(s, \chi)\). We already know that these \(L(s,\chi)\) are convergent for \(s=1\). However, we are taking \(\log L(s,\chi)\), so it would still be problematic if \(L(s,\chi)=0\). So, to finish, we will establish that for all \(\chi\neq \chi_0\) we have \(L(1,\chi)\neq 0\).

ok interesting note: we are considering \(\log\) as a function on \(\mathbb{C}\). gosh complex analysis is weird. In any case, according to chatgpt \(\log(z)\) is defined as long as \(z\neq 0\).

ok, side question, why was taking logarithms even a good move? I mean I guess estimating sums was easier than estimating products.

It seems like a somewhat strange concern. I would have guessed that one can make a simple argument, just grouping the terms of the sum appropriately, to show boundedness away from zero. But that does seem a bit gross. Anyways, to finish the proof we will prove this non-vanishing.

Non-vanishing of non-trivial characters at \(s=1\).

We define the function \[ \zeta_m(s) = \prod_\chi L(s, \chi). \]

The proof strategy is as follows:

Remark. [Note that while Thai Pham’s notes are mostly excellent, I think he has some errors with regards to the proof strategy here.]

We know that for \(\chi\neq \chi_0\), \(L(s, \chi)\) is analytic on all \(\mathrm{Re}(s)>0\). We know that \(L(s,\chi_0)\) is analytic on \(\mathrm{Re}(s>0)\) with a single pole at \(s=1\). Suppose that \(L(1,\chi)=0\) for some \(\chi\neq \chi_0\). Then we \(\zeta_m(s)\) would have no poles on \(\mathrm{Re}(s>0)\) because the zero would cancel the single potential pole. We want to show that this cannot happen. We will assume for contradiction that \(\zeta_m\) actually doesn’t have any poles. But then there is a theorem that will allow us to evaluate \(\zeta_m(s)\) at \(s\) with \(\mathrm{Re}(s)\in (0,1)\) using a certain product formula. But the product formula will blow up, contradicting our assumption that there are no poles.

We will need two lemmas.

Theorem. [Landau] Let \(f(s) = \sum_{n\ge 1} a_n / n^{s}\) with real coefficients \(a_n\ge 0\). Suppose that this series converges for \(\mathrm{Re}(s)>1\). Suppose that \(f\) extends to an analytic function on \(\mathrm{Re}(s)>0\) with no poles. Then the series defining \(f(s)\) converges for \(\mathrm{Re}(s)>0\).

We will prove this theorem in the TLDR version of the proof far below.

Let \(ord(p)\) denote the order of \(p\)’s image in the group \(G\).

Lemma. If \(\mathrm{Re}(s)>1\) then \[ \zeta_m(s) = \prod_{p \perp m} \left(\frac{1}{1-p^{-ord(p)s}}\right)^{\frac{\phi(m)}{ord(p)}}. \]

Proof. Note that \[ 1-x^{ord(p)} = \prod_{\omega \in U_{ord(p)}}(1-\omega x). \] Thus, \[ \prod_{\chi}(1-\chi(p)x) = \left(\prod_{\omega\in U_{ord(p)}}(1-\omega x)\right)^{\frac{\phi(m)}{ord(p)}} = (1-x^{ord(p)})^{\frac{\phi(m)}{ord(p)}}.\]

Remark. Now magically, \[ \frac{1}{1-p^{-ord(p)s}} \] is a Dirichlet series with positive real coefficients, and so thus \(\zeta_m(s)\) is as well. So we can use Landau’s Theorem.

Theorem. For any non-trivial character \(\chi\) we have \(L(1, \chi)\neq 0\).

Proof. Assume for sake of contradiction that \(L(1, \chi) = 0\) for some non-trivial \(\chi\). Then, \(\zeta_m(s)\) extends analytically to \(\mathrm{Re}(s)>0\) with no poles. Then, by Landau’s theorem the series that defined \(\zeta_m(s)\) on \(\mathrm{Re}(s)>1\) converges for \(\mathrm{Re}(s)\in (0,1)\). However, we can show that \[ \zeta_m(s) \ge \sum_{n\perp m} \frac{1}{n} = \infty. \] But this contradicts our assumption that \(\zeta_m(s)\) has no poles.

Apostol’s Intro to ANYNT seems to contain a pretty reasonable proof. Probably possible to synthesize with what I have so far.

TLDR:

Let \(\hat{G}\) be the dual of \(G=(\mathbb{Z}/m\mathbb{Z})^{*}\). For \(\chi\in \hat{G}\), and any complex \(s\) where it makes sense (this will basically be \(\mathrm{Re}(s)>0\), except for the trivial character we need to be a bit more careful), define \(L(s,\chi) = \sum_{n\ge 1} \chi(n)/n^{s}\).

  1. \(L(s, \chi)\) is analytic on \(\mathrm{Re}(s)>0\) for \(\chi\neq \chi_0\). The convergence is clear. I’ll explain the analyticness below.
  2. \(L(s,\chi_0)-\frac{1}{s-1}\) extends analytically to a function on \(\mathrm{Re}(s>0)\) with a single pole, namely at \(s=1\). The convergence is not too hard to show and basically implies the analyticness.
  3. The product \(\zeta_m(s) = \prod_\chi L(s, \chi)\) turns out to be a Dirichlet series with real positive coefficients at least on \(\mathrm{Re}(s)>1\).
  4. Landau’s Theorem states that because \(\zeta_m(s)\) is analytic and is represented by a Dirichlet series with real positive coefficients, then IF \(\zeta_m(s)\) had no poles on \(\mathrm{Re}(s)>0\), then we would get that the product formula defining \(\zeta_m(s)\) actually is also valid for all \(\mathrm{Re}(s)>0\), but this is absurd, because the product formula blows up for small enough \(s\in (0,1)\).
  5. \(\left| \ln L(s,\chi) - \sum_p \frac{\chi(p)}{p^{s}} \right| \le 1618\) for all \(\mathrm{Re}(s)>1\). This is basically just by Taylor Series.
  6. Summing the expression from (5) gives \(\sum_{p\equiv a \mod m} p^{-s} \ge \frac{1}{\phi(m)} \left|\sum_\chi \overline{\chi(a)}\ln L(s, \chi)\right| - 100m.\) This is by the orthogonality relations.
  7. Observe that in (6) we know that as \(s\to 1\), \(L(s, \chi_0)\to \infty\). So, if we can assert that the other \(L(s,\chi)\) guys are not unboundedly pointing in the wrong direction then we are good. We do this with the non-vanishing stuff that I’ve already described above.

Lemma. \(e^{z}\) is complex differentiable.

Proof. \[\lim_{z\to 0} \frac{e^{z}-e^{0}}{z} = 1.\]

Lemma. If \(\sum_{n\ge 1} a_n/n^{s}\) converges for all \(\mathrm{Re}(s)>\sigma_0\) then it is analytic on this domain.

Proof. Convergence means we can basically consider a truncated version of the sum; finite sums play nice with complex differentiability.

So Landau’s theorem is the main technical lemma I have yet to understand.

Let me state Landau’s theorem again and present the proof from Garrett:

Theorem. Suppose that \(f(s) = \sum_{n\ge 1} \frac{a_n}{n^{s}}\) is a Dirichlet Series with all \(a_n\) being real and non-negative, where the series converges for all \(\mathrm{Re}(s)>1\). Suppose that \(f(s)\) extends analytically to all of \(\mathrm{Re}(s)>0\) and that \(f\) has no poles in \(\mathrm{Re}(s)>0\), i.e., there is a holomorphic extension of \(f\) on all of \(\mathrm{Re}(s)>0\). Then the series defining \(f\) on \(\mathrm{Re}(s)>1\) actually converges for all \(\mathrm{Re}(s)>0\).

Proof. We take the derivatives of \(f\) at \(s=1\); this is well defined because we are assuming \(f\) is holomorphic on \(\mathrm{Re}(s)>0\). We get that the \(k\)-th derivative, taken at \(s=1\) is \[ f^{(k)}(1) = (-1)^{i} \sum_n \frac{(\log n)^{k}a_n}{n}. \]

So the Taylor series for \(f\) is \[ f(s) = \sum_{k\ge 0} \frac{f^{(k)}(1)}{k!}(s-1)^{k}, \] which holds for \(\mathrm{Re}(s)>0\).

Now we have \[ f(s) = \sum_{k} \frac{(1-s)^{k}}{k!}\sum_n \frac{(\log n)^{k}}{n}a_n .\] We are guaranteed that this sum converges, and by assumption all terms in the sum are positive. Hence we are free to arbitrarily re-arrange the terms in the sum.

Doing so we find: \[ f(s) = \sum_n \frac{a_n}{n} \sum_k \frac{(\log n)^{k}(1-s)^{k}}{k!} = \sum_n \frac{a_n}{n^{s}}.\] Thus, this final sum on the RHS, which is exactly the sum we cared about, must converge.