Thursday 13 September 2012

Constructible regular polygons


The proof presented below, giving an (almost) complete characterisation of constructible regular polygons, is such a beautiful gem of a proof that I can't help but record it here so that I might not forget it. I'll go over far more details than is usually done, simply because it amuses me how many different areas of undergraduate mathematics are touched upon.
Theorem: The regular $n$-gon is constructible by ruler and compass if and only if $n$ has the form $2^k p_1 \cdots p_l$, where $p_1,\ldots,p_l$ are distinct primes of the form $2^{2^m} + 1$.
Primes of this form are called Fermat primes. The only known Fermat primes are $3,5,17,257,65537$, and heuristics suggest that there may be no others, though this is an open problem.

We must say what we mean by "constructible". We assume we are given a plane (a sheet of paper, say), with two points already labelled. We parameterise the plane by points of $\mathbf{C}$, and we rotate and scale so that the two labelled points are $0$ and $1$. There are three things we are allowed to do:
  1. Whenever we have two labelled points we can draw the straight line through them.
  2. Whenever we have two labelled points we can draw the circle with one as centre and the other on the circumference.
  3. We can label any point of intersection (of two lines, two circles, or a line and a circle).
Any point of the plane (considered as an element of $\mathbf{C}$) that can be labelled through a sequence of the above operations is called constructible. Denote by $\mathbf{E} \subset \mathbf{C}$ the set of constructible numbers.
Lemma: $\mathbf{E}$ is the smallest subfield of $\mathbf{C}$ closed under taking square roots.
The lemma can be proved as follows.
  1. Show closure under $(x,y)\mapsto x+y$ by constructing the parallelogram with vertices $0,x,y,x+y$.
  2. Show closure under the rotation $x\mapsto ix$.
  3. Show closure under taking real parts. Hence $\mathbf{E} = \mathcal{R}\mathbf{E} + i \mathcal{R}\mathbf{E}$, where $\mathcal{R}\mathbf{E} = \mathbf{E}\cap\mathbf{R}$.
  4. Show closure under $(x,y)\mapsto xy$ for $x,y\in\mathcal{R}\mathbf{E}$ by constructing the right triangle with leg lengths $1,x$ and the similar right triangle with base $y$. Deduce that $\mathbf{E}$ is a ring.
  5. Show that if $0\neq x\in\mathbf{E}$ then $1/x\in\mathbf{E}$ by doing so first for $x\in\mathcal{R}\mathbf{E}$ (another right triangle construction) and then using $1/x = \bar{x}/|x|^2$. Thus $\mathbf{E}$ is a field.
  6. Show that $\mathcal{R}\mathbf{E}$ is closed under taking square roots (yet another right triangle construction). Thus show that $\mathbf{E}$ is closed under taking square roots by bisecting an appropriate angle.
  7. Finally show that any subfield $K$ of $\mathbf{C}$ closed under taking square roots is closed under the "constructing" rules we listed when defining $\mathbf{E}$. The idea here is that one never has to solve an equation worse than a quadratic (when intersecting two circles, first construct the perpendicular bisector to the segment connecting the two centres, and then compute the intersection of this line with one of the circles).
The following is our algebraic characterisation of constructibility.
Lemma: $x\in\mathbf{E}$ if and only if the degree $[K:\mathbf{Q}]$ of the normal closure $K$ of $\mathbf{Q}(x)$ is a power of $2$.
Proof: By the previous lemma, $\mathbf{E}$ can be described as the union of all fields $\mathbf{Q}(\alpha_1,\ldots,\alpha_m)$ such that $\alpha_k^2\in\mathbf{Q}(\alpha_1,\ldots,\alpha_{k-1})$ for all $k=1,\ldots,m$. Let $x\in\mathbf{E}$. Thus $x$ lies in some such field $L = \mathbf{Q}(\alpha_1,\ldots,\alpha_m)$. But note by the tower law that
$$[L:\mathbf{Q}] = \prod_{k=1}^{m} [\mathbf{Q}(\alpha_1,\ldots,\alpha_k):\mathbf{Q}(\alpha_1,\ldots,\alpha_{k-1})],$$where each factor $[\mathbf{Q}(\alpha_1,\ldots,\alpha_k):\mathbf{Q}(\alpha_1,\ldots,\alpha_{k-1})]\leq 2$. Thus $[L:\mathbf{Q}]$ is a power $2$, and by another application of the tower law so is $[K:\mathbf{Q}]$.

Conversely suppose that $[K:\mathbf{Q}]$ is a power of $2$. Then the Galois group $\text{Gal}(K/\mathbf{Q})$ is a $2$-group. Recall that any $p$-group $G$ (except the trivial group) has nontrivial centre: by counting conjugacy class sizes, the size of $Z(G)$ must be divisible by $p$. Thus by induction (and classifying abelian $p$-groups) it follows that in every $p$-group $G$ there is a sequence of normal subgroups $G_i \triangleleft G$ such that $G_0 = \{e\}$, $G_n = G$, and $|G_k/G_{k-1}| = p$ for each $k=1,\ldots,n$. Specialising to $p=2$, Galois theory now implies that $K$ is the top of a tower of quadratic extensions starting from $\mathbf{Q}$, so $x\in K\subset\mathbf{E}$.

By saying "the $n$-gon is constructible" we mean that the vertices of some regular $n$-gon are constructible. Clearly this is equivalent to saying $\zeta_n = \exp(i2\pi/n)$ is an element of $\mathbf{E}$. Note that $\Phi_n(\zeta_n) = 0$, where the $n$th cyclotomic polynomial $\Phi_n$ is defined by $$\Phi_n(X) = \prod_{(k,n)=1} (X-\zeta_n^k).$$Note that $\prod_{d|n} \Phi_d(X) = X^n-1 \in \mathbf{Z}[X]$, so by the uniqueness of polynomial division and induction we have that $\Phi_n(X) \in \mathbf{Z}[X]$.
Lemma: $\Phi_n$ is irreducible over $\mathbf{Q}$.
Proof: Suppose $\Phi_n(X) = f(X) g(X)$ where $f$ is irreducible and $\deg f\geq 1$. We must show that each $\zeta_n^k$ is a root of $f$. Since some $\zeta_n^k$ is a root of $f$, it suffices to show that $f(z)=0$ implies $f(z^p)=0$ whenever $p$ is a prime not dividing $n$. Towards this end, suppose that $p$ is such a prime and $f(z)=0$ but $f(z^p)\neq 0$. Then $g(z^p) = 0$ since $\Phi_n(z^p)=0$. Since $f$ is irreducible this implies that $f(X)$ divides $g(X^p)$. Denoting reductions modulo $p$ by a tilde and using Freshman's dream, $\tilde{f}$ divides $\tilde{g}(X^p) = \tilde{g}(X)^p$. Thus $\tilde{f}$ and $\tilde{g}$ are not coprime, so $\tilde{\Phi_n}$, and thus $X^n-1$, has a multiple root over $\mathbf{F}_p$. But this is a contradiction since $X^n-1$ and its formal derivative $nX^{n-1}$ are coprime.

Thus $\zeta_n$ has minimal polynomial $\Phi_n$, and moreover $\mathbf{Q}(\zeta_n)/\mathbf{Q}$ is a normal extension. We have therefore reduced our task to finding those $n$ for which $[\mathbf{Q}(\zeta_n):\mathbf{Q}] = \deg\Phi_n = \varphi(n)$ is a power of $2$.

Writing $n$ as a product of primes $n=p_1^{e_1}\cdots p_m^{e_m}$ with each $e_i>0$, we have (OK, the details must stop somewhere) $$\varphi(n) = p_1^{e_1 - 1}(p_1 - 1) \cdots p_m^{e_m-1} (p_m-1).$$This is a power of $2$ iff every $p_i\neq 2$ which occurs has $e_i = 1$ and $p_i - 1$ a power of $2$. But if $p = 2^k + 1$ is prime, then $k$ itself must be a power of $2$, since for every odd $q$ we have $$x^q + 1 = (x + 1)(x^{q-1} - x^{q-2} + - \cdots + 1).$$


Tuesday 27 March 2012

Generating direct powers

Before continuing, prove or disprove: If $G$ is a group, the direct power $G^{\times n}$ is never generated by fewer than $n$ elements.

Certainly this is the case if $G$ is abelian, as in this case $G^{\times n}$ has a quotient of the form $\mathbf{F}_p^n$, i.e., an $n$-dimensional vector space over $\mathbf{F}_p$. This is also the case if $n\leq 2$. Examples of  small size are therefore a little hard to find. 

Here is a nice example: Denote the $m$th prime by $p_m$, and let $G = A_{p_m}$. Then I claim that $G^{\times(m-1)}$ can be generated with $3$ elements. Indeed, take $\alpha,\beta\in G^{\times(m-1)}$ to be in each factor equal to some two generators $\tilde{\alpha},\tilde{\beta}$ of $G$, and take $\gamma$ to be an element of the form  (3-cycle, 5-cycle, 7-cycle, ... , $p_m$-cycle). Then the $(p_2\cdots p_j p_{j+2}\cdots p_m)$th power of $\gamma$ is a $p_{j+1}$-cycle in the $j$th factor. With $\alpha$ and $\beta$ we can generate all the conjugates of this cycle, and these together generate the whole $j$th factor by simplicity.

Saturday 24 March 2012

The probability of coprimality

A famous result: Two positive integers chosen at random will be coprime with probability $6/\pi^2$.

One has to say what one means by this, since there is no uniform distribution on the positive integers. What we mean is that if we choose $x$ and $y$ uniformly and independently at random from the integers $\{1,\ldots,N\}$, then $x$ and $y$ will be coprime with probability tending to $6/\pi^2$ as $N\to\infty$.

Here is one classic proof: Consider the points $(x,y)\in\{1,\ldots,N\}^2$ such that $\gcd(x,y)=1$. If we scale this picture by $M$, we will have exactly the points $(X,Y)\in\{1,\ldots,MN\}^2$ such that $\gcd(X,Y)=M$. Therefore, whatever the limiting probability $P$ of being coprime is, the probability of having $\gcd = M$ is exactly $P/M^2$. The law of total probability then implies
\[ 1 = \lim_{N\to\infty}\mathbf{P}(\gcd(x,y)\in\mathbf{N}) = P \sum_{M=1} \frac{1}{M^2} = P\,\zeta(2).\]

Another way of imagining a random element on a noncompact group $G$ is to look at the profinite completion $\hat{G}$, which is always compact so always possesses a uniform distribution. Now by the Chinese Remainder Theorem,
\[ \hat{\mathbf{Z}} = \prod_p \mathbf{Z}_p,\]
so choosing an element at random from $\hat{\mathbf{Z}}$ is the same as choosing an element at random from $\mathbf{Z}_p$ for each $p$. The probability that a random element of $\mathbf{Z}_p$ is divisible by $p$ is precisely $1/p$, so the probably that two elements from $\hat{\mathbf{Z}}$ are coprime is precisely
\[ \prod_p (1-1/p^2) = 1/\zeta(2).\]
This does not imply the asymtotic result in $\mathbf{Z}$ mentioned above, but it is a nice way of thinking about it.


EDIT (28/4/12): As pointed out to me by Freddie Manners, the "classic proof" suggested above suffers from the fatal defect of not proving that the limit exists. Here is an alternative proof which does prove that the limit exists. More generally, let $d>1$ and consider positive integers $x_1,\ldots,x_d\leq X$ chosen uniformly at random. Let $G(X)$ be the number of $(x_1,\ldots,x_d)\in[1,X]^d$ such that $\gcd(x_1,\ldots,x_d)=1$, and observe that the number of $(x_1,\ldots,x_d)\in[1,X]^d$ such that $\gcd(x_1,\ldots,x_d) = n$ is precisely $G(X/n)$. Then
\[ \lfloor X\rfloor^d = \sum_{n\geq1} G(X/n) \]
for all $X$. Inverting this relationship,
\[ G(X) = \sum_{n\geq1} \mu(n) \lfloor X/n\rfloor^d. \]
Thus, the probability that $d$ randomly and uniformly chosen integers $x_1,\ldots,x_d\leq X$ are all coprime is exactly
\[P(X) = \sum_{n\geq1} \mu(n) \left(\frac{\lfloor X/n\rfloor}{\lfloor X\rfloor}\right)^d. \]
The $n$th term here is bounded by $n^{-d}$, so by the dominated convergence theorem $P(X)\to\sum_{n\geq1} \mu(n) n^{-d} = \zeta(d)^{-1}$.

Tuesday 13 March 2012

Volume of the $n$-ball

The following is a simple and beautiful proof, shown to me to my great delight while I was in high school, that the $n$-ball of radius $r$ has $n$-volume
\[\frac{\pi^{n/2} r^n}{\Gamma(\frac{n}{2}+1)}\]
Although I expect most people will know it, I believe that everybody should see it. I don't know the history of it, and would be interested in learning.

Suppose that the $n$-ball $B^n$ of radius $1$ has $n$-volume $V_n(1)$. Then, by considering linear transformations, the $n$-ball $rB^n$ of radius $r$ has $n$-volume $V_n(r) = V_n(1) r^n$. Moreover, differentiating this with respect to $r$ should produce the surface $(n-1)$-volume of the $(n-1)$-sphere $S^{n-1} = \partial B^n$: thus we expect $\text{vol}^{(n-1)}(S^{n-1}) = V_n(1) n r^{n-1}$.

Now consider the integral
\[I = \int_{\mathbf{R}^n} e^{-\pi|x|^2}\,dx.\]
We will compute $I$ in two different ways. On the one hand,
\[I = \int_{-\infty}^{+\infty}\cdots\int_{-\infty}^{+\infty} e^{-\pi x_1^2 -\cdots - \pi x_n^2}\,dx_1\cdots dx_n = \left(\int_{-\infty}^{+\infty} e^{-\pi x^2}\,dx\right)^n = 1\]
On the other hand, the form $I$ suggests introducing a radial coordinate $r=|x|$. Computing this way,
\[I = \int_0^\infty e^{-\pi r^2} (V_n(1) n r^{n-1})\,dr = \frac{n V_n(1)}{\pi^{n/2}} \int_0^\infty e^{-t^2} t^{n-1}\, dt\qquad\quad\]
\[\qquad\quad= \frac{n V_n(1)}{2 \pi^{n/2}} \int_0^\infty e^{-s} s^{n/2-1}\,ds = \frac{n V_n(1)}{2 \pi^{n/2}} \Gamma(n/2) = \frac{V_n(1)\Gamma(n/2+1)}{\pi^{n/2}}.\]

Thursday 19 January 2012

Rationals are repeating $p$-adics

I've been amusing myself with $p$-adic arithmetic lately: I've never really got to know it until now.

We are all familiar with the fact that every $q\in\mathbf{Q}$ gets represented in $\mathbf{R}$ as a repeating decimal, or whatever your favourite base is. (Here I count terminating decimals as decimals which eventually repeat $000...$.) The converse is of course true as well: repeating decimals are rationals.

Is this true in $\mathbf{Q}_p$ as well? That is, are the rationals precisely those $p$-adics which are eventually repeating (in the other direction, of course)? One direction, that repeating $p$-adics are rational, is pretty obvious: if $a\in p^{-t}\mathbf{Z}$ and $b\in\mathbf{Z}$ then
\[
a + b p^r + b p^{r+s} + b p^{r+2s} + \cdots = a + b \frac{p^r}{1 - p^s}
\]
is rational. What about the converse?

The converse seems trickier. How again did we do it in $\mathbf{R}$? I don't even remember: it's one of those things that we know so fundamentally (until very recently, $\mathbf{Q}$ was almost defined in my brain as the reals which eventually repeat) that we forget how to prove it.

Who cares how to prove it? It is true, and it says, in base $p$, that if $q\in\mathbf{Q}$ then there exists $a\in p^{-t}\mathbf{Z}$ and $b\in\mathbf{Z}$ such that
\[
q = a + b p^r + b p^{r-s} + b p^{r-2s} + \cdots = a + b \frac{p^r}{1 - p^{-s}}.
\]
But hey, this implies that
\[
q = a - b \frac{p^{r+s}}{1 - p^s},
\]
which we already know is a repeating $p$-adic.

Inducing a Haar measure from a quotient

Suppose that $G$ is a locally compact group and that $H$ is a closed subgroup with an $H$-left-invariant regular Borel measure $\mu_H$ such that $G/H$ possesses a $G$-left-invariant regular Borel measure $\mu_{G/H}$. For instance, $G = \mathbf{R}$, $H=\mathbf{Z}$, and $G/H= S^1$. The following is how you then induce a Haar measure on $G$. (Technically, it's easier to construct Haar measure on compact groups, so this extends that construction slightly.)

For $f\in C_c(G)$, define $T_H(f) : G\to \mathbf{C}$ by
\[
T_H(f)(x) = \int_H f(xh) d\mu_H.
\]
Let $K=\text{supp}f$. Then $f(xh)$, as a function of $h$ is supported on $x^{-1} K$, so the above integral is finite. Moreover, if $x,y\in U$ and $U$ is compact, then
\[
|T_H(f)(x)  - T_H(f)(y)| = \left| \int_H (f(xh)-f(yh))d\mu_H\right| \leq \mu_H(H\cap U^{-1} K) \sup_h |f(xh)-f(yh)|.
\]
Because a continuous function on a compact set is uniformly continuous (in the sense that there exists a neighbourhood $V$ of $e$ such that $gh^{-1} \in V$ implies $|f(g)-f(h)| <\epsilon$), $T_H(f)$ is continous. Since $T_H(f)$ is $H$-right-invariant, $T_H(f)$ descends to a continuous function $\hat{T}_H(f)$ defined on $G/H$. Moreover, if $q$ is the quotient map $G\to G/H$, then $\hat{T}_H(f)$ is supported on $q(K)$, so $\hat{T}_H:C_c(G)\to C_c(G/H)$. Finally, define $\lambda: C_c(G)\to \mathbf{C}$ by
\[
\lambda(f) = \int_{G/H} \hat{T}_H(f) d\mu_{G/H}.
\]
This linear functional $\lambda$ is positive (in the sense that $f\geq0$ implies $\lambda(f)\geq0$), so the Riesz representation theorem guarantees the existence of a regular Borel measure $\mu_G$ on $G$ such that
\[
\lambda(f) = \int_G f d\mu_G
\]
for all $f\in C_c(G)$. It is now a simple matter to check that $\mu_G$ is $G$-left-invariant.

Tuesday 3 January 2012

Deriving the normal distribution


When I was in high school, I asked several a person, "Why the normal distribution?" After all, the function $e^{-x^2/2}/\sqrt{2\pi}$ looks like a pretty bizarre function to guess, when there are other functions like $1/(1+x^2)$ which produce perfectly fine bell-shaped curves. One answer I received was that the normal distribution is in some sense the limit of the binomial distribution. While this answer seems fair enough, I tried my hand at the mathematics and did not succeed, so I was still confounded. None the less I believed the answer, and it satisfied me for the time.

The real answer that I was looking for but did not appreciate until university was the central limit theorem. For me, the central limit theorem is the explanation of the normal distribution. In any case, the calculation that I attempted was basically a verification of the central limit theorem in a simple case, and it is a testement to the force of the central limit theorem that that simple case is difficult to work out by hand.

In this post, I rectify that calculation that I should have accomplished in high school (with the benefit of hind-sight being the correct factor of $\sqrt{n}$ rather than $n$). On the way, I will also check both laws of large numbers in this simple case.

Consider a "random walk" on $\bf Z$. Specifically, let $X$ be a random variable taking the values $1$ and $-1$ each with probability $1/2$, and let
\[
S_n = X_1+X_2+\ldots+X_n,
\]
where each $X_i$ is an independent copy of $X$. Note that $(X+1)/2$ is Bernoulli, so $S_n/2 + n/2$ is binomial, so
\[
{\bf P}(S_n = k) = {\bf P}(S_n/2 + n/2 = k/2 + n/2) = \left(\begin{array}{c}n\\ k/2+n/2\end{array}\right) 2^{-n},
\]
where we make the convention that the binomial coefficient is zero when it doesn't make sense. Hence, if $x>0$,
\[
{\bf P}(S_n/n \geq x) = \sum_{k\geq x} {\bf P}(S_n/n = k) \leq \frac{n}{2} \left(\begin{array}{c} n\\\lceil (1+x)n/2\rceil \end{array}\right) 2^{-n}.
\]
By Stirling's formula, $m! \sim \sqrt{2\pi m} (m/e)^m$, so
\[
\left(\begin{array}{c}m\\k\end{array}\right) \sim \sqrt{\frac{m}{2\pi k(m-k)}} \frac{m^m}{k^k (m-k)^{m-k}}
\]
as $m,k\rightarrow\infty$. Hence, for constant $x>0$,
\[
{\bf P}(S_n/n \geq x) \leq \frac{n}{2} \left(\begin{array}{c} n\\\lceil (1+x)n/2\rceil \end{array}\right) 2^{-n} \sim \frac{\sqrt{n}}{\sqrt{2\pi(1-x^2)}} \left((1+x)^{1+x} (1-x)^{1-x}\right)^{-n/2}.

\]

Let $f(x) = (1+x)^{1+x} (1-x)^{1-x}$. Note that $f(x)>0$ for $0<x<1$, that $f(0)=1$, and that
\[
\frac{f^\prime(x)}{f(x)} = (\log f(x))^\prime = \log(1+x) + 1 - \log(1-x) - 1 = \log\left(\frac{1+x}{1-x}\right) > 0,
\]
so $f(x)>1$ for all $0<x<1$. Hence
\[
{\bf P}(S_n/n\geq x) \rightarrow 0,
\]
and by symmetry,
\[
{\bf P}(S_n/n\leq -x) \rightarrow 0,
\]
as $n\rightarrow\infty$. Thus we have verified the weak law.

In fact, because the convergence above is geometric,
\[
\sum_{n=1}^\infty {\bf P}(S_n/n\geq x) < \infty,
\]
whence
\[
{\bf P}(S_n/n\geq x\text{ infinitely often}) = \lim_{m\rightarrow\infty} {\bf P}(S_n/n\geq x\text{ for some }n\geq m)  \leq \lim_{m\rightarrow\infty} \sum_{n=m}^\infty {\bf P}(S_n/n\geq x) =0.
\]
(This is the Borel-Cantelli lemma.) Thus $S_n/n\rightarrow 0$ almost surely. Thus we have verified the strong law.

It remains to check the central limit theorem, i.e., to investigate the limiting distribution of $S_n/\sqrt{n}$. Now, for constant $x<y$,
\[
{\bf P}(x\leq S_n/\sqrt{n} \leq y) = \int_x^y\frac{\sqrt{n}}{2} \left(\begin{array}{c} n\\ (1+t/\sqrt{n})n/2\end{array}\right) 2^{-n} \,d\mu_n(t),
\]
where $\mu_n$ is the atomic measure assigning a mass $2/\sqrt{n}$ to each point of $(n+2{\bf Z})/\sqrt{n}$. Now Stirling's formula strikes again, giving
\[
\frac{\sqrt{n}}{2}\left(\begin{array}{c} n\\ (1+t/\sqrt{n})n/2\end{array}\right) 2^{-n} \sim \frac{1}{\sqrt{2\pi(1-t^2/n)}} \left((1+t/\sqrt{n})^{1+ t/\sqrt{n}} (1- t/\sqrt{n})^{1- t/\sqrt{n}}\right)^{-n/2} \longrightarrow \frac{1}{\sqrt{2\pi}} e^{-t^2/2},
\]
as
\[
\lim_{m\rightarrow\infty} \left((1+t/m)^{1+t/m} (1-t/m)^{1-t/m}\right)^{m^2/t^2} = \lim_{z\rightarrow\pm\infty} (1-1/z^2)^{z^2} (1+1/z)^z (1-1/z)^{-z} = e.
\]
Hence
\[
{\bf P}(x\leq S_n/\sqrt{n} \leq y) = \frac{1}{\sqrt{2\pi}} \int_x^y e^{-t^2/2}\,d\mu_n(t) + \int_x^y R_n(t)\,d\mu_n(t),
\]
where $R_n(t)\rightarrow 0$ as $n\rightarrow\infty$. Moreover this convergence is uniform in $t\in[x,y]$, so
\[
\left|\int_x^y R_n(t)\,d\mu_n(t)\right| \leq \mu_n([x,y]) \max_{x\leq t\leq y} |R_n(t)| \leq (y-x + 2/\sqrt{n})\max_{x\leq t\leq y} |R_n(t)| \rightarrow 0.
\]
Hence
\[
{\bf P}(x\leq S_n/\sqrt{n}\leq y) \rightarrow \frac{1}{\sqrt{2\pi}} \lim_{n\rightarrow\infty}\int_x^y e^{-t^2/2}\,d\mu_n(t) = \frac{1}{\sqrt{2\pi}}\int_x^y e^{-t^2/2}\,dt,
\]
where the last equality follows from the theorem that continuous functions are Riemann integrable. Thus we have verified the central limit theorem.