
Unpacking the n-1 in the t test

If you ask an AI (below, Grok 3 in think mode) to explain the degrees of freedom in a t test, or open an introductory statistics textbook, you'll get an explanation like this:

https://grok.com/share/bGVnYWN5_e1fcd789-991d-4593-beba-99a78a8f71c1

This explainer is fine for someone encountering the topic for the first time; it is how I used to teach the topic in a first-year statistics class. But there is a lot of hand-waving, which may not satisfy someone who likes precise statements.

My goal in this post is to spell out all the details.

Fun fact

I'll start with a fun fact that may seem off topic. I promise it is not.

Let \(X,Y\) be independent standard Gaussian random variables. Consider the mean \((X+Y)/2\) and the deviation from the mean \(X - (X+Y)/2 = (X-Y)/2\). The interesting fact is that they are independent Gaussians (each with variance \(1/2\)).

In other words, knowing the mean provides no information about the deviation of a particular sample from it. The fact can be proved by showing that the covariance is zero, which implies independence for jointly Gaussian variables (compute the Laplace transform to prove the latter).
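The covariance computation takes one line, using bilinearity and the independence of \(X\) and \(Y\):

\[ \mathrm{Cov}\Big(\frac{X+Y}{2}, \frac{X-Y}{2}\Big) = \frac{1}{4}\big(\mathrm{Var}(X) - \mathrm{Var}(Y)\big) = 0 \]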

Ok, now let's push the example a bit by considering \(n\) iid Gaussians \(X_1,\dots,X_n\) with mean \(\mu\) and variance \(\sigma^2\). Analogously, consider

\[ G = (X_i - \bar X)_{i= 1,...,n} \]

where \(\bar X = \frac{1}{n} \sum_{i=1}^n X_i\). Note that \(G\) is a Gaussian vector. It is a degenerate one, because it lives in the hyperplane \(\{x\in\mathbb{R}^n: \sum_{i=1}^n x_i = 0 \}\). Again, thanks to Gaussianity, we can show that each coordinate of \(G\) is independent of \(\bar X\) by checking that the covariance vanishes. Therefore \(G\) and \(\bar X\) are independent.
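Concretely, for each coordinate,

\[ \mathrm{Cov}(X_i - \bar X, \bar X) = \mathrm{Cov}(X_i, \bar X) - \mathrm{Var}(\bar X) = \frac{\sigma^2}{n} - \frac{\sigma^2}{n} = 0 \]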

Recall that the sample variance is the scaled squared 2-norm of the vector \(G\)

\[ \hat\sigma^2 = \frac{1}{n-1}\|G\|^2 \]

where the normalization \(n-1\) ensures that \(\hat\sigma^2\) is an unbiased estimator of \(\sigma^2\).
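For completeness, unbiasedness follows from a short computation (using \(\sum_i (X_i - \bar X)^2 = \sum_i X_i^2 - n \bar X^2\)):

\[ \mathbb{E}\|G\|^2 = n(\sigma^2 + \mu^2) - n\Big(\frac{\sigma^2}{n} + \mu^2\Big) = (n-1)\sigma^2 \]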

As a direct consequence of the independence of \(G\) and \(\bar X\), we get independence of \(\hat\sigma^2\) and \(\bar X\).
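As a quick numerical sanity check (zero sample correlation is of course much weaker than independence), here is a minimal numpy sketch; the sample size, parameters, and seed are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 10, 100_000
X = rng.normal(loc=1.0, scale=2.0, size=(reps, n))

xbar = X.mean(axis=1)        # sample means, one per replication
s2 = X.var(axis=1, ddof=1)   # unbiased sample variances

print(np.corrcoef(xbar, s2)[0, 1])  # ~ 0: consistent with independence

# For contrast, a skewed distribution where mean and variance are dependent:
Y = rng.exponential(scale=1.0, size=(reps, n))
print(np.corrcoef(Y.mean(axis=1), Y.var(axis=1, ddof=1))[0, 1])  # clearly positive
```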

Relating to the t statistic

The t statistic for iid Gaussians with hypothesised population mean \(\mu\) and unknown variance \(\sigma^2\) is given by

\[ t = \frac{\bar X - \mu}{\hat\sigma/\sqrt{n}} \]

From our discussion in the previous section, we know that the numerator and the denominator are independent random variables. Since \(\bar X\) is Gaussian, the t statistic is a scale mixture of Gaussians with random variance \(\sigma^2/\hat\sigma^2\).
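To see this, factor the statistic as

\[ t = \frac{\bar X - \mu}{\sigma/\sqrt{n}} \cdot \frac{\sigma}{\hat\sigma} \]

where the first factor is a standard Gaussian and, conditionally on \(\hat\sigma\), the whole expression is \(N(0, \sigma^2/\hat\sigma^2)\).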

We know the Gaussian very well, but what is the distribution of the sample variance?

Setting up the target

The goal is to show that, in distribution,

\[ \hat\sigma^2 \stackrel{d}{=} \sigma^2\, \frac{\chi^2_{n-1}}{n-1} \]

where \(\chi^2_d\) is the chi-square distribution with \(d\) degrees of freedom.

Ah, degrees of freedom! Hold on, we are close.

Recall that \(\chi^2_d\) can be defined as a sum of squared independent standard Gaussians, where the degrees of freedom \(d\) is the number of terms in the sum.
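In symbols,

\[ \chi^2_d \stackrel{d}{=} \sum_{i=1}^d Z_i^2, \qquad Z_1,\dots,Z_d \overset{\text{iid}}{\sim} N(0,1) \]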

Looking at \(\hat\sigma^2\), we do have a sum of squared Gaussians, but they are not independent (\(G\) is degenerate) and there are \(n\) terms in the sum. We need to find a way to reformulate it as a sum of \(n-1\) independent squared Gaussians.

Some linear algebra

Notice that the distribution of \(G\) does not depend on the value of \(\mu\), so we can assume \(\mu=0\). Our goal is to show that, in distribution,

\[ \frac{\|G\|^2}{\sigma^2} \stackrel{d}{=} \chi^2_{n-1} \]

We can write \(G\) in matrix form

\[ G = A X \]

where \(A = I - \frac{1}{n}E\), with \(E\) the \(n \times n\) matrix of all ones and \(I\) the identity matrix.

The matrix \(A\) has a very interesting property:

\[ A^2 = A \]

which can be checked easily. This property (idempotence) implies that all the eigenvalues of \(A\) are either 0 or 1. Indeed, if \(Av = \lambda v\) for some non-zero vector \(v\), then \(\lambda v = Av = A^2v = A(Av) = \lambda^2 v\).

By the spectral theorem, there exist an orthogonal matrix \(O\) and a diagonal matrix \(\Lambda\) (with only 0s and 1s on the diagonal, by idempotence) such that \(A= O^T\Lambda O\), leading to

\[ \|G\|^2 = \langle AX, AX \rangle = \langle AX,X\rangle = \langle \Lambda OX, OX\rangle = \langle \Lambda X, X\rangle \]

where the last equality holds in distribution, using the fact that the centered isotropic Gaussian distribution is invariant under orthogonal transformations.

Therefore we have shown that \(\|G\|^2\) can indeed be represented as a sum of independent squared mean-zero Gaussians (each with variance \(\sigma^2\)). The last thing to check is the number of terms in the sum, which is the number of ones in the spectrum of \(A\), i.e. the rank of \(A\). For an idempotent matrix the rank equals the trace, and \(\mathrm{tr}(A) = n - \frac{1}{n}\cdot n = n-1\). QED.
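A small numpy sketch makes the spectral claims concrete (the choice \(n=7\) is arbitrary):

```python
import numpy as np

n = 7
A = np.eye(n) - np.ones((n, n)) / n  # A = I - E/n

assert np.allclose(A @ A, A)         # idempotence

eigvals = np.linalg.eigvalsh(A)      # A is symmetric, so eigvalsh applies
print(np.round(eigvals, 12))         # a single 0 followed by n-1 ones
print(np.trace(A))                   # n - 1, which equals the rank
```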

Putting it all together

To conclude, we have

\[ t = \frac{N(0,1)}{\sqrt{\chi^2_{n-1} / (n-1)}} \]

with independent numerator and denominator. We can now subscript \(t\) with the parameter \(n-1\) to indicate the degrees of freedom on the right-hand side of the equation.
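To close the loop numerically, here is a minimal simulation comparing the t statistic under the null with the Student t distribution with \(n-1\) degrees of freedom; the parameters are arbitrary and the KS test is only a rough check:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, reps, mu, sigma = 5, 50_000, 3.0, 1.5
X = rng.normal(mu, sigma, size=(reps, n))

# one-sample t statistic for each replication, under the true mean mu
t = (X.mean(axis=1) - mu) / (X.std(axis=1, ddof=1) / np.sqrt(n))

print(stats.kstest(t, stats.t(df=n - 1).cdf))  # large p-value expected
```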

A slight generalisation

Consider the two-sample t test. Let \(\{X_1,\dots,X_n\}\) and \(\{Y_1,\dots,Y_m\}\) be two independent samples of iid Gaussians with the same variance \(\sigma^2\).

The null hypothesis is that the population means of the two samples are equal. The t statistic is

\[ t = \frac{\bar X - \bar Y}{ \hat\sigma_p \sqrt{1/n + 1/m} } \]

where \(\hat\sigma_p^2 = \frac{(n-1)\hat\sigma^2_X + (m-1)\hat\sigma^2_Y}{n+m-2}\) is the pooled sample variance. Up to the factor \(\frac{1}{n+m-2}\), the pooled sample variance is the squared norm of the Gaussian vector

\[(X_1 - \bar X, \dots, X_n-\bar X, Y_1 - \bar Y, \dots, Y_m - \bar Y)\]

which lives in an \((n+m-2)\)-dimensional subspace (two linear constraints: each block sums to zero). One checks readily that this vector is an idempotent transformation of the \((n+m)\)-variate isotropic Gaussian \((X_1,\dots,X_n, Y_1,\dots,Y_m)\) (as before, its distribution does not depend on the means, so we may take them to be zero). Arguing as before, the squared norm can be written, via spectral decomposition, as a sum of independent squared Gaussians (i.e. a chi-square). The number of terms in the sum coincides with the rank of the idempotent transformation, which is \(n+m-2\). We thus have \(t = \frac{N(0,1)}{ \sqrt{\chi^2_{n+m-2} / (n+m-2)}}\).
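Alternatively, one can bypass the spectral argument here and invoke additivity of independent chi-squares: the two samples are independent, so

\[ \frac{(n+m-2)\,\hat\sigma_p^2}{\sigma^2} = \frac{(n-1)\,\hat\sigma^2_X}{\sigma^2} + \frac{(m-1)\,\hat\sigma^2_Y}{\sigma^2} \stackrel{d}{=} \chi^2_{n-1} + \chi^2_{m-1} \stackrel{d}{=} \chi^2_{n+m-2} \]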