## Variance and the sum of squared pairwise differences

The variance $\sigma^2$ of a set of $n$ values $x_1, x_2, ..., x_n$ is usually expressed in terms of squared differences between those values and the mean $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$ of those values. $\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$

However the sum of squared differences $(x_i - \bar{x})^2$ between the values and the mean can also be expressed in term of the sum of squared pairwise differences $(x_i - x_j)^2$ among the values themselves, without reference to the mean $\bar{x}$.

In particular, we want to show that $\sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{1}{2n} \sum_{i=1}^{n} \sum_{j=1}^{n} (x_i - x_j)^2$.

To get an expression involving $\bar{x}$ we rewrite the squared difference in the righthand sum and then expand the result: $\sum_{i=1}^{n} \sum_{j=1}^{n} (x_i - x_j)^2 = \sum_{i=1}^{n} \sum_{j=1}^{n} [(x_i - \bar{x}) - (x_j - \bar{x})]^2$ $= \sum_{i=1}^{n} \sum_{j=1}^{n} [(x_i - \bar{x})^2 - 2 (x_i - \bar{x}) (x_j - \bar{x}) + (x_j - \bar{x})^2]$ $= \sum_{i=1}^{n} \sum_{j=1}^{n} (x_i - \bar{x})^2 - 2 \sum_{i=1}^{n} \sum_{j=1}^{n} (x_i - \bar{x}) (x_j - \bar{x}) + \sum_{i=1}^{n} \sum_{j=1}^{n} (x_j - \bar{x})^2$

Since the squared difference in the first term does not depend on $j$, the first term can be rewritten as $\sum_{i=1}^{n} \sum_{j=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} n (x_i - \bar{x})^2 = n \sum_{i=1}^{n} (x_i - \bar{x})^2$

Since the squared difference in the third term does not depend on $i$, the third term can be rewritten as $= \sum_{i=1}^{n} \sum_{j=1}^{n} (x_j - \bar{x})^2 = n \sum_{j=1}^{n} (x_j - \bar{x})^2 = n \sum_{i=1}^{n} (x_i - \bar{x})^2$

where in the last step we replaced $j$ as an index with $i$. So the third term is identical to the first term.

We now turn to the second term, $-2 \sum_{i=1}^{n} \sum_{j=1}^{n} (x_i - \bar{x}) (x_j - \bar{x})$. We can bring the difference $(x_i - \bar{x})$ out of the inner sum, since it does not depend on the index $j$. This gives us $-2 \sum_{i=1}^{n} \sum_{j=1}^{n} (x_i - \bar{x}) (x_j - \bar{x}) = -2 \sum_{i=1}^{n} (x_i - \bar{x}) [\sum_{j=1}^{n} (x_j - \bar{x})]$

The sum $\sum_{j=1}^{n} (x_j - \bar{x})$ can then be rewritten as $\sum_{j=1}^{n} (x_j - \bar{x}) = \sum_{j=1}^{n} x_j - \sum_{j=1}^{n} \bar{x}$ $= \sum_{j=1}^{n} x_j - n \bar{x}$

But we have $\bar{x} = \frac{1}{n} \sum_{j=1}^{n} x_j$ by definition, so we then have $\sum_{j=1}^{n} (x_j - \bar{x}) = \sum_{j=1}^{n} x_j - n \bar{x} = n \bar{x} - n \bar{x} = 0$

We can then substitute this result into the second term as follows: $-2 \sum_{i=1}^{n} \sum_{j=1}^{n} (x_i - \bar{x}) (x_j - \bar{x}) = -2 \sum_{i=1}^{n} (x_i - \bar{x}) [\sum_{j=1}^{n} (x_j - \bar{x})]$ $= -2 \sum_{i=1}^{n} (x_i - \bar{x}) \cdot 0 = -2 \sum_{i=1}^{n} 0 = 0$

Now that we know the value of all three terms we have $\sum_{i=1}^{n} \sum_{j=1}^{n} (x_i - x_j)^2$ $= \sum_{i=1}^{n} \sum_{j=1}^{n} (x_i - \bar{x})^2 - 2 \sum_{i=1}^{n} \sum_{j=1}^{n} (x_i - \bar{x}) (x_j - \bar{x}) + \sum_{i=1}^{n} \sum_{j=1}^{n} (x_j - \bar{x})^2$ $= n \sum_{i=1}^{n} (x_i - \bar{x})^2 + 0 + n \sum_{i=1}^{n} (x_i - \bar{x})^2$ $= 2n \sum_{i=1}^{n} (x_i - \bar{x})^2$

so that $\sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{1}{2n} \sum_{i=1}^{n} \sum_{j=1}^{n} (x_i - x_j)^2$

which is what we set out to prove.

However, we can further simplify this identity. Since $(x_i - x_j) = 0$ when $i = j$ and $(x_i - x_j)^2 = (x_j - x_i)^2$, we can consider only differences when $i < j$ (i.e., elements above the diagonal, if we consider the pairwise comparisons to form a matrix): $\sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{1}{2n} \sum_{i=1}^{n} \sum_{j=1}^{n} (x_i - x_j)^2$ $= \frac{1}{2n} [\sum_{i < j} (x_i - x_j)^2 + \sum_{i = j} (x_i - x_j)^2 + \sum_{i > j} (x_i - x_j)^2]$ $\frac{1}{2n} [\sum_{i < j} (x_i - x_j)^2 + 0 + \sum_{i < j} (x_i - x_j)^2]$ $\frac{1}{2n} [2 \sum_{i < j} (x_i - x_j)^2] = \frac{1}{n} \sum_{i < j} (x_i - x_j)^2$

From the definition of $\sigma^2$ we then have $\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$ $= \frac{1}{n} [\frac{1}{n} \sum_{i < j} (x_i - x_j)^2]$ $= \frac{1}{n^2} \sum_{i < j} (x_i - x_j)^2$

This entry was posted in Uncategorized. Bookmark the permalink.