Variance and the sum of squared pairwise differences

The (population) variance \sigma^2 of a set of n values x_1, x_2, ..., x_n is usually expressed in terms of squared differences between those values and their mean \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i:

\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2

However, the sum of squared differences (x_i - \bar{x})^2 between the values and the mean can also be expressed in terms of the sum of squared pairwise differences (x_i - x_j)^2 among the values themselves, without reference to the mean \bar{x}.

In particular, we want to show that

\sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{1}{2n} \sum_{i=1}^{n} \sum_{j=1}^{n} (x_i - x_j)^2.

To get an expression involving \bar{x}, we add and subtract \bar{x} inside the squared difference in the right-hand sum and then expand the result:

\sum_{i=1}^{n} \sum_{j=1}^{n} (x_i - x_j)^2 = \sum_{i=1}^{n} \sum_{j=1}^{n} [(x_i - \bar{x}) - (x_j - \bar{x})]^2

= \sum_{i=1}^{n} \sum_{j=1}^{n} [(x_i - \bar{x})^2 - 2 (x_i - \bar{x}) (x_j - \bar{x}) + (x_j - \bar{x})^2]

= \sum_{i=1}^{n} \sum_{j=1}^{n} (x_i - \bar{x})^2 - 2 \sum_{i=1}^{n} \sum_{j=1}^{n} (x_i - \bar{x}) (x_j - \bar{x}) + \sum_{i=1}^{n} \sum_{j=1}^{n} (x_j - \bar{x})^2

Since the squared difference in the first term does not depend on j, the first term can be rewritten as

\sum_{i=1}^{n} \sum_{j=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} n (x_i - \bar{x})^2 = n \sum_{i=1}^{n} (x_i - \bar{x})^2

Since the squared difference in the third term does not depend on i, the third term can be rewritten as

\sum_{i=1}^{n} \sum_{j=1}^{n} (x_j - \bar{x})^2 = n \sum_{j=1}^{n} (x_j - \bar{x})^2 = n \sum_{i=1}^{n} (x_i - \bar{x})^2

where in the last step we replaced j as an index with i. So the third term is identical to the first term.

We now turn to the second term, -2 \sum_{i=1}^{n} \sum_{j=1}^{n} (x_i - \bar{x}) (x_j - \bar{x}). We can bring the difference (x_i - \bar{x}) out of the inner sum, since it does not depend on the index j. This gives us

-2 \sum_{i=1}^{n} \sum_{j=1}^{n} (x_i - \bar{x}) (x_j - \bar{x}) = -2 \sum_{i=1}^{n} (x_i - \bar{x}) [\sum_{j=1}^{n} (x_j - \bar{x})]

The sum \sum_{j=1}^{n} (x_j - \bar{x}) can then be rewritten as

\sum_{j=1}^{n} (x_j - \bar{x}) = \sum_{j=1}^{n} x_j - \sum_{j=1}^{n} \bar{x}

= \sum_{j=1}^{n} x_j - n \bar{x}

But we have \bar{x} = \frac{1}{n} \sum_{j=1}^{n} x_j by definition, so we then have

\sum_{j=1}^{n} (x_j - \bar{x}) = \sum_{j=1}^{n} x_j - n \bar{x} = n \bar{x} - n \bar{x} = 0

We can then substitute this result into the second term as follows:

-2 \sum_{i=1}^{n} \sum_{j=1}^{n} (x_i - \bar{x}) (x_j - \bar{x}) = -2 \sum_{i=1}^{n} (x_i - \bar{x}) [\sum_{j=1}^{n} (x_j - \bar{x})]

= -2 \sum_{i=1}^{n} (x_i - \bar{x}) \cdot 0 = -2 \sum_{i=1}^{n} 0 = 0
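As a quick sanity check on the step above, here is a minimal Python sketch that confirms, for one small sample, that the deviations from the mean sum to zero and that the cross term therefore vanishes. The sample data and the variable names are illustrative assumptions, not part of the derivation, and the sketch assumes integer inputs so that Fraction keeps the arithmetic exact.

```python
from fractions import Fraction

# Illustrative sample data (an assumption; any list of integers would do).
xs = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(xs)

# Exact mean, so that no floating-point rounding enters the check.
mean = Fraction(sum(xs), n)

# Deviations (x_j - mean) and their sum, which the derivation says is 0.
deviations = [x - mean for x in xs]
deviation_sum = sum(deviations)

# Cross term: -2 * sum_i sum_j (x_i - mean) * (x_j - mean).
cross_term = -2 * sum(di * dj for di in deviations for dj in deviations)

print(deviation_sum, cross_term)
```

A single sample does not prove anything, of course; it only illustrates what the algebra already guarantees for every data set.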

Now that we know the values of all three terms, we have

\sum_{i=1}^{n} \sum_{j=1}^{n} (x_i - x_j)^2

= \sum_{i=1}^{n} \sum_{j=1}^{n} (x_i - \bar{x})^2 - 2 \sum_{i=1}^{n} \sum_{j=1}^{n} (x_i - \bar{x}) (x_j - \bar{x}) + \sum_{i=1}^{n} \sum_{j=1}^{n} (x_j - \bar{x})^2

= n \sum_{i=1}^{n} (x_i - \bar{x})^2 + 0 + n \sum_{i=1}^{n} (x_i - \bar{x})^2

= 2n \sum_{i=1}^{n} (x_i - \bar{x})^2

so that

\sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{1}{2n} \sum_{i=1}^{n} \sum_{j=1}^{n} (x_i - x_j)^2

which is what we set out to prove.
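The identity can also be checked numerically. The following is a minimal sketch, assuming integer inputs so that exact rational arithmetic can be used; the helper names sum_sq_dev and sum_sq_pairwise_all and the sample data are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def sum_sq_dev(xs):
    """Sum of squared deviations from the mean (left-hand side)."""
    mean = Fraction(sum(xs), len(xs))
    return sum((x - mean) ** 2 for x in xs)


def sum_sq_pairwise_all(xs):
    """Double sum of (x_i - x_j)^2 over all ordered pairs (i, j)."""
    return sum((xi - xj) ** 2 for xi in xs for xj in xs)


# Illustrative sample data (an assumption, not from the derivation).
xs = [2, 4, 4, 4, 5, 5, 7, 9]

lhs = sum_sq_dev(xs)
rhs = Fraction(sum_sq_pairwise_all(xs), 2 * len(xs))

print(lhs == rhs)
```

Note that the right-hand side never computes the mean; it only uses differences between the values themselves.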

However, we can further simplify this identity. Since (x_i - x_j) = 0 when i = j and (x_i - x_j)^2 = (x_j - x_i)^2, each unordered pair contributes the same amount twice, so we need only consider the differences for which i < j (i.e., elements above the diagonal, if we consider the pairwise comparisons to form a matrix):

\sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{1}{2n} \sum_{i=1}^{n} \sum_{j=1}^{n} (x_i - x_j)^2

= \frac{1}{2n} [\sum_{i < j} (x_i - x_j)^2 + \sum_{i = j} (x_i - x_j)^2 + \sum_{i > j} (x_i - x_j)^2]

= \frac{1}{2n} [\sum_{i < j} (x_i - x_j)^2 + 0 + \sum_{i < j} (x_i - x_j)^2]

= \frac{1}{2n} [2 \sum_{i < j} (x_i - x_j)^2] = \frac{1}{n} \sum_{i < j} (x_i - x_j)^2
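The restriction to i < j corresponds to summing over unordered pairs, which in Python can be sketched with itertools.combinations. The helper name sum_sq_pairs_upper and the sample data below are assumptions made for illustration, and integer inputs are assumed so the arithmetic stays exact.

```python
from fractions import Fraction
from itertools import combinations


def sum_sq_pairs_upper(xs):
    """Sum of (x_i - x_j)^2 over pairs of positions with i < j."""
    return sum((xi - xj) ** 2 for xi, xj in combinations(xs, 2))


# Illustrative sample data (an assumption, not from the derivation).
xs = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(xs)

upper = sum_sq_pairs_upper(xs)

# Full double sum over all ordered pairs, for comparison.
full = sum((xi - xj) ** 2 for xi in xs for xj in xs)

# Sum of squared deviations from the mean, computed directly.
mean = Fraction(sum(xs), n)
direct = sum((x - mean) ** 2 for x in xs)

print(full == 2 * upper, Fraction(upper, n) == direct)
```

Summing over i < j visits n(n - 1)/2 pairs instead of n^2, roughly halving the work compared with the full double sum.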

From the definition of \sigma^2 we then have

\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2

= \frac{1}{n} [\frac{1}{n} \sum_{i < j} (x_i - x_j)^2]

= \frac{1}{n^2} \sum_{i < j} (x_i - x_j)^2
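As a final check, this sketch computes the variance from pairwise differences alone and compares it with the population variance from Python's standard statistics module. The function name variance_from_pairs and the sample data are hypothetical, and integer inputs are assumed so that Fraction can represent the result exactly.

```python
from fractions import Fraction
from itertools import combinations
from statistics import pvariance


def variance_from_pairs(xs):
    """Population variance as (1 / n^2) * sum over i < j of (x_i - x_j)^2."""
    n = len(xs)
    total = sum((xi - xj) ** 2 for xi, xj in combinations(xs, 2))
    return Fraction(total, n * n)


# Illustrative sample data (an assumption, not from the derivation).
xs = [2, 4, 4, 4, 5, 5, 7, 9]

from_pairs = variance_from_pairs(xs)
from_mean = pvariance(xs)  # population variance, i.e. the 1/n definition

print(from_pairs == from_mean)
```

The comparison uses pvariance rather than variance because the derivation divides by n, not by n - 1.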
