The variance of a set of values is usually expressed in terms of squared differences between those values and the mean of those values.

However, the sum of squared differences between the values and the mean can also be expressed in terms of the sum of squared pairwise differences among the values themselves, without reference to the mean.

In particular, we want to show that

$$\sum_{i=1}^{n}(x_i - \bar{x})^2 = \frac{1}{2n}\sum_{i=1}^{n}\sum_{j=1}^{n}(x_i - x_j)^2.$$

To get an expression involving $\bar{x}$, we rewrite the squared difference in the right-hand sum and then expand the result:

$$\sum_{i=1}^{n}\sum_{j=1}^{n}(x_i - x_j)^2 = \sum_{i=1}^{n}\sum_{j=1}^{n}\bigl((x_i - \bar{x}) - (x_j - \bar{x})\bigr)^2 = \sum_{i=1}^{n}\sum_{j=1}^{n}(x_i - \bar{x})^2 \;-\; 2\sum_{i=1}^{n}\sum_{j=1}^{n}(x_i - \bar{x})(x_j - \bar{x}) \;+\; \sum_{i=1}^{n}\sum_{j=1}^{n}(x_j - \bar{x})^2.$$

Since the squared difference in the first term does not depend on $j$, the first term can be rewritten as

$$\sum_{i=1}^{n}\sum_{j=1}^{n}(x_i - \bar{x})^2 = n\sum_{i=1}^{n}(x_i - \bar{x})^2.$$

Since the squared difference in the third term does not depend on $i$, the third term can be rewritten as

$$\sum_{i=1}^{n}\sum_{j=1}^{n}(x_j - \bar{x})^2 = n\sum_{j=1}^{n}(x_j - \bar{x})^2 = n\sum_{i=1}^{n}(x_i - \bar{x})^2,$$

where in the last step we replaced $j$ as an index with $i$. So the third term is identical to the first term.

We now turn to the second term, $-2\sum_{i=1}^{n}\sum_{j=1}^{n}(x_i - \bar{x})(x_j - \bar{x})$. We can bring the difference $(x_i - \bar{x})$ out of the inner sum, since it does not depend on the index $j$. This gives us

$$-2\sum_{i=1}^{n}(x_i - \bar{x})\sum_{j=1}^{n}(x_j - \bar{x}).$$

The inner sum $\sum_{j=1}^{n}(x_j - \bar{x})$ can then be rewritten as

$$\sum_{j=1}^{n}(x_j - \bar{x}) = \sum_{j=1}^{n}x_j - n\bar{x}.$$

But we have $\bar{x} = \frac{1}{n}\sum_{j=1}^{n}x_j$ by definition, so $\sum_{j=1}^{n}x_j = n\bar{x}$, and we then have

$$\sum_{j=1}^{n}(x_j - \bar{x}) = n\bar{x} - n\bar{x} = 0.$$
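This zero-sum property of the deviations from the mean is easy to confirm numerically. A minimal sketch in Python (the data values are made up for illustration):

```python
# Deviations from the mean always sum to zero
# (up to floating-point rounding).
xs = [2.0, 3.5, 7.0, 11.0, 4.5]  # arbitrary example data
mean = sum(xs) / len(xs)
total = sum(x - mean for x in xs)
assert abs(total) < 1e-12
```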

We can then substitute this result into the second term as follows:

$$-2\sum_{i=1}^{n}(x_i - \bar{x})\sum_{j=1}^{n}(x_j - \bar{x}) = -2\sum_{i=1}^{n}(x_i - \bar{x}) \cdot 0 = 0.$$

Now that we know the value of all three terms we have

$$\sum_{i=1}^{n}\sum_{j=1}^{n}(x_i - x_j)^2 = n\sum_{i=1}^{n}(x_i - \bar{x})^2 + 0 + n\sum_{i=1}^{n}(x_i - \bar{x})^2 = 2n\sum_{i=1}^{n}(x_i - \bar{x})^2,$$

so that

$$\sum_{i=1}^{n}(x_i - \bar{x})^2 = \frac{1}{2n}\sum_{i=1}^{n}\sum_{j=1}^{n}(x_i - x_j)^2,$$

which is what we set out to prove.
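The identity is also easy to check numerically for an arbitrary data set. A minimal sketch in Python (the sample values are made up):

```python
# Check: sum of squared deviations from the mean equals
# the full double sum of pairwise squared differences / (2n).
xs = [2.0, 3.5, 7.0, 11.0, 4.5]  # arbitrary example data
n = len(xs)
mean = sum(xs) / n
lhs = sum((x - mean) ** 2 for x in xs)
rhs = sum((xi - xj) ** 2 for xi in xs for xj in xs) / (2 * n)
assert abs(lhs - rhs) < 1e-9
```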

However, we can further simplify this identity. Since $(x_i - x_j)^2 = 0$ when $i = j$, and $(x_i - x_j)^2 = (x_j - x_i)^2$, we can consider only the differences for which $i < j$ (i.e., elements above the diagonal, if we consider the pairwise comparisons to form a matrix):

$$\sum_{i=1}^{n}(x_i - \bar{x})^2 = \frac{1}{2n} \cdot 2\sum_{i<j}(x_i - x_j)^2 = \frac{1}{n}\sum_{i<j}(x_i - x_j)^2.$$

From the definition of the (population) variance $\sigma^2$ we then have

$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2 = \frac{1}{n^2}\sum_{i<j}(x_i - x_j)^2.$$
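As a final numerical check of the mean-free form, we can compare the population variance against the sum over pairs above the diagonal. A minimal sketch in Python (the data values are arbitrary):

```python
from itertools import combinations

xs = [2.0, 3.5, 7.0, 11.0, 4.5]  # arbitrary example data
n = len(xs)
mean = sum(xs) / n
# Population variance via the usual mean-based definition.
var = sum((x - mean) ** 2 for x in xs) / n
# The same quantity from pairwise differences with i < j,
# i.e. without any reference to the mean.
pairwise = sum((a - b) ** 2 for a, b in combinations(xs, 2)) / n**2
assert abs(var - pairwise) < 1e-9
```

Note that `combinations(xs, 2)` enumerates exactly the pairs with $i < j$, so no pair is counted twice and the diagonal is skipped.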