Where the average (or mean) is a measure of the center of a group of numbers, the variance is the measure of the spread. The following two sets of numbers have the same mean, 10.
S1 = {10, 10, 10, 10, 10}
S2 = {0, 5, 10, 15, 20}
The first set, though, has a variance of zero, because all of its numbers are the same. The second set has a variance of 50. The variance of a set S is computed as
Var(S) = Sum i (Si - E(S))^2 / N,
where Sum i means to sum over all elements of set S,
N is the number of elements in S,
Si is the ith element of the set S,
and E(S) is the mean over the values of set S.
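As a quick check of the numbers quoted above, the following Python sketch computes the mean and the variance of both sets, taking the variance to be the average squared deviation from the mean (dividing by N). The function names mean and variance are illustrative choices, not part of the original text.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by their count."""
    return sum(values) / len(values)


def variance(values):
    """Population variance: average squared deviation from the mean (divides by N)."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values) / len(values)


S1 = [10, 10, 10, 10, 10]
S2 = [0, 5, 10, 15, 20]

print(mean(S1), variance(S1))  # 10.0 0.0
print(mean(S2), variance(S2))  # 10.0 50.0
```

For S2 the squared deviations are 100, 25, 0, 25 and 100, which sum to 250; dividing by N = 5 gives 50.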
When we are dealing with a sample (that is, a subset of the complete population), we cannot compute the population mean and variance exactly; we can only estimate them. Given a sample U with M elements Ui, i = 1, 2, ..., M, we obtain an unbiased estimate of the mean as follows:
mu = Sum i Ui / M,
while an unbiased estimate s^2 of the variance is obtained from the formula
s^2 = Sum i (Ui - mu)^2 / (M - 1).
The standard deviation is the square root of the variance.
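The sample estimates can be sketched in Python as follows. The sample U below is hypothetical (it simply reuses the values of S2 as if they had been drawn from a larger population), and the function names are illustrative. Note the division by M - 1 rather than M, which requires at least two elements in the sample.

```python
import math


def sample_mean(u):
    """Estimate of the mean: sum of the sample values divided by M."""
    return sum(u) / len(u)


def sample_variance(u):
    """Unbiased estimate of the variance: divides by M - 1 (needs M >= 2)."""
    mu = sample_mean(u)
    return sum((x - mu) ** 2 for x in u) / (len(u) - 1)


U = [0, 5, 10, 15, 20]  # hypothetical sample, reusing the values of S2

mu = sample_mean(U)
s2 = sample_variance(U)
s = math.sqrt(s2)  # standard deviation: square root of the variance

print(mu, s2)       # 10.0 62.5
print(round(s, 3))  # 7.906
```

The same five values give 50 when divided by N = 5 and 62.5 when divided by M - 1 = 4, so the difference between the two formulas matters most for small samples.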
[Figure: a distribution with low variance; the values are clustered around the mean.]
[Figure: a distribution with high variance; the values are more spread out.]
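The covariance is a measure of how two sets of numbers vary together: it is positive when their values tend to lie on the same side of their respective means, and negative when they tend to lie on opposite sides.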
Given two sets, Si and Sj, the covariance is
Cov(i,j) = Sum k (Sik - E(Si)) (Sjk - E(Sj)) / N,
where Sik is the kth element of set Si and N is the number of elements in each set.
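A minimal Python sketch of this formula is shown below. The function names and the two example sets A and B are assumptions made for illustration; B is simply A in reverse order, so the two sets move in opposite directions.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by their count."""
    return sum(values) / len(values)


def covariance(a, b):
    """Covariance of two equal-length sets: mean product of paired deviations (divides by N)."""
    if len(a) != len(b):
        raise ValueError("both sets must have the same number of elements")
    mean_a, mean_b = mean(a), mean(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)


A = [0, 5, 10, 15, 20]
B = [20, 15, 10, 5, 0]  # hypothetical second set: A reversed

print(covariance(A, A))  # 50.0
print(covariance(A, B))  # -50.0
```

The covariance of a set with itself is its variance (50 here), while the reversed set gives a negative covariance of the same size because each value above the mean in A is paired with a value below the mean in B.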