Appendix B. Unbiased estimates of variance from data sets with unequal underlying group means and unequal group sizes.

**Statement of the problem**

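Suppose that we have *g* groups of observations, and let *n*_{j} be the number of observations in group *j*. We assume that the observations in group *j* are drawn from a normal distribution:

$$x_{ji} \sim N(\mu_j, \sigma_W^2), \qquad i = 1, \ldots, n_j,$$

where *μ*_{j} is the mean and $\sigma_W^2$ is the variance.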

So the groups’ underlying distributions have the same variance, but possibly different means.

Furthermore, we assume that the underlying group means {*μ*_{1}, *μ*_{2}, …, *μ*_{g}} are drawn from a normal distribution:

$$\mu_j \sim N(\mu, \sigma_B^2),$$

where *μ* is the mean and $\sigma_B^2$ is the variance.

Given a number of observations spread across a number of groups, we are interested in providing unbiased estimates of $\sigma_B^2$ and $\sigma_W^2$, i.e., the underlying between-groups variance and the underlying within-group variance. To achieve this, it is convenient first to produce an estimate of the underlying total variance:

$$\sigma_T^2 = \sigma_B^2 + \sigma_W^2, \tag{B.1}$$

which represents the underlying variance between random observations in random groups.
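The two-level model described above can be illustrated with a short simulation. This is only a sketch: the function name `simulate_groups`, its parameter names, and the numerical values are hypothetical and chosen for illustration. It assumes group means drawn from a normal distribution with the between-groups standard deviation, and observations drawn around each group mean with a common within-group standard deviation.

```python
import random


def simulate_groups(group_sizes, mu, sigma_b, sigma_w, seed=0):
    """Simulate unequal-sized groups under the two-level normal model.

    Each group mean mu_j is drawn from N(mu, sigma_b^2); each observation
    in group j is then drawn from N(mu_j, sigma_w^2).
    """
    rng = random.Random(seed)
    groups = []
    for n_j in group_sizes:
        mu_j = rng.gauss(mu, sigma_b)  # underlying mean of this group
        groups.append([rng.gauss(mu_j, sigma_w) for _ in range(n_j)])
    return groups


# Four groups of unequal size; the underlying total variance is 2^2 + 1^2 = 5.
groups = simulate_groups([3, 5, 2, 8], mu=10.0, sigma_b=2.0, sigma_w=1.0)
```

Data generated this way can be used to try out the estimators derived in the rest of this appendix.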

Our variance component analysis is equivalent to a model II ANOVA for a single-factor analysis of variance (Neter and Wasserman 1974, p. 524). This can be seen by equating *Y* to Neter and Wasserman’s *μ*_{j} and equating


**Underlying total variance**

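To estimate the underlying total variance we could take one observation from each group and then use as our estimate:

$$\hat{\sigma}_T^2 = \frac{1}{g-1} \sum_{j=1}^{g} \left( x_{j i_j} - \frac{1}{g} \sum_{k=1}^{g} x_{k i_k} \right)^2, \tag{B.2}$$

where *x*_{ji} is the *i*th observation in group *j* and *i*_{j} is the index of the observation taken from group *j*.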

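To produce a better estimate of total variance we can take the mean of all $P = \prod_{j=1}^{g} n_j$ possible estimates of the form (B.2) given our observed data:

$$\hat{\sigma}_T^2 = \frac{1}{P} \sum_{i_1=1}^{n_1} \cdots \sum_{i_g=1}^{n_g} \frac{1}{g-1} \sum_{j=1}^{g} \left( x_{j i_j} - \frac{1}{g} \sum_{k=1}^{g} x_{k i_k} \right)^2. \tag{B.3}$$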

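It will be impractical to compute (B.3) directly for all but the smallest data sets, so we simplify the equation. In the following, $\bar{x}_j$ represents the mean of observations in group *j*, $s_{\bar{x}}^2$ is the observed variance of the $\bar{x}_j$ (computed with divisor *g* − 1), and $s_j^2$ is the sample variance in group *j* (computed with divisor *n*_{j}).

Using the standard expression for variance, $s^2 = \frac{1}{g-1} \left[ \sum_j y_j^2 - \frac{1}{g} \left( \sum_j y_j \right)^2 \right]$, equation (B.3) becomes:

$$\hat{\sigma}_T^2 = \frac{1}{P(g-1)} \sum_{i_1=1}^{n_1} \cdots \sum_{i_g=1}^{n_g} \left[ \sum_{j=1}^{g} x_{j i_j}^2 - \frac{1}{g} \left( \sum_{j=1}^{g} x_{j i_j} \right)^2 \right].$$

Now let

$$A = \sum_{i_1=1}^{n_1} \cdots \sum_{i_g=1}^{n_g} \left( \sum_{j=1}^{g} x_{j i_j} \right)^2$$

and

$$B = \sum_{i_1=1}^{n_1} \cdots \sum_{i_g=1}^{n_g} \sum_{j=1}^{g} x_{j i_j}^2.$$

So,

$$\hat{\sigma}_T^2 = \frac{1}{P(g-1)} \left( B - \frac{A}{g} \right).$$

By noting that in the expression for *B* the multiple summation counts every observation *x*_{ji} a total of *P*/*n*_{j} times, we have:

$$B = P \sum_{j=1}^{g} \frac{1}{n_j} \sum_{i=1}^{n_j} x_{ji}^2.$$

Also, expanding the square in *A* and noting that every product $x_{ji} x_{kl}$ with *j* ≠ *k* is counted $P/(n_j n_k)$ times:

$$A = B + P \sum_{j=1}^{g} \sum_{k \neq j} \bar{x}_j \bar{x}_k = B + P \left[ \left( \sum_{j=1}^{g} \bar{x}_j \right)^2 - \sum_{j=1}^{g} \bar{x}_j^2 \right].$$

So,

$$\hat{\sigma}_T^2 = \frac{1}{g-1} \left[ \sum_{j=1}^{g} \bar{x}_j^2 - \frac{1}{g} \left( \sum_{j=1}^{g} \bar{x}_j \right)^2 \right] + \frac{1}{g} \sum_{j=1}^{g} \left[ \frac{1}{n_j} \sum_{i=1}^{n_j} x_{ji}^2 - \bar{x}_j^2 \right].$$

And then finally, noting that the first term is $s_{\bar{x}}^2$ and that the expression in the second pair of square brackets is just the sample variance of the *j*th group:

$$\hat{\sigma}_T^2 = s_{\bar{x}}^2 + \frac{1}{g} \sum_{j=1}^{g} s_j^2. \tag{B.4}$$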

This equation has been verified empirically (i.e., by checking that (B.3) and (B.4) produce the same result for small randomly generated data sets).
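The kind of empirical check described above can be sketched in a few lines of Python. The function names and the small data set are hypothetical. The sketch assumes that (B.3) is the mean, over every one-observation-per-group selection, of the ordinary sample variance (divisor *g* − 1), and that (B.4) is the variance of the group means (divisor *g* − 1) plus the mean of the within-group variances computed with divisor *n*_{j}.

```python
from itertools import product
from statistics import mean, pvariance, variance


def total_variance_brute_force(groups):
    """Average the sample variance over every one-per-group selection (B.3)."""
    # product(*groups) enumerates all P = n_1 * n_2 * ... * n_g selections.
    return mean(variance(selection) for selection in product(*groups))


def total_variance_closed_form(groups):
    """Variance of group means plus mean within-group variance (B.4)."""
    group_means = [mean(g) for g in groups]
    # variance() divides by g - 1; pvariance() divides by n_j.
    return variance(group_means) + mean(pvariance(g) for g in groups)


groups = [[1.0, 2.0, 4.0], [3.0, 7.0], [0.5, 2.5, 3.5, 6.0]]
brute = total_variance_brute_force(groups)
closed = total_variance_closed_form(groups)
print(abs(brute - closed) < 1e-9)
```

The brute-force version enumerates all *P* selections, so it is only usable for very small data sets; the closed form needs a single pass over each group.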

**Within-group variance**

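In the case of unequal sample sizes, we derive the best linear unbiased estimator of within-group variance as follows:

$$\hat{\sigma}_W^2 = \sum_{j=1}^{g} c_j \tilde{s}_j^2, \qquad \tilde{s}_j^2 = \frac{1}{n_j - 1} \sum_{i=1}^{n_j} (x_{ji} - \bar{x}_j)^2,$$

where the *c*_{j} are constants to be determined, $\bar{x}_j$ is the mean of the observations in group *j*, and we are using the unbiased estimator of the variance within each group. For an unbiased estimator of $\sigma_W^2$, the *c*_{j} must sum to one:

$$\sum_{j=1}^{g} c_j = 1.$$

To obtain the best linear unbiased estimator, we must minimize the variance of $\hat{\sigma}_W^2$:

$$\mathrm{Var}(\hat{\sigma}_W^2) = \sum_{j=1}^{g} c_j^2 \, \mathrm{Var}(\tilde{s}_j^2) = 2 \sigma_W^4 \sum_{j=1}^{g} \frac{c_j^2}{n_j - 1}.$$

This uses the fact that the variance of the unbiased sample variance of a normal distribution with variance *σ*^{2} is 2*σ*^{4}/(*n* − 1), where *n* is the sample size (Weisstein 2005*b*), together with the independence of the groups.

This optimization problem can be solved using Lagrange multipliers (Weisstein 2005*a*). We find the extremum of *f*(*c*_{1}, *c*_{2}, …, *c*_{g}) = $\sum_j c_j^2 / (n_j - 1)$ subject to the constraint *g*(*c*_{1}, *c*_{2}, …, *c*_{g}) = $\sum_j c_j - 1$ = 0:

$$\frac{\partial f}{\partial c_j} + \lambda \frac{\partial g}{\partial c_j} = \frac{2 c_j}{n_j - 1} + \lambda = 0 \quad \Longrightarrow \quad c_j = -\frac{\lambda (n_j - 1)}{2}.$$

From the constraint, we have:

$$-\frac{\lambda}{2} \sum_{k=1}^{g} (n_k - 1) = 1,$$

which leads to:

$$\lambda = -\frac{2}{\sum_{k=1}^{g} (n_k - 1)}$$

and

$$c_j = \frac{n_j - 1}{\sum_{k=1}^{g} (n_k - 1)}.$$

So then we have

$$\hat{\sigma}_W^2 = \frac{\sum_{j=1}^{g} (n_j - 1) \tilde{s}_j^2}{\sum_{j=1}^{g} (n_j - 1)} = \frac{\sum_{j=1}^{g} \sum_{i=1}^{n_j} (x_{ji} - \bar{x}_j)^2}{\sum_{j=1}^{g} (n_j - 1)}. \tag{B.5}$$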
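As an illustration, the weighting can be sketched in Python. The sketch assumes that the solution of the constrained minimization is the familiar pooled estimator, with weights proportional to *n*_{j} − 1; the function names and the example data are hypothetical.

```python
from statistics import variance


def blue_weights(group_sizes):
    """Weights c_j proportional to n_j - 1, normalized to sum to one."""
    dof = [n - 1 for n in group_sizes]
    total_dof = sum(dof)
    return [d / total_dof for d in dof]


def within_group_variance(groups):
    """Weighted mean of the per-group unbiased variances (pooled variance)."""
    weights = blue_weights([len(g) for g in groups])
    # Groups with a single observation get weight zero and are skipped,
    # because their unbiased variance is undefined.
    return sum(c * variance(g) for c, g in zip(weights, groups) if c > 0)


def weight_objective(weights, group_sizes):
    """The quantity sum(c_j^2 / (n_j - 1)) that the weights should minimize."""
    return sum(c * c / (n - 1) for c, n in zip(weights, group_sizes))


groups = [[0.0, 2.0], [1.0, 1.0, 1.0]]
sizes = [len(g) for g in groups]
print(within_group_variance(groups))
# Equal weights also sum to one but should give a larger objective value.
print(weight_objective(blue_weights(sizes), sizes) < weight_objective([0.5, 0.5], sizes))
```

Comparing the objective for these weights against any other weights that sum to one is a quick numerical check on the Lagrange multiplier solution.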

**Underlying between-groups variance**

The estimate of the underlying between-groups variance is then given according to (B.1) by subtracting (B.5) from (B.4):

$$\hat{\sigma}_B^2 = \hat{\sigma}_T^2 - \hat{\sigma}_W^2. \tag{B.6}$$

According to this formula, $\hat{\sigma}_B^2$ may be negative. This occurs when the observed group means are, by chance, under-dispersed compared to what would be expected given the observed within-group variances. In such cases the most sensible estimate of the underlying between-groups variance is $\hat{\sigma}_B^2 = 0$.
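The subtraction and the truncation at zero can be sketched as follows. The function name `variance_components` and the example data are hypothetical. The sketch assumes that the total-variance estimate is the variance of the group means plus the mean within-group variance with divisor *n*_{j}, and that the within-group estimate is the pooled within-group sum of squares divided by the total within-group degrees of freedom.

```python
from statistics import mean, pvariance, variance


def variance_components(groups):
    """Return (between, within, total) variance estimates for grouped data."""
    group_means = [mean(g) for g in groups]
    # Total variance: variance of group means plus mean within-group variance.
    total = variance(group_means) + mean(pvariance(g) for g in groups)
    # Within-group variance: pooled sum of squares over pooled degrees of freedom.
    ss_within = sum(len(g) * pvariance(g) for g in groups)
    within = ss_within / sum(len(g) - 1 for g in groups)
    # Between-groups variance: the difference, truncated at zero.
    between = max(0.0, total - within)
    return between, within, total


# Identical group means: the raw difference is negative, so it is set to 0.
print(variance_components([[0.0, 2.0], [0.0, 2.0]]))
# Well-separated groups with no within-group spread.
print(variance_components([[0.0, 0.0], [10.0, 10.0]]))
```

Note that when the truncation applies, the returned between-groups and within-group estimates no longer sum to the total estimate.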

(Note that, in Appendix A, (B.5) is used to estimate *σ*_{P} and (B.6) is used to estimate

LITERATURE CITED

Neter, J., and W. Wasserman. 1974. Applied Linear Statistical Models. Richard D. Irwin, Inc., Homewood, Illinois, USA.

Weisstein, E. W. 2005*a*. "Lagrange Multiplier." From MathWorld--A Wolfram Web Resource. http://mathworld.wolfram.com/LagrangeMultiplier.html.

Weisstein, E. W. 2005*b*. "Normal Distribution." From MathWorld--A Wolfram Web Resource. http://mathworld.wolfram.com/NormalDistribution.html.