Ecological Archives A017-013-A2

Ryan A. Chisholm and Brendan A. Wintle. 2007. Incorporating landscape stochasticity into population viability analysis. Ecological Applications 17:317–322.

Appendix B. Unbiased estimates of variance from data sets with unequal underlying group means and unequal group sizes.

Statement of the problem

Suppose that we have $g$ groups of observations, and let $n_j$ be the number of observations in group $j$. We assume that the observations in group $j$ are drawn from a normal distribution:

$$X_j \sim N(\mu_j, \sigma_X^2)$$

where $\mu_j$ is the mean and $\sigma_X^2$ is the variance.

So the groups’ underlying distributions have the same variance, but possibly different means.

Furthermore, we assume that the underlying group means $\{\mu_1, \mu_2, \ldots, \mu_g\}$ are drawn from a normal distribution:

$$Y \sim N(\mu, \sigma_Y^2)$$

where $\mu$ is the mean and $\sigma_Y^2$ is the variance.

Given a number of observations spread across a number of groups, we are interested in providing unbiased estimates of $\sigma_Y^2$ and $\sigma_X^2$, i.e., the underlying between-groups variance and the underlying within-group variance. To achieve this, it is convenient first to produce an estimate of the underlying total variance:

$$\sigma_T^2 = \sigma_Y^2 + \sigma_X^2 \tag{B.1}$$

which represents the underlying variance between random observations in random groups.
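The two-level model above can be illustrated with a short simulation. This is a minimal sketch, not part of the original analysis: the function name `simulate_groups`, the parameter values, and the seed are all hypothetical choices made for illustration. It draws one observation from each of many groups and compares the sample variance of those draws with the sum of the between-groups and within-group variances.

```python
import random
import statistics

def simulate_groups(group_sizes, mu, sd_between, sd_within, rng):
    """Return a list of groups; each group is a list of observations.

    Group means are drawn from N(mu, sd_between^2); observations in
    group j are then drawn from N(mu_j, sd_within^2).
    """
    groups = []
    for n_j in group_sizes:
        mu_j = rng.gauss(mu, sd_between)  # underlying mean of this group
        groups.append([rng.gauss(mu_j, sd_within) for _ in range(n_j)])
    return groups

rng = random.Random(1)  # fixed seed (arbitrary) so the sketch is repeatable

# One observation from each of many groups: random observations in random groups.
# Illustrative values: overall mean 0, between-groups sd 2, within-group sd 1.
singles = [grp[0] for grp in simulate_groups([1] * 20000, 0.0, 2.0, 1.0, rng)]

empirical_total = statistics.variance(singles)
expected_total = 2.0 ** 2 + 1.0 ** 2  # between-groups plus within-group variance
print(empirical_total, expected_total)
```

With many groups the two printed values should be close, which is the sense in which the total variance describes random observations in random groups.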

Our variance component analysis is equivalent to a model II ANOVA for a single-factor analysis of variance (Neter and Wasserman 1974, p. 524). This can be seen by equating $Y$ to Neter and Wasserman’s $\mu_j$ and equating $X_j$ to their $\varepsilon_{ij}$. Our estimates of the variances derived below are consistent with theirs, but our estimate of the underlying between-groups variance takes a different form because it is derived from a different approach.

Underlying total variance

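To estimate the underlying total variance we could take one observation from each group and then use as our estimate:

$$\hat{\sigma}_T^2 = \frac{g}{g-1} \cdot \frac{1}{g} \sum_{j=1}^{g} \left( x_{j i_j} - \frac{1}{g} \sum_{k=1}^{g} x_{k i_k} \right)^2 \tag{B.2}$$

where $x_{ji}$ is the $i$th observation in group $j$, $i_j$ is a random index drawn from a discrete uniform distribution $U(1, n_j)$, and the $g/(g-1)$ factor is necessary for an unbiased estimate.

To produce a better estimate of total variance we can take the mean of all $P = \prod_{j=1}^{g} n_j$ possible estimates of the form (B.2) given our observed data:

$$\hat{\sigma}_T^2 = \frac{1}{P} \sum_{i_1=1}^{n_1} \sum_{i_2=1}^{n_2} \cdots \sum_{i_g=1}^{n_g} \frac{g}{g-1} \cdot \frac{1}{g} \sum_{j=1}^{g} \left( x_{j i_j} - \frac{1}{g} \sum_{k=1}^{g} x_{k i_k} \right)^2 \tag{B.3}$$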

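It will be impractical to compute (B.3) directly for all but the smallest data sets, so we simplify the equation. In the following, $\bar{x}_j = \frac{1}{n_j} \sum_{i} x_{ji}$ represents the mean of observations in group $j$, $\mathrm{Var}(\bar{x}) = \frac{1}{g} \sum_{j} \bar{x}_j^2 - \left( \frac{1}{g} \sum_{j} \bar{x}_j \right)^2$ is the observed variance of the $\bar{x}_j$, and $s_j^2 = \frac{1}{n_j} \sum_{i} x_{ji}^2 - \bar{x}_j^2$ is the sample variance in group $j$.

Using the standard expression for variance, $\mathrm{Var}(x) = \langle x^2 \rangle - \langle x \rangle^2$, equation (B.3) becomes:

$$\hat{\sigma}_T^2 = \frac{g}{g-1} \cdot \frac{1}{P} \sum_{i_1=1}^{n_1} \cdots \sum_{i_g=1}^{n_g} \left\{ \frac{1}{g} \sum_{j=1}^{g} x_{j i_j}^2 - \left( \frac{1}{g} \sum_{j=1}^{g} x_{j i_j} \right)^2 \right\}$$

Now let

$$A = \frac{1}{P} \sum_{i_1=1}^{n_1} \cdots \sum_{i_g=1}^{n_g} \left( \frac{1}{g} \sum_{j=1}^{g} x_{j i_j} \right)^2$$

and

$$B = \frac{1}{P} \sum_{i_1=1}^{n_1} \cdots \sum_{i_g=1}^{n_g} \frac{1}{g} \sum_{j=1}^{g} x_{j i_j}^2$$

So,

$$\hat{\sigma}_T^2 = \frac{g}{g-1} \left( B - A \right)$$

By noting that in the expression for $B$ the multiple summation counts every observation $x_{ji}$ a total of $P/n_j$ times, $B$ can be re-expressed as follows:

$$B = \frac{1}{g} \sum_{j=1}^{g} \frac{1}{n_j} \sum_{i=1}^{n_j} x_{ji}^2$$

Also, expanding the square in $A$ gives squared terms, which are counted as in $B$, and cross terms, in which every pair $(x_{ji}, x_{kl})$ with $j \neq k$ is counted $P/(n_j n_k)$ times:

$$A = \frac{1}{g^2} \left\{ \sum_{j=1}^{g} \frac{1}{n_j} \sum_{i=1}^{n_j} x_{ji}^2 + \sum_{j=1}^{g} \sum_{k \neq j} \bar{x}_j \bar{x}_k \right\} = \frac{1}{g^2} \left\{ \sum_{j=1}^{g} \frac{1}{n_j} \sum_{i=1}^{n_j} x_{ji}^2 - \sum_{j=1}^{g} \bar{x}_j^2 + \left( \sum_{j=1}^{g} \bar{x}_j \right)^2 \right\}$$

So,

$$\hat{\sigma}_T^2 = \frac{g}{g-1} \left\{ \frac{1}{g} \sum_{j=1}^{g} \bar{x}_j^2 - \left( \frac{1}{g} \sum_{j=1}^{g} \bar{x}_j \right)^2 \right\} + \frac{1}{g} \sum_{j=1}^{g} \left[ \frac{1}{n_j} \sum_{i=1}^{n_j} x_{ji}^2 - \bar{x}_j^2 \right]$$

And then finally, noting that the expression in braces is the observed variance of the group means and that the expression in square brackets is just the sample variance of the $j$th group:

$$\hat{\sigma}_T^2 = \frac{g}{g-1} \mathrm{Var}(\bar{x}) + \frac{1}{g} \sum_{j=1}^{g} s_j^2 \tag{B.4}$$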

This equation has been verified empirically (i.e., by checking that (B.3) and (B.4) produce the same result for small randomly generated data sets).
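A check of this kind can be sketched in a few lines of Python. This is an illustration rather than the authors' code, and the function names are hypothetical. It assumes that (B.3) averages the one-observation-per-group sample variance (divisor g − 1) over all P selections, and that (B.4) equals the divisor-(g − 1) variance of the group means plus the average over groups of the divisor-n_j within-group variances.

```python
import itertools
import random
import statistics

def total_variance_bruteforce(groups):
    """Average the one-observation-per-group estimate over all P selections.

    statistics.variance uses divisor (g - 1), which matches the g/(g-1)
    correction applied to the divisor-g variance.
    """
    estimates = [statistics.variance(pick) for pick in itertools.product(*groups)]
    return sum(estimates) / len(estimates)

def total_variance_closed_form(groups):
    """Closed form: variance of group means plus mean within-group variance."""
    g = len(groups)
    means = [statistics.mean(x) for x in groups]
    between_part = statistics.variance(means)                     # divisor g - 1
    within_part = sum(statistics.pvariance(x) for x in groups) / g  # divisor n_j
    return between_part + within_part

rng = random.Random(0)  # arbitrary seed, for repeatability only
# Small random data set with unequal group sizes (P = 2 * 3 * 1 * 4 = 24).
groups = [[rng.gauss(m, 1.0) for _ in range(n)]
          for m, n in [(0.0, 2), (3.0, 3), (-1.0, 1), (2.0, 4)]]

brute = total_variance_bruteforce(groups)
closed = total_variance_closed_form(groups)
print(brute, closed)  # the two values should agree up to rounding error
```

The brute-force function enumerates all P selections, so it is only practical for small data sets; the closed form needs a single pass over each group.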

Within-group variance

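In the case of unequal sample sizes, we derive the best linear unbiased estimator of within-group variance as follows:

$$\hat{\sigma}_X^2 = \sum_{j=1}^{g} c_j \hat{s}_j^2$$

where the $c_j$ are constants to be determined, and we are using the unbiased estimator of the variance within each group, $\hat{s}_j^2 = \frac{1}{n_j - 1} \sum_{i=1}^{n_j} (x_{ji} - \bar{x}_j)^2$, with $\bar{x}_j$ the mean of group $j$. For an unbiased estimator of the within-group variance $\sigma_X^2$, the $c_j$ are subject to the constraint:

$$\sum_{j=1}^{g} c_j = 1$$

To obtain the best linear unbiased estimator, we must minimize the variance of $\hat{\sigma}_X^2$:

$$\mathrm{Var}(\hat{\sigma}_X^2) = \sum_{j=1}^{g} c_j^2 \, \mathrm{Var}(\hat{s}_j^2) = 2 \sigma_X^4 \sum_{j=1}^{g} \frac{c_j^2}{n_j - 1}$$

This uses the independence of the groups and the fact that the variance of the unbiased sample variance of a normal distribution with variance $\sigma^2$ is $2\sigma^4/(n-1)$, where $n$ is the sample size (Weisstein 2005b).

This optimization problem can be solved using Lagrange multipliers (Weisstein 2005a). Dropping the constant factor $2\sigma_X^4$, which does not affect the location of the minimum, we find the extremum of $f(c_1, c_2, \ldots, c_g) = \sum_{j} c_j^2/(n_j - 1)$ subject to the constraint $g(c_1, c_2, \ldots, c_g) = \sum_{j} c_j - 1 = 0$ (here $g(\cdot)$ denotes the constraint function, not the number of groups):

$$\frac{\partial f}{\partial c_j} = \lambda \frac{\partial g}{\partial c_j} \quad \Rightarrow \quad \frac{2 c_j}{n_j - 1} = \lambda \quad \Rightarrow \quad c_j = \frac{\lambda (n_j - 1)}{2}$$

From the constraint, we have:

$$\frac{\lambda}{2} \sum_{j=1}^{g} (n_j - 1) = 1$$

Which leads to:

$$\lambda = \frac{2}{\sum_{k=1}^{g} (n_k - 1)}$$

and

$$c_j = \frac{n_j - 1}{\sum_{k=1}^{g} (n_k - 1)}$$

So then we have

$$\hat{\sigma}_X^2 = \frac{\sum_{j=1}^{g} (n_j - 1) \, \hat{s}_j^2}{\sum_{j=1}^{g} (n_j - 1)} \tag{B.5}$$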
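The weighting derived in this section can be illustrated numerically. The sketch below is hypothetical (the function names are not from the original) and assumes that the optimal weights are proportional to n_j − 1, so that the estimator is the familiar pooled within-group variance. It also compares the objective, the sum of c_j²/(n_j − 1), for these weights against equal weights.

```python
import statistics

def blue_weights(group_sizes):
    """Weights c_j proportional to (n_j - 1), normalized to sum to 1.

    Assumes at least one group has two or more observations.
    """
    dof = [n - 1 for n in group_sizes]
    total_dof = sum(dof)
    return [d / total_dof for d in dof]

def objective(weights, group_sizes):
    """Variance of the weighted estimator, up to the constant 2 * sigma^4.

    Assumes every group has at least two observations.
    """
    return sum(c * c / (n - 1) for c, n in zip(weights, group_sizes))

def within_group_variance(groups):
    """Weighted sum of the unbiased (divisor n_j - 1) group variances."""
    weights = blue_weights([len(x) for x in groups])
    # Groups with a single observation get weight 0 and are skipped.
    return sum(c * statistics.variance(x)
               for c, x in zip(weights, groups) if len(x) > 1)

sizes = [2, 3, 5]
optimal = blue_weights(sizes)   # proportional to 1, 2, 4
equal = [1 / 3, 1 / 3, 1 / 3]   # also sums to 1, but is not optimal here
print(objective(optimal, sizes), objective(equal, sizes))

pooled = within_group_variance([[1.0, 2.0], [2.0, 4.0, 6.0]])
print(pooled)
```

For the optimal weights the objective equals the reciprocal of the total degrees of freedom, which any other set of weights summing to one cannot improve on.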

Underlying between-groups variance

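The estimate of the underlying between-groups variance is then given according to (B.1) by subtracting (B.5) from (B.4):

$$\hat{\sigma}_Y^2 = \hat{\sigma}_T^2 - \hat{\sigma}_X^2 = \frac{g}{g-1} \mathrm{Var}(\bar{x}) + \frac{1}{g} \sum_{j=1}^{g} s_j^2 - \frac{\sum_{j=1}^{g} (n_j - 1) \, \hat{s}_j^2}{\sum_{j=1}^{g} (n_j - 1)} \tag{B.6}$$

where $\mathrm{Var}(\bar{x})$ is the observed variance (divisor $g$) of the group means, and $s_j^2$ and $\hat{s}_j^2$ are the sample variances of group $j$ with divisors $n_j$ and $n_j - 1$, respectively.

According to this formula, $\hat{\sigma}_Y^2$ may be negative. This occurs when the observed group means are, by chance, under-dispersed compared to what would be expected given the observed within-group variances. In such cases the most sensible estimate of the underlying between-groups variance is $\hat{\sigma}_Y^2 = 0$.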
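The full procedure, including the truncation at zero, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `variance_components` is hypothetical, the total variance is taken to be the divisor-(g − 1) variance of the group means plus the average divisor-n_j within-group variance, and the within-group variance is taken to be the pooled variance weighted by n_j − 1.

```python
import statistics

def variance_components(groups):
    """Return (between, within, total) variance estimates.

    Requires at least two groups and at least one group with two or
    more observations. The between-groups estimate is set to zero
    when the raw difference total - within is negative.
    """
    g = len(groups)
    means = [statistics.mean(x) for x in groups]

    # Total variance: variance of group means plus mean within-group variance.
    total = statistics.variance(means) + sum(
        statistics.pvariance(x) for x in groups) / g

    # Within-group variance: pooled estimate weighted by degrees of freedom.
    dof = sum(len(x) - 1 for x in groups)
    within = sum((len(x) - 1) * statistics.variance(x)
                 for x in groups if len(x) > 1) / dof

    between = max(0.0, total - within)
    return between, within, total

# Group means well separated: positive between-groups estimate.
print(variance_components([[1.0, 2.0], [2.0, 4.0, 6.0]]))

# Identical group means: the raw difference is negative, so the estimate is 0.
print(variance_components([[0.0, 2.0], [0.0, 2.0]]))
```

Note that when the truncation applies, the returned between-groups and within-group estimates no longer sum to the total estimate.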

(Note that, in Appendix A, (B.5) is used to estimate σP and (B.6) is used to estimate σL.)

LITERATURE CITED

Neter, J., and W. Wasserman. 1974. Applied Linear Statistical Models. Richard D. Irwin, Inc., Homewood, Illinois, USA.

Weisstein, E. W. 2005a. "Lagrange Multiplier." From MathWorld--A Wolfram Web Resource. http://mathworld.wolfram.com/LagrangeMultiplier.html.

Weisstein, E. W. 2005b. "Normal Distribution." From MathWorld--A Wolfram Web Resource. http://mathworld.wolfram.com/NormalDistribution.html.


