Ecological Archives E092-001-A6

David I. Warton and Francis K. C. Hui. 2011. The arcsine is asinine: the analysis of proportions in ecology. Ecology 92:3–10.

Appendix F. Results of simulations using resampling-based hypothesis testing to control Type I error in small samples.

In Appendix C, GLM and GLMM often had inflated Type I error in small samples (N = 3 and 6), especially for overdispersed data, whereas linear models were often overly conservative (Type I error below the nominal level). Here we propose using resampling to correct for this, and repeat the simulations of Appendices C-D in small samples using this approach.

The resampling approach we propose uses the bootstrap (Davison & Hinkley 1997) to resample binomial counts with replacement, while keeping the explanatory variables fixed. This approach is valid for designs involving one explanatory variable (whether continuous or a factor), when testing for an effect of this variable. More general designs require modifications involving residual resampling (Davison & Hinkley 1997, Section 7.2).
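The R code supplied with this appendix is not reproduced here. As a minimal sketch of the idea only, the Python below assumes a one-factor design, treats each (successes, trials) pair as the resampling unit, and uses the binomial GLM likelihood ratio statistic for the factor as the test statistic. The function names are hypothetical, and the choice of statistic is an illustrative assumption: the GLMM statistics in the tables would require fitting a mixed model to every resample, which this sketch does not attempt. Resampling the responses from the pooled data while holding the group labels fixed removes any association between response and predictor, so the resampled statistics approximate the null distribution; this is our reading of the procedure described above.

```python
import math
import random
from collections import defaultdict


def xlogy(x, y):
    """Return x * log(y), using the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def binomial_lr_stat(successes, trials, groups):
    """Likelihood ratio statistic for a group (factor) effect in a binomial GLM.

    With a single factor as the only predictor, the fitted proportion in each
    group is that group's pooled proportion, so no iterative fitting is needed.
    """
    group_y = defaultdict(int)
    group_n = defaultdict(int)
    for y, n, g in zip(successes, trials, groups):
        group_y[g] += y
        group_n[g] += n
    p_null = sum(successes) / sum(trials)  # common proportion under H0

    loglik_alt = 0.0
    loglik_null = 0.0
    for y, n, g in zip(successes, trials, groups):
        p_g = group_y[g] / group_n[g]
        loglik_alt += xlogy(y, p_g) + xlogy(n - y, 1.0 - p_g)
        loglik_null += xlogy(y, p_null) + xlogy(n - y, 1.0 - p_null)
    return 2.0 * (loglik_alt - loglik_null)


def bootstrap_p_value(successes, trials, groups, n_boot=999, seed=None, tol=1e-10):
    """Hypothetical helper: bootstrap P value obtained by resampling
    (successes, trials) pairs with replacement while group labels stay fixed."""
    rng = random.Random(seed)
    observed = binomial_lr_stat(successes, trials, groups)
    pairs = list(zip(successes, trials))
    n_obs = len(pairs)

    n_exceed = 0
    for _ in range(n_boot):
        resample = [pairs[rng.randrange(n_obs)] for _ in range(n_obs)]
        y_star = [y for y, _ in resample]
        n_star = [n for _, n in resample]
        # tol counts statistics that tie with the observed value up to rounding error
        if binomial_lr_stat(y_star, n_star, groups) >= observed - tol:
            n_exceed += 1
    # The observed data are counted as one of the resamples
    return (n_exceed + 1) / (n_boot + 1)


if __name__ == "__main__":
    successes = [1, 0, 2, 5, 6, 4]
    trials = [10] * 6
    groups = ["a", "a", "a", "b", "b", "b"]
    print(bootstrap_p_value(successes, trials, groups, n_boot=999, seed=1))
```

The (count + 1)/(B + 1) form of the P value is the usual Monte Carlo convention described by Davison & Hinkley (1997); it guarantees a P value strictly above zero. Any statistic (raw or arcsine linear model, Wald, likelihood ratio) could be substituted for `binomial_lr_stat` without changing the resampling loop.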

R code for estimating P values using this bootstrap approach is available as a supplement in Ecological Archives.

We applied the proposed approach to the balanced-design simulations originally presented in Figure 3, for N = 3 and N = 6. This was computationally intensive because of the level of repetition involved: there were 16 simulations, each of which applied the proposed bootstrap test (with 1000 resamples) to 1000 different datasets, so 16 million pairs of GLMM test statistics were calculated, together with the four other test statistics calculated in Appendices C-D. Total computation time on a 2.6 GHz laptop was 200 hours.
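The figures quoted above can be checked with a few lines of arithmetic. The breakdown of the 16 simulations into 4 tables x 2 (p1, p2) settings x 2 sample sizes is read off Tables F1-F4; the average time per pair is a derived quantity (our addition) and is only an upper bound on the GLMM cost, because the 200 hours also covers the four other test statistics.

```python
# Size of the small-sample bootstrap study (Tables F1-F4):
# 4 tables x 2 (p1, p2) settings x 2 per-group sample sizes (N = 3, 6)
n_simulations = 4 * 2 * 2

n_datasets = 1000   # simulated datasets per simulation
n_resamples = 1000  # bootstrap resamples per dataset

# One pair of GLMM test statistics is computed for every resample
glmm_pairs = n_simulations * n_datasets * n_resamples

total_seconds = 200 * 60 * 60  # reported total computation time: 200 hours
# Upper bound on average time per GLMM pair (the total includes other statistics)
ms_per_pair = 1000 * total_seconds / glmm_pairs

print(n_simulations, glmm_pairs, f"{ms_per_pair:.0f} ms")
```

This prints `16 16000000 45 ms`, i.e. on average no more than about 45 milliseconds per pair of GLMM fits.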

The Type I error simulations show that all methods maintained Type I error rates close to the nominal 0.05 level (Tables F1 and F2), as expected. Not only does this show that bootstrapping is a simple strategy for controlling Type I error, but it also allows us to compare the power of the different statistics solely on the grounds of their efficiency, rather than having the comparison confounded by differences in Type I error.
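Each Type I error rate in Tables F1 and F2 is estimated from 1000 simulated datasets, so it carries Monte Carlo error. As a rough yardstick for how far an estimate can stray from 0.05 through simulation noise alone, the binomial standard error can be computed as below; this sketch assumes the 1000 datasets give independent accept/reject outcomes and uses a normal approximation.

```python
import math

alpha = 0.05       # nominal Type I error level
n_datasets = 1000  # simulated datasets per setting in Tables F1 and F2

# Binomial standard error of an estimated rejection rate whose true value is alpha
mc_se = math.sqrt(alpha * (1 - alpha) / n_datasets)

# Approximate 95% range for an estimated rate when the test holds its nominal level
lower = alpha - 1.96 * mc_se
upper = alpha + 1.96 * mc_se
print(f"SE = {mc_se:.4f}; approximate 95% range {lower:.3f} to {upper:.3f}")
```

This prints a standard error of about 0.0069 and a range of roughly 0.036 to 0.064.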

Power simulations (Tables F3 and F4) showed that GLM and GLMM generally retained a power advantage over the linear modelling approaches, although this advantage was considerably smaller than in Appendix D. For N = 3 there were even some cases where a linear modelling statistic had slightly higher power than one of the four GLM/GLMM statistics (e.g., in the first row of Table F3, the logistic likelihood ratio test had lower power than both linear modelling statistics). However, by N = 6 all GLM/GLMM statistics had higher power than both linear modelling statistics. It is also interesting to note that, once the severe Type I error problems previously seen in the Wald statistics (Appendix C) had been corrected, the Wald statistics appeared to have higher power than all other statistics, sometimes considerably so.

The small-sample results given below have been combined with the large-sample results from Appendices C-D in Figure F1, to visualise overall power trends across statistics once Type I error has been adequately controlled. The GLM/GLMM statistics clearly dominate, although their power advantages tend to be of the order of 5-25%, rather than the almost two-fold differences in power sometimes seen in Figure 3.

TABLE F1. Balanced Design Type I Error simulations, for binomial data with no overdispersion, using the bootstrap to estimate statistical significance.

(p1, p2)     Sample size   Raw     Arcsine   Logistic   Logistic    GLMM    GLMM
             per group                       Wald       Lik-ratio   Wald    Lik-ratio
(0.1, 0.1)   3             0.041   0.046     0.047      0.033       0.068   0.040
             6             0.052   0.055     0.057      0.046       0.057   0.054
(0.5, 0.5)   3             0.045   0.044     0.044      0.044       0.053   0.052
             6             0.055   0.057     0.055      0.055       0.066   0.060

TABLE F2. Balanced Design Type I Error simulations, for overdispersed binomial data, using the bootstrap to estimate statistical significance.

(p1, p2)     Sample size   Raw     Arcsine   Logistic   Logistic    GLMM    GLMM
             per group                       Wald       Lik-ratio   Wald    Lik-ratio
(0.1, 0.1)   3             0.036   0.042     0.040      0.037       0.054   0.047
             6             0.042   0.050     0.049      0.044       0.064   0.051
(0.5, 0.5)   3             0.067   0.054     0.063      0.053       0.085   0.077
             6             0.065   0.063     0.067      0.066       0.070   0.066

TABLE F3. Power simulations, for binomial data with no overdispersion, using the bootstrap to estimate statistical significance.

(p1, p2)     Sample size   Raw     Arcsine   Logistic   Logistic    GLMM    GLMM
             per group                       Wald       Lik-ratio   Wald    Lik-ratio
(0.1, 0.3)   3             0.385   0.359     0.397      0.333       0.489   0.418
             6             0.786   0.799     0.811      0.806       0.847   0.834
(0.3, 0.5)   3             0.288   0.279     0.279      0.266       0.359   0.326
             6             0.598   0.598     0.611      0.607       0.647   0.646

TABLE F4. Power simulations, for overdispersed binomial data, using the bootstrap to estimate statistical significance.

(p1, p2)     Sample size   Raw     Arcsine   Logistic   Logistic    GLMM    GLMM
             per group                       Wald       Lik-ratio   Wald    Lik-ratio
(0.1, 0.3)   3             0.168   0.172     0.176      0.158       0.238   0.186
             6             0.398   0.398     0.412      0.409       0.425   0.415
(0.3, 0.5)   3             0.137   0.123     0.141      0.121       0.185   0.153
             6             0.223   0.215     0.237      0.236       0.235   0.229


FIG. F1. Summary of simulation results (balanced designs only) from (a) Type I error simulations and (b) power simulations, using bootstrapping for N = 3 and N = 6 to provide adequate control of Type I error. Simulations considered binomial data both without overdispersion (left) and with overdispersion (right). As in Figure 3, results are reported for tests using a linear fit to untransformed proportions (circle), a linear fit to arcsine-transformed proportions (triangle), and logistic regression (+) and GLMM (x) likelihood ratio tests. Note that GLM and GLMM tend to have a small but consistent power advantage over the untransformed and arcsine-transformed linear modelling methods.

LITERATURE CITED

Davison, A. C., and D. V. Hinkley. 1997. Bootstrap methods and their application. Cambridge University Press, Cambridge, UK.

