*Ecological Archives* A025-109-A2

Geoffrey A. Fricker, Jeffrey A. Wolf, Sassan S. Saatchi, and Thomas W. Gillespie. 2015. Predicting spatial variations of tree species richness in tropical forests from high-resolution remote sensing. *Ecological Applications* 25:1776–1789. http://dx.doi.org/10.1890/14-1593.1

Appendix B. Statistical analysis: Generalized Least Squares and cross-validation.

Non-spatial Ordinary Least Squares (OLS) regression was used to model tree species richness across the landscape; however, we also fitted Generalized Least Squares (GLS) spatial regression models to account for spatial autocorrelation. Predictive models were calculated in R using both OLS and GLS regression to compare non-spatial and spatial models. Residual errors from an OLS model may be spatially autocorrelated, so a spatial (GLS) model is necessary to determine the effect of spatial autocorrelation on our predictions across the landscape. The purpose of the GLS modeling is to test the spatial predictions against the non-spatial OLS predictions to determine whether an OLS model is inappropriate (i.e., a change in coefficient sign). If the spatial and non-spatial predictions for a predictor variable differ significantly, it is not considered to be a good predictor.

For the GLS models, we tested three parametric correlation functions (Gaussian, spherical, exponential) fitted to the residual covariance matrix, and selected the model with the lowest Akaike information criterion (AIC) score for each remote sensing predictor variable. We created variograms for each variable and calculated ‘Moran’s I’ to measure spatial autocorrelation using an inverse-distance-weighted residual error matrix. We then used multi-model inference to determine the optimal model from different combinations of the four remote sensing variables. Finally, we cross-validated our models using 5-fold cross-validation to train and test the data. We used the ‘dredge’ function in the multi-model inference library to judge the general importance of predictor variables and cross-validate our results. The following list gives the R package associated with each stage of analysis: GLS (nlme), Variogram (gstat), Moran’s I (ape), Multi-model Inference (MuMIn).
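As an illustration of two of the steps described above, the following minimal sketch computes Moran's I on OLS residuals with inverse-distance weights and runs a 5-fold cross-validation of a one-predictor OLS model. It is written in plain Python and is not the R code used in the study: the function names (`fit_ols`, `morans_i`, `kfold_rmse`), the synthetic 5 x 5 grid of plots, and the choice of RMSE as the error measure are all assumptions made for illustration. Moran's I is given in its textbook global form with weights w_ij = 1/d_ij, which may differ in detail (for example, in weight standardization) from the implementation in the ape package.

```python
import math
import random


def fit_ols(x, y):
    """Return (intercept, slope) of a one-predictor least-squares fit."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def morans_i(values, coords):
    """Global Moran's I with inverse-distance weights (w_ij = 1/d_ij, w_ii = 0).

    Assumes all coordinates are distinct, so no distance is zero.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross = 0.0       # sum of w_ij * dev_i * dev_j over i != j
    weight_sum = 0.0  # sum of all weights
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            w = 1.0 / math.dist(coords[i], coords[j])
            cross += w * dev[i] * dev[j]
            weight_sum += w
    return (n / weight_sum) * (cross / sum(d * d for d in dev))


def kfold_rmse(x, y, k=5, seed=0):
    """Pooled out-of-fold RMSE of the OLS fit under k-fold cross-validation."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal test folds
    squared_error = 0.0
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        b0, b1 = fit_ols([x[i] for i in train], [y[i] for i in train])
        squared_error += sum((y[i] - (b0 + b1 * x[i])) ** 2 for i in test)
    return math.sqrt(squared_error / len(x))


# Hypothetical data: a 5 x 5 grid of plots. The predictor is the row index,
# and the response carries an extra column-wise trend the predictor misses.
coords = [(r, c) for r in range(5) for c in range(5)]
x = [float(r) for r, c in coords]
y = [2.0 + 3.0 * r + 0.5 * c for r, c in coords]

b0, b1 = fit_ols(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

print("Moran's I of OLS residuals:", morans_i(residuals, coords))
print("5-fold cross-validated RMSE:", kfold_rmse(x, y, k=5))
```

In this synthetic case the residuals retain the column-wise trend, so Moran's I comes out positive. That is the kind of residual spatial structure that would motivate comparing the OLS fit against a GLS model with a parametric correlation function.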