Appendix A. Details on model validation and pesticide analyses.

*Model validation*

To validate the final generalized additive model (GAM) (built on the model development data set), we used the final model to generate predicted probabilities of *Rana muscosa* occurrence for all water bodies in the test data set. We then used goodness-of-fit measures, including the Nagelkerke-Cox *R*^{2} and the Brier score, to compare model performance on the test data set vs. the model development data set. Substantially poorer model performance on the test data set than on the model development data set would indicate a lack of model validity beyond the data used in model development. We also applied two tests to model output from the test data set to evaluate two separate components of goodness-of-fit: the degree of miscalibration and discrimination. Miscalibration measures the deviation between the average predicted and observed probabilities of occurrence, and discrimination measures a model's ability to distinguish between present and absent sites (Miller et al. 1991).
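As a concrete illustration, the two summary measures can be computed directly from observed presences/absences and predicted probabilities, as sketched below. This is a minimal Python sketch with our own function names, not the analysis code used in the study; the Nagelkerke form shown is the standard rescaling of the Cox-Snell *R*^{2}.

```python
import math

def brier_score(y, p):
    """Mean squared difference between observed outcomes (0/1) and
    predicted probabilities; lower values indicate better fit."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)

def log_likelihood(y, p):
    """Bernoulli log-likelihood of outcomes y under probabilities p."""
    return sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
               for yi, pi in zip(y, p))

def nagelkerke_r2(y, p):
    """Cox-Snell R^2 rescaled to the 0-1 range (Nagelkerke)."""
    n = len(y)
    ybar = sum(y) / n
    ll_null = log_likelihood(y, [ybar] * n)   # intercept-only model
    ll_model = log_likelihood(y, p)
    cox_snell = 1 - math.exp(2 * (ll_null - ll_model) / n)
    max_cox_snell = 1 - math.exp(2 * ll_null / n)
    return cox_snell / max_cox_snell
```

Computing both measures on the test data set and comparing them with the model development values is the comparison reported in the next paragraph.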

The goodness-of-fit measures indicated that the regression model performed similarly on the model development and test data sets. The Nagelkerke-Cox *R*^{2} and the Brier score for the model development and test data sets were 0.38 vs. 0.35 and 0.08 vs. 0.08, respectively. In addition, there was no indication of miscalibration or lack of discrimination based on the chi-square tests (miscalibration chi-square = 1.1, df = 1, *P* = 0.29; discrimination chi-square = 1.6, df = 2, *P* = 0.45). Taken together, these results indicate that the model is not overfit and has generality beyond the model development data set.

*Models with alternative pesticide definitions*

We built five variants of our reduced total pesticides GAM to explore different definitions of upwind pesticide use. We constructed models in which (1) pesticide applications were not weighted by distance, (2) outlier pesticide applications were included, (3) winter pesticide applications were included, or (4) upwind pesticide use was calculated using smaller and larger upwind triangles (100 and 150 km, vs. 125 km in the main model). We also constructed models in which we accounted for less spatial autocorrelation by increasing the "span" of the smoothing function for the spatial location variable from 0.1 to 0.2 or 0.3.
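To make the first variant concrete, the weighted vs. unweighted upwind sums might be sketched as below. The inverse-distance weighting, the minimum-distance floor, and the data layout are our assumptions for illustration only; the text does not specify the exact weighting function used.

```python
def upwind_use(applications, weight_by_distance=True):
    """Sum pesticide applications falling inside a site's upwind triangle.

    `applications` is a list of (amount_kg, distance_km) pairs.  The
    1/distance weighting (with a 1-km floor to avoid division blow-up)
    is illustrative, not the study's actual weighting scheme.
    """
    total = 0.0
    for amount, dist in applications:
        w = 1.0 / max(dist, 1.0) if weight_by_distance else 1.0
        total += amount * w
    return total
```

Variant (1) above corresponds to calling this with `weight_by_distance=False`, so every application in the triangle counts equally regardless of distance from the water body.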

In general, the variants of the upwind pesticide variable produced similar models. The model we presented had the lowest AIC value, but the differences in AIC among the variants were all <1%. Increasing the span of the spatial location variable increased model AIC values by up to 7% and increased the strength of the upwind pesticide variable, but did not otherwise substantially affect the models. A 150-km upwind triangle was too long, extending into portions of the Central Valley that were not upwind of the water bodies. A 100-km triangle produced results similar to those of the 125-km triangle used in our main model, but was more highly correlated with the spatial location variable and produced an unstable regression model.

*Methods for analysis of pesticide classes, pesticide groups, and individual pesticides*

From 1991 to 2000, 761 different pesticide active ingredients were reported as used upwind of our study sites. However, many of these pesticides were used in relatively small amounts, in only a few locations, or in only a few years. If a pesticide was not used, or only lightly used, upwind of sites where *R. muscosa* declined, it is not biologically plausible that the pesticide contributed to those declines. To identify pesticide classes plausibly associated with declines, we calculated total upwind pesticide use by class for all water bodies where *R. muscosa* was absent. Only pesticide classes meeting the following minimum use criteria (calculated for the period 1991 to 2000) were included in subsequent statistical analysis: (1) total upwind use >10000 lbs (4536 kg), (2) use in at least two of the 13 counties intersected by upwind triangles, and (3) for those counties where a class was used, an average duration of use ≥2 years. The minimum use criteria reduced the number of pesticide classes for which we built separate GAMs from 96 to 63; the retained classes accounted for 99.9% of total upwind pesticide use by weight. To select individual pesticides for analysis, we used the same approach as for pesticide classes, but with a higher minimum use criterion of 30000 lbs (13608 kg). Application of these minimum use criteria reduced the number of individual pesticides from 761 to 160, and these pesticides accounted for 99.2% of total upwind pesticide use by weight. We also calculated upwind use for seven pesticide groups, such as cholinesterase inhibitors and suspected endocrine disrupters (Davidson 2004). The small number of groups meant we did not need to apply minimum use criteria to reduce the number of variables for analysis.
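The three minimum use criteria amount to a simple filter over upwind use records. The sketch below shows one way to express it; the record layout (county, year, kg applied) and function names are hypothetical, while the thresholds follow the text (10000 lbs = 4536 kg for classes).

```python
def meets_minimum_use(records, min_total_kg=4536, min_counties=2, min_years=2):
    """Apply the three minimum use criteria to one pesticide class.

    `records` is a list of (county, year, kg_applied) tuples of upwind
    use for 1991-2000.  Returns True only if all three criteria pass:
    (1) total use above the threshold, (2) use in at least two counties,
    (3) mean duration of use per county of at least two years.
    """
    total = sum(kg for _, _, kg in records)
    years_by_county = {}
    for county, year, _ in records:
        years_by_county.setdefault(county, set()).add(year)
    n_counties = len(years_by_county)
    if n_counties == 0:
        return False
    mean_duration = sum(len(y) for y in years_by_county.values()) / n_counties
    return (total > min_total_kg
            and n_counties >= min_counties
            and mean_duration >= min_years)
```

For individual pesticides, the same filter would be run with `min_total_kg=13608` (30000 lbs), per the higher criterion stated above.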

Identifying which pesticide classes, pesticide groups, and individual pesticides were most strongly associated with the presence/absence of *R. muscosa* required a different analysis approach than we used for the total pesticides model, because of the difficulties inherent in predictor variable selection when many of the candidate variables are highly correlated. For these analyses, we constructed separate GAMs for each of the seven pesticide groups and for each pesticide class and individual pesticide meeting the minimum use criteria described above. Each model retained all the significant covariates from the total pesticides model, but the total upwind pesticides variable was replaced by a variable representing a single pesticide class, pesticide group, or individual pesticide. We examined all resulting models to identify pesticide classes, groups, or individual pesticides that produced models with an Akaike Information Criterion (AIC) value less than that of the total pesticides model. AIC is calculated as the model deviance plus two times the number of degrees of freedom used to estimate the model parameters, and is thus a measure of how well the model fits the data, with a penalty for model complexity (i.e., degrees of freedom used in calculating the model parameters) (Harrell 2001, Burnham and Anderson 2002). The goal was not to select the single pesticide class, pesticide group, or individual pesticide that produced the "best" model (lowest AIC value), but rather to identify the set of pesticide classes, pesticide groups, and individual pesticides that best fit the data and produced a better fit than the total pesticides model.
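The AIC screening step described above reduces to computing deviance + 2 × df for each single-pesticide model and retaining every variable whose model beats the total pesticides baseline. A minimal sketch, with hypothetical variable names and numbers:

```python
def aic(deviance, df_model):
    """AIC as described in the text: model deviance plus a complexity
    penalty of two units per degree of freedom used."""
    return deviance + 2 * df_model

def better_than_baseline(candidates, baseline_aic):
    """Return (sorted) the candidate pesticide variables whose
    single-pesticide GAM has a lower AIC than the total pesticides model.

    `candidates` maps variable name -> (deviance, df); values are
    illustrative placeholders, not results from the study.
    """
    return sorted(name for name, (dev, df) in candidates.items()
                  if aic(dev, df) < baseline_aic)
```

Note that this identifies a *set* of better-fitting variables rather than a single winner, matching the stated goal of the analysis.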

LITERATURE CITED

Burnham, K. P., and D. R. Anderson. 2002. Model selection and multimodel inference: a practical information-theoretic approach. Second Edition. Springer, New York, New York, USA.

Davidson, C. 2004. Declining downwind: Amphibian population declines in California and historic pesticide use. Ecological Applications **14**:1892–1902.

Harrell, F. E. 2001. Regression modeling strategies: with applications to linear models, logistic regression and survival analysis. Springer, New York, New York, USA.

Miller, M. E., S. I. Hui, and W. M. Tierney. 1991. Validation techniques for logistic regression models. Statistics in Medicine **10**:1213–1226.