Stephen J. Tulowiecki and Chris P. S. Larsen. 2015. Native American impact on past forest composition inferred from species distribution models, Chautauqua County, New York. Ecological Monographs 85:557–581. http://dx.doi.org/10.1890/14-2259.1
Supplement
R code and data files used to train and evaluate species distribution models (SDMs).
Ecological Archives M085-021-S1.
Authors
Stephen J. Tulowiecki
Department of Geography
University at Buffalo
Wilkeson Quadrangle
Buffalo, NY 14261 USA
E-mail: [email protected]
Chris P. S. Larsen
Department of Geography
University at Buffalo
Wilkeson Quadrangle
Buffalo, NY 14261 USA
File list
Ecol_Monograph_supplement_code_biomod2.txt (md5: 1468e75dbf74ed624a8dce871743f924)
Ecol_Monograph_supplement_code_dismo_1.txt (md5: 55b20fbe747f7601c53d5b56a93459ea)
Ecol_Monograph_supplement_code_dismo_2.txt (md5: a33a1745062f1bf816c3d9ec797cdd46)
Ecol_Monograph_supplement_code_dismo_3.txt (md5: aff301c5ba52f04eff85e561122964c4)
Ecol_Monograph_supplement_code_dismo_4.txt (md5: 244ff730dbd9da02a5439cfd95a439ca)
Ecol_Monograph_supplement_code_dismo_5.txt (md5: bec6a05bf1d737b941d0a7a00bde3658)
lot_line_section_with_predictors.csv (md5: 48dc1b92e2d3d3b3e4875ef0dc3b87a7)
township_bt_post_with_predictors.csv (md5: 86f08554a0a65fec8065f85335aa8ec5)
township_line_section_with_predictors.csv (md5: d028af68dcd8f7bca5b28e969cc5c796)
biomod2_predictors.zip (md5: 7ab5a1d2ef1847fe64a47483e8220d70)
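The MD5 digests listed above can be used to confirm that each download is intact. The following is a minimal sketch using only the Python standard library (`hashlib`, `pathlib`). The helper names are illustrative, and a digest matches only a byte-identical copy of the original file.

```python
import hashlib
from pathlib import Path

# MD5 digests copied from the file list above.
EXPECTED_MD5 = {
    "Ecol_Monograph_supplement_code_biomod2.txt": "1468e75dbf74ed624a8dce871743f924",
    "Ecol_Monograph_supplement_code_dismo_1.txt": "55b20fbe747f7601c53d5b56a93459ea",
    "Ecol_Monograph_supplement_code_dismo_2.txt": "a33a1745062f1bf816c3d9ec797cdd46",
    "Ecol_Monograph_supplement_code_dismo_3.txt": "aff301c5ba52f04eff85e561122964c4",
    "Ecol_Monograph_supplement_code_dismo_4.txt": "244ff730dbd9da02a5439cfd95a439ca",
    "Ecol_Monograph_supplement_code_dismo_5.txt": "bec6a05bf1d737b941d0a7a00bde3658",
    "lot_line_section_with_predictors.csv": "48dc1b92e2d3d3b3e4875ef0dc3b87a7",
    "township_bt_post_with_predictors.csv": "86f08554a0a65fec8065f85335aa8ec5",
    "township_line_section_with_predictors.csv": "d028af68dcd8f7bca5b28e969cc5c796",
    "biomod2_predictors.zip": "7ab5a1d2ef1847fe64a47483e8220d70",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory):
    """Map each listed file name to True (match), False (mismatch), or None (not found)."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = (md5_of_file(path) == expected) if path.is_file() else None
    return results
```

For example, `verify_downloads("supplement")` checks every file, where "supplement" is a hypothetical folder holding the downloads. Any `False` entry indicates a corrupted or modified copy.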
Description
This supplement contains the data and code used to train and evaluate species distribution models (SDMs). It includes six (6) .txt files containing code to be run in R, three (3) .csv files containing the training data and evaluation data, and one .zip file containing raster predictor variables. All code files include comments (“#...”) describing what the code does.
Two notes apply to the code files in this supplement. First, users seeking to recreate the results must make minor edits to the code so that all pathnames it references match the locations where they store the data files. Second, the code as presented trains SDMs that include Native American variables (NAVs). Running SDMs that exclude NAVs requires a few minor edits, which are documented in the comments of the code files. Both sets of edits should take little time.
Training and evaluating the models also requires considerable processing time. Although the “biomod2” code is highly automated, it could still take from several hours to a few days to run on a personal computer. The “dismo” code files could take from several days to a week to run, and they require much more manual entry of code blocks into R. More advanced R users could instead edit the code to run as a script and/or to be more automated.
The following is a description of each individual file.
Ecol_Monograph_supplement_code_biomod2.txt – this file contains the code for training SDMs from the Holland Land Company (HLC) line-description (or “line section”) data, using three SDM algorithms from the “biomod2” package in R: Generalized Additive Models (GAMs), Generalized Linear Models (GLMs), and Multivariate Adaptive Regression Splines (MARS).
Five .txt files contain additional code for training and evaluating boosted regression tree (BRT) models using the “dismo” package in R. The BRT code is split into five files, which must be run in succession. Because BRT models are “stochastic,” rerunning the code may produce results that differ slightly from those reported in the article.
Ecol_Monograph_supplement_code_dismo_1.txt – this code loads the training data, and trains an initial set of BRT models.
Ecol_Monograph_supplement_code_dismo_2.txt – this code runs a procedure that suggests the number of variables that can be dropped from the initial set of BRT models.
Ecol_Monograph_supplement_code_dismo_3.txt – this code creates a set of simplified BRT models with fewer variables, as determined by the previous step.
Ecol_Monograph_supplement_code_dismo_4.txt – this code loads evaluation data, loads raster versions of predictor variables, projects models into geographic space, calculates variable importance, plots response curves, and evaluates models upon training data and evaluation data.
Ecol_Monograph_supplement_code_dismo_5.txt – this code saves the false positive rate and false negative rate of each model, as evaluated on the training data and evaluation data.
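The false positive rate is the share of observed absences that a model predicts as present: FP / (FP + TN). The false negative rate is the share of observed presences that a model predicts as absent: FN / (FN + TP). Computing either rate requires converting predicted probabilities to presence/absence with a threshold. The Python sketch below illustrates only these definitions and is not a translation of the supplement's R code. The `threshold` argument is a hypothetical input, because this supplement does not describe how thresholds were chosen.

```python
import math


def error_rates(observed, predicted_prob, threshold):
    """Return (false_positive_rate, false_negative_rate).

    observed       -- sequence of observed absences (0) and presences (1)
    predicted_prob -- sequence of predicted probabilities of presence
    threshold      -- hypothetical cutoff; probabilities >= threshold count as presence
    """
    observed = list(observed)
    predicted_prob = list(predicted_prob)
    if len(observed) != len(predicted_prob):
        raise ValueError("observed and predicted_prob must have the same length")

    tp = fp = tn = fn = 0
    for obs, prob in zip(observed, predicted_prob):
        predicted_present = prob >= threshold
        if obs == 1 and predicted_present:
            tp += 1  # presence predicted as presence
        elif obs == 1:
            fn += 1  # presence predicted as absence
        elif predicted_present:
            fp += 1  # absence predicted as presence
        else:
            tn += 1  # absence predicted as absence

    # A rate is undefined (NaN) when there are no observed absences or presences.
    fpr = fp / (fp + tn) if (fp + tn) else math.nan
    fnr = fn / (fn + tp) if (fn + tp) else math.nan
    return fpr, fnr
```

For example, `error_rates([1, 1, 0, 0, 1, 0], [0.9, 0.4, 0.6, 0.2, 0.7, 0.1], threshold=0.5)` gives a false positive rate of 1/3 (one of three absences predicted present) and a false negative rate of 1/3 (one of three presences predicted absent).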
.csv files – these files contain the training data and evaluation data:
lot_line_section_with_predictors.csv – this file contains the line-description data that was used to train SDMs.
township_bt_post_with_predictors.csv – this file contains the township bearing-tree data, which was used to evaluate SDMs.
township_line_section_with_predictors.csv – this file contains the township line-description data, which was used to evaluate SDMs.
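When these files are loaded outside of R, it can help to separate the response, case-weight, and predictor columns. The training file pairs each species column with a “.weights” column, whereas the evaluation files have none. The Python sketch below relies on a naming pattern visible in the column table below. This pattern is an observation, not a documented convention:

- species codes contain a dot (e.g., `ace.sac`);
- case-weight columns append `.weights` to a species code;
- predictor names contain no dot.

The helper names are hypothetical. Any column not listed in the table, such as an unnamed row-index column, would be grouped with the predictors.

```python
import csv


def read_header(path):
    """Return the column names from the first row of a CSV file."""
    with open(path, newline="") as handle:
        return next(csv.reader(handle))


def classify_columns(header):
    """Split column names into (species, weights, predictors).

    Assumes the naming pattern seen in the supplement's column table:
    species codes contain a dot, weight columns end in ".weights",
    and predictor names contain no dot.
    """
    weights = [name for name in header if name.endswith(".weights")]
    species = [name for name in header if "." in name and not name.endswith(".weights")]
    predictors = [name for name in header if "." not in name]
    return species, weights, predictors


def species_without_weights(header):
    """List species columns that have no matching '<species>.weights' column."""
    species, weights, _ = classify_columns(header)
    weight_set = set(weights)
    return [name for name in species if name + ".weights" not in weight_set]
```

Applied to the header of lot_line_section_with_predictors.csv, `species_without_weights` would be expected to return an empty list. Applied to either township file, it should list every species column.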
The township data above were used with the permission of Dr. Yi-Chen Wang. For more information regarding these datasets, see:
Wang, Y.-C. 2007. Spatial patterns and vegetation-site relationships of the presettlement forests in western New York, USA. Journal of Biogeography 34:500–513.
Tulowiecki, S. J., C. P. S. Larsen, and Y.-C. Wang. 2014. Effects of positional error on modeling species distributions: a perspective using presettlement land survey records. Plant Ecology 216:67–85.
The following table describes each column of the .csv files and gives its checksum value in each file, sorted alphabetically by column name. Apart from the “weights” columns, the three .csv files share the same column names, though with different values. The evaluation data (“township_bt_post_with_predictors.csv” and “township_line_section_with_predictors.csv”) contain no case weight columns, because case weights were used only when training models on the training data (“lot_line_section_with_predictors.csv”). None of the .csv files contain blank cells.
Column name | Description | Checksum values for lot_line_section_with_predictors.csv | Checksum values for township_bt_post_with_predictors.csv | Checksum values for township_line_section_with_predictors.csv
abi.bal | Balsam fir presence (1) and absence (0) | 51.0 | 2.0 | 3.0
abi.bal.weights | Case weights for balsam fir | 102.0 | n/a | n/a
ace.rub | Red maple presence (1) and absence (0) | 95.0 | 35.0 | 51.0
ace.rub.weights | Case weights for red maple | 190.0 | n/a | n/a
ace.sac | Sugar maple presence (1) and absence (0) | 3762.0 | 256.0 | 363.0
ace.sac.weights | Case weights for sugar maple | 4142.0 | n/a | n/a
aetoverpet07 | AET/PET, July | 5810.5 | 740.7 | 647.4
aln.inc | Alder presence (1) and absence (0) | 121.0 | 0.0 | 11.0
aln.inc.weights | Case weights for alder | 242.0 | n/a | n/a
bet.all | Yellow birch presence (1) and absence (0) | 170.0 | 55.0 | 147.0
bet.all.weights | Case weights for yellow birch | 340.0 | n/a | n/a
car.spp | Hickory presence (1) and absence (0) | 226.0 | 12.0 | 57.0
car.spp.weights | Case weights for hickory | 452.0 | n/a | n/a
cas.den | Chestnut presence (1) and absence (0) | 572.0 | 29.0 | 77.0
cas.den.weights | Case weights for chestnut | 1144.0 | n/a | n/a
cti | Compound topographic index | 55756.1 | 6894.0 | 6349.8
drainclass | Soil drainage class | 26922.6 | 3402.4 | 3006.1
fag.gra | Beech presence (1) and absence (0) | 4021.0 | 461.0 | 443.0
fag.gra.weights | Case weights for beech | 3624.0 | n/a | n/a
fra.ame | White ash presence (1) and absence (0) | 870.0 | 50.0 | 157.0
fra.ame.weights | Case weights for white ash | 1740.0 | n/a | n/a
fra.nig | Black ash presence (1) and absence (0) | 530.0 | 35.0 | 74.0
fra.nig.weights | Case weights for black ash | 1060.0 | n/a | n/a
historicvill | Accessibility to a historic village (kcal) | 4573136.4 | 586073.5 | 520687.9
jug.cin | Butternut presence (1) and absence (0) | 244.0 | 9.0 | 38.0
jug.cin.weights | Case weights for butternut | 488.0 | n/a | n/a
jug.nig | Black walnut presence (1) and absence (0) | 31.0 | 2.0 | 11.0
jug.nig.weights | Case weights for black walnut | 62.0 | n/a | n/a
lir.tul | Whitewood presence (1) and absence (0) | 201.0 | 0.0 | 1.0
lir.tul.weights | Case weights for whitewood | 402.0 | n/a | n/a
lwvill | Accessibility to a Late Woodland village (kcal) | 1771154.4 | 236266.1 | 206470.8
mag.acu | Cucumber magnolia presence (1) and absence (0) | 658.0 | 30.0 | 174.0
mag.acu.weights | Case weights for cucumber magnolia | 1316.0 | n/a | n/a
ost.vir | Ironwood presence (1) and absence (0) | 44.0 | 24.0 | 18.0
ost.vir.weights | Case weights for ironwood | 88.0 | n/a | n/a
percentclay | Soil percent clay | 101788.0 | 13476.9 | 11176.0
percentsand | Soil percent sand | 191747.1 | 23550.5 | 21295.4
ph | Soil pH | 34237.8 | 4308.8 | 3807.9
pin.str | White pine presence (1) and absence (0) | 561.0 | 39.0 | 98.0
pin.str.weights | Case weights for white pine | 1122.0 | n/a | n/a
pla.occ | Sycamore presence (1) and absence (0) | 55.0 | 4.0 | 12.0
pla.occ.weights | Case weights for sycamore | 110.0 | n/a | n/a
precipgs | Total precipitation, May through September (mm) | 3208498.2 | 410422.7 | 360471.7
pru.ser | Black cherry presence (1) and absence (0) | 160.0 | 13.0 | 86.0
pru.ser.weights | Case weights for black cherry | 320.0 | n/a | n/a
que.alb | White oak presence (1) and absence (0) | 394.0 | 27.0 | 43.0
que.alb.weights | Case weights for white oak | 788.0 | n/a | n/a
que.vel | Black oak presence (1) and absence (0) | 498.0 | 12.0 | 53.0
que.vel.weights | Case weights for black oak | 996.0 | n/a | n/a
slopedegrees | Slope angle (degrees) | 15230.7 | 2122.2 | 1733.1
solradgs | Total solar radiation, May through September (Wh m⁻²) | 4554780790.2 | 581063222.8 | 507667623.9
temp01avg | Mean January temperature (°C) | -28591.1 | -3720.9 | -3273.4
tempavggs | Mean temperature, May through September (°C) | 99889.3 | 12639.9 | 11019.6
til.ame | Basswood presence (1) and absence (0) | 3004.0 | 53.0 | 243.0
til.ame.weights | Case weights for basswood | 5658.0 | n/a | n/a
trails | Accessibility to a trail (kcal) | 1741472.7 | 234840.2 | 231893.1
tsu.can | Hemlock presence (1) and absence (0) | 2176.0 | 136.0 | 326.0
tsu.can.weights | Case weights for hemlock | 4352.0 | n/a | n/a
ulm.ame | Elm presence (1) and absence (0) | 908.0 | 32.0 | 143.0
ulm.ame.weights | Case weights for elm | 1816.0 | n/a | n/a
x_coord | x-coordinates of sample points (UTM Zone 17N) | 3704886988.2 | 471939617.9 | 410963504.5
y_coord | y-coordinates of sample points (UTM Zone 17N) | 27282741631.9 | 3473135853.0 | 3031852492.1
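The checksum values appear to be column sums; for example, each presence/absence column sums to a whole number. The supplement does not define them explicitly, however. Under that assumption, the following minimal Python sketch checks a downloaded .csv file against the table. The tolerance and the `expected_lot` dictionary (three values copied from the lot_line_section_with_predictors.csv column) are illustrative assumptions.

```python
import csv
import math


def column_sums(path):
    """Sum every column of a numeric CSV file that has a header row."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        sums = dict.fromkeys(reader.fieldnames, 0.0)
        for row in reader:
            for name in reader.fieldnames:
                sums[name] += float(row[name])
    return sums


def mismatched_columns(sums, expected, abs_tol=0.1):
    """Return {column: (computed, listed)} for columns that are missing or differ.

    abs_tol (an assumed tolerance) allows for the one-decimal rounding
    of the listed checksum values.
    """
    mismatches = {}
    for name, listed in expected.items():
        computed = sums.get(name)
        if computed is None or not math.isclose(computed, listed, rel_tol=0.0, abs_tol=abs_tol):
            mismatches[name] = (computed, listed)
    return mismatches


# Checksum values for lot_line_section_with_predictors.csv, copied from the table.
expected_lot = {"abi.bal": 51.0, "ace.sac": 3762.0, "cti": 55756.1}
```

If the column-sum assumption holds and the file is intact, `mismatched_columns(column_sums("lot_line_section_with_predictors.csv"), expected_lot)` should return an empty dictionary.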
biomod2_predictors.zip – this zipped file contains the predictor variables in raster format (coordinate system: UTM Zone 17N). These rasters were used to train SDMs and to project them into geographic space to create prediction surfaces.