Stephen J. Tulowiecki and Chris P. S. Larsen. 2015. Native American impact on past forest composition inferred from species distribution models, Chautauqua County, New York. Ecological Monographs 85:557–581. http://dx.doi.org/10.1890/14-2259.1


Supplement

R code and data files used to train and evaluate species distribution models (SDMs).
Ecological Archives M085-021-S1.



Author(s)

Stephen J. Tulowiecki
Department of Geography
University at Buffalo
Wilkeson Quadrangle
Buffalo, NY 14261 USA
E-mail: [email protected]

Chris P. S. Larsen
Department of Geography
University at Buffalo
Wilkeson Quadrangle
Buffalo, NY 14261 USA


File list

Ecol_Monograph_supplement_code_biomod2.txt (md5: 1468e75dbf74ed624a8dce871743f924)

Ecol_Monograph_supplement_code_dismo_1.txt (md5: 55b20fbe747f7601c53d5b56a93459ea)

Ecol_Monograph_supplement_code_dismo_2.txt (md5: a33a1745062f1bf816c3d9ec797cdd46)

Ecol_Monograph_supplement_code_dismo_3.txt (md5: aff301c5ba52f04eff85e561122964c4)

Ecol_Monograph_supplement_code_dismo_4.txt (md5: 244ff730dbd9da02a5439cfd95a439ca)

Ecol_Monograph_supplement_code_dismo_5.txt (md5: bec6a05bf1d737b941d0a7a00bde3658)

lot_line_section_with_predictors.csv (md5: 48dc1b92e2d3d3b3e4875ef0dc3b87a7)

township_bt_post_with_predictors.csv (md5: 86f08554a0a65fec8065f85335aa8ec5)

township_line_section_with_predictors.csv (md5: d028af68dcd8f7bca5b28e969cc5c796)

biomod2_predictors.zip (md5: 7ab5a1d2ef1847fe64a47483e8220d70)
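The md5 values in the file list above can be used to confirm that a download is intact. The following is a minimal sketch in Python (standard library only), not part of the supplement itself; the function names `md5_of_file` and `verify_downloads` are hypothetical, and only two of the listed digests are copied in for brevity.

```python
import hashlib
from pathlib import Path

# Expected digests copied from the file list above (two shown; extend as needed).
EXPECTED_MD5 = {
    "Ecol_Monograph_supplement_code_biomod2.txt": "1468e75dbf74ed624a8dce871743f924",
    "biomod2_predictors.zip": "7ab5a1d2ef1847fe64a47483e8220d70",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory, expected=EXPECTED_MD5):
    """Map each file name to True (match), False (mismatch), or None (file missing)."""
    results = {}
    for name, expected_digest in expected.items():
        path = Path(directory) / name
        if path.is_file():
            results[name] = md5_of_file(path) == expected_digest
        else:
            results[name] = None
    return results
```

Calling `verify_downloads("path/to/downloads")` (the directory is a placeholder) returns one entry per listed file; any `False` suggests a corrupted or altered download.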

Description

This supplement contains the data and code used to train and evaluate species distribution models (SDMs). It includes six (6) .txt files containing code to be run in R, three (3) .csv files containing the training data and evaluation data, and one .zip file containing the predictor variables in raster format. All code files include comments (“#...”) that describe what the code does.

Two notes apply to the code files in this supplement. First, users seeking to recreate the results must edit the pathnames referenced in the code so that they match the locations where the data files are stored. Second, the code as presented trains SDMs that include Native American variables (NAVs); running SDMs that exclude NAVs requires a few further edits, which are documented in the comments of the code files. Both sets of edits are minor and should take little time to make.

Training and evaluating the models also requires considerable processing time. Although the “biomod2” code is highly automated, it could still take several hours to a few days to run on a personal computer. The “dismo” code could take several days to one week to run, and it involves much more manual entry of blocks of code into R. More advanced users of R could edit the code to run as a script or to be more automated.

The following is a description of each individual file.

Ecol_Monograph_supplement_code_biomod2.txt – this file contains the code for training SDMs from the Holland Land Company (HLC) line-description (or “line section”) data, using three SDM algorithms from the “biomod2” package in R: Generalized Additive Models (GAMs), Generalized Linear Models (GLMs), and Multivariate Adaptive Regression Splines (MARS).

Five .txt files contain additional code for training and evaluating boosted regression tree (BRT) models, using the “dismo” package in R. The code for BRT model development is split into five files, which must be run in succession. Note that because BRT models are stochastic, results may differ slightly from those reported in the article.

Ecol_Monograph_supplement_code_dismo_1.txt – this code loads the training data, and trains an initial set of BRT models.

Ecol_Monograph_supplement_code_dismo_2.txt – this code runs a procedure that suggests the number of variables that can be dropped from the initial set of BRT models.

Ecol_Monograph_supplement_code_dismo_3.txt – this code creates a set of simplified BRT models with fewer variables, as determined by the previous step.

Ecol_Monograph_supplement_code_dismo_4.txt – this code loads evaluation data, loads raster versions of predictor variables, projects models into geographic space, calculates variable importance, plots response curves, and evaluates models upon training data and evaluation data.

Ecol_Monograph_supplement_code_dismo_5.txt – this code saves false positive rates and false negative rates for each model, when evaluated upon the training data and evaluation data.
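For readers unfamiliar with the error rates mentioned above, the following minimal sketch shows the standard definitions of false positive rate and false negative rate for presence/absence data. It is written in Python rather than R and is not taken from the supplement's code; the function name `error_rates` and the 0.5 threshold are assumptions for illustration, since this page does not state how predicted probabilities were converted to presences.

```python
def error_rates(observed, predicted_prob, threshold=0.5):
    """Return (false_positive_rate, false_negative_rate).

    observed       : sequence of 0/1 values (absence/presence)
    predicted_prob : sequence of predicted probabilities of presence
    threshold      : hypothetical cutoff; probabilities >= threshold count as presence
    """
    if len(observed) != len(predicted_prob):
        raise ValueError("observed and predicted_prob must have the same length")

    tp = fp = tn = fn = 0
    for obs, prob in zip(observed, predicted_prob):
        predicted = 1 if prob >= threshold else 0
        if obs == 1 and predicted == 1:
            tp += 1
        elif obs == 0 and predicted == 1:
            fp += 1
        elif obs == 0 and predicted == 0:
            tn += 1
        else:
            fn += 1

    # Rates are undefined when there are no absences (or no presences).
    fpr = fp / (fp + tn) if (fp + tn) else float("nan")
    fnr = fn / (fn + tp) if (fn + tp) else float("nan")
    return fpr, fnr
```

The false positive rate is the share of observed absences predicted as presences, and the false negative rate is the share of observed presences predicted as absences.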

.csv files – these files contain the training data and evaluation data:

lot_line_section_with_predictors.csv – this file contains the line-description data that was used to train SDMs.

township_bt_post_with_predictors.csv – this file contains the township bearing-tree data, which was used to evaluate SDMs.

township_line_section_with_predictors.csv – this file contains the township line-description data, which was used to evaluate SDMs.

The township data above were used with the permission of Dr. Yi-Chen Wang. For more information regarding these datasets, see:

Wang, Y.-C. 2007. Spatial patterns and vegetation-site relationships of the presettlement forests in western New York, USA. Journal of Biogeography 34:500–513.

Tulowiecki, S. J., C. P. S. Larsen, and Y.-C. Wang. 2014. Effects of positional error on modeling species distributions: a perspective using presettlement land survey records. Plant Ecology 216:67–85.

The following table contains descriptions of the columns, and checksum values, for the .csv files (sorted alphabetically by column name). With the exception of the “weights” columns, the three .csv files share the same column names (with different values). The evaluation data (“township_bt_post_with_predictors.csv” and “township_line_section_with_predictors.csv”) do not contain case weight columns, because case weights were used only when training models on the training data (“lot_line_section_with_predictors.csv”). There are no blank cell values in these .csv files.

Column name | Description | Checksum values for lot_line_section_with_predictors.csv | Checksum values for township_bt_post_with_predictors.csv | Checksum values for township_line_section_with_predictors.csv
--- | --- | --- | --- | ---
abi.bal | Balsam fir presence (1) and absence (0) | 51.0 | 2.0 | 3.0
abi.bal.weights | Case weights for balsam fir | 102.0 | n/a | n/a
ace.rub | Red maple presence (1) and absence (0) | 95.0 | 35.0 | 51.0
ace.rub.weights | Case weights for red maple | 190.0 | n/a | n/a
ace.sac | Sugar maple presence (1) and absence (0) | 3762.0 | 256.0 | 363.0
ace.sac.weights | Case weights for sugar maple | 4142.0 | n/a | n/a
aetoverpet07 | AET/PET, July | 5810.5 | 740.7 | 647.4
aln.inc | Alder presence (1) and absence (0) | 121.0 | 0.0 | 11.0
aln.inc.weights | Case weights for alder | 242.0 | n/a | n/a
bet.all | Yellow birch presence (1) and absence (0) | 170.0 | 55.0 | 147.0
bet.all.weights | Case weights for yellow birch | 340.0 | n/a | n/a
car.spp | Hickory presence (1) and absence (0) | 226.0 | 12.0 | 57.0
car.spp.weights | Case weights for hickory | 452.0 | n/a | n/a
cas.den | Chestnut presence (1) and absence (0) | 572.0 | 29.0 | 77.0
cas.den.weights | Case weights for chestnut | 1144.0 | n/a | n/a
cti | Compound topographic index | 55756.1 | 6894.0 | 6349.8
drainclass | Soil drainage class | 26922.6 | 3402.4 | 3006.1
fag.gra | Beech presence (1) and absence (0) | 4021.0 | 461.0 | 443.0
fag.gra.weights | Case weights for beech | 3624.0 | n/a | n/a
fra.ame | White ash presence (1) and absence (0) | 870.0 | 50.0 | 157.0
fra.ame.weights | Case weights for white ash | 1740.0 | n/a | n/a
fra.nig | Black ash presence (1) and absence (0) | 530.0 | 35.0 | 74.0
fra.nig.weights | Case weights for black ash | 1060.0 | n/a | n/a
historicvill | Accessibility to a Historic village (kcal) | 4573136.4 | 586073.5 | 520687.9
jug.cin | Butternut presence (1) and absence (0) | 244.0 | 9.0 | 38.0
jug.cin.weights | Case weights for butternut | 488.0 | n/a | n/a
jug.nig | Black walnut presence (1) and absence (0) | 31.0 | 2.0 | 11.0
jug.nig.weights | Case weights for black walnut | 62.0 | n/a | n/a
lir.tul | Whitewood presence (1) and absence (0) | 201.0 | 0.0 | 1.0
lir.tul.weights | Case weights for whitewood | 402.0 | n/a | n/a
lwvill | Accessibility to a Late Woodland village (kcal) | 1771154.4 | 236266.1 | 206470.8
mag.acu | Cucumber magnolia presence (1) and absence (0) | 658.0 | 30.0 | 174.0
mag.acu.weights | Case weights for cucumber magnolia | 1316.0 | n/a | n/a
ost.vir | Ironwood presence (1) and absence (0) | 44.0 | 24.0 | 18.0
ost.vir.weights | Case weights for ironwood | 88.0 | n/a | n/a
percentclay | Soil percent clay | 101788.0 | 13476.9 | 11176.0
percentsand | Soil percent sand | 191747.1 | 23550.5 | 21295.4
ph | Soil pH | 34237.8 | 4308.8 | 3807.9
pin.str | White pine presence (1) and absence (0) | 561.0 | 39.0 | 98.0
pin.str.weights | Case weights for white pine | 1122.0 | n/a | n/a
pla.occ | Sycamore presence (1) and absence (0) | 55.0 | 4.0 | 12.0
pla.occ.weights | Case weights for sycamore | 110.0 | n/a | n/a
precipgs | Total precipitation, May through September (mm) | 3208498.2 | 410422.7 | 360471.7
pru.ser | Black cherry presence (1) and absence (0) | 160.0 | 13.0 | 86.0
pru.ser.weights | Case weights for black cherry | 320.0 | n/a | n/a
que.alb | White oak presence (1) and absence (0) | 394.0 | 27.0 | 43.0
que.alb.weights | Case weights for white oak | 788.0 | n/a | n/a
que.vel | Black oak presence (1) and absence (0) | 498.0 | 12.0 | 53.0
que.vel.weights | Case weights for black oak | 996.0 | n/a | n/a
slopedegrees | Slope angle (degrees) | 15230.7 | 2122.2 | 1733.1
solradgs | Total solar radiation, May through September (Wh m-2) | 4554780790.2 | 581063222.8 | 507667623.9
temp01avg | Mean January temperature (deg C) | -28591.1 | -3720.9 | -3273.4
tempavggs | Mean temperature, May through September (deg C) | 99889.3 | 12639.9 | 11019.6
til.ame | Basswood presence (1) and absence (0) | 3004.0 | 53.0 | 243.0
til.ame.weights | Case weights for basswood | 5658.0 | n/a | n/a
trails | Accessibility to a trail (kcal) | 1741472.7 | 234840.2 | 231893.1
tsu.can | Hemlock presence (1) and absence (0) | 2176.0 | 136.0 | 326.0
tsu.can.weights | Case weights for hemlock | 4352.0 | n/a | n/a
ulm.ame | Elm presence (1) and absence (0) | 908.0 | 32.0 | 143.0
ulm.ame.weights | Case weights for elm | 1816.0 | n/a | n/a
x_coord | x-coordinates of sample points (UTM Zone 17N) | 3704886988.2 | 471939617.9 | 410963504.5
y_coord | y-coordinates of sample points (UTM Zone 17N) | 27282741631.9 | 3473135853.0 | 3031852492.1
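
The checksum values listed above can be compared against a downloaded .csv file. The sketch below assumes that each checksum is the sum of its column, rounded to one decimal place; this page does not state how the checksums were computed, so treat that as an assumption to confirm. The function names `column_sums` and `matches_table` are hypothetical, and the code is Python (standard library only) rather than R.

```python
import csv


def column_sums(csv_path, columns):
    """Sum the named numeric columns of a CSV file that has a header row."""
    totals = {name: 0.0 for name in columns}
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            for name in columns:
                totals[name] += float(row[name])
    return totals


def matches_table(totals, expected, tolerance=0.1):
    """Compare computed sums with listed checksums, allowing for rounding."""
    return {
        name: abs(totals[name] - expected[name]) <= tolerance
        for name in expected
    }


# Example checksums for lot_line_section_with_predictors.csv, copied from the table.
EXPECTED_TRAINING = {"abi.bal": 51.0, "ace.sac": 3762.0, "ph": 34237.8}
```

For example, `matches_table(column_sums("lot_line_section_with_predictors.csv", EXPECTED_TRAINING), EXPECTED_TRAINING)` returns one True/False entry per column; a False entry would point either to a damaged file or to a different checksum definition than the one assumed here.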

 

biomod2_predictors.zip – this zipped file contains the predictor variables in raster format (coordinate system: UTM Zone 17N) that were used to train SDMs and to project them into geographic space as prediction surfaces.