Alexander Y. Karatayev, Lyubov E. Burlakova, Sergey E. Mastitsky, and Dianna K. Padilla. 2015. Predicting the spread of aquatic invaders: insight from 200 years of invasion by zebra mussels. Ecological Applications 25:430–440.


R code and the data set necessary to conduct the Random Forest analysis.
Ecological Archives A025-027-S1.


Alexander Y. Karatayev
Great Lakes Center, SUNY Buffalo State, Buffalo, NY, USA

Lyubov E. Burlakova
Great Lakes Center, SUNY Buffalo State, Buffalo, NY, USA
The Research Foundation of The State University of New York
SUNY Buffalo State, Office of Sponsored Programs, Buffalo, New York, USA

Sergey E. Mastitsky
RNT Consulting, Ontario, Canada

Dianna K. Padilla
Department of Ecology and Evolution, Stony Brook University,
Stony Brook, New York 11794-5245 USA

File list

dreissena_in_lakes_of_belarus.csv (MD5: 3dc2d2f89af3064223358983c785771d)

r_script_random_forest.R (MD5: af1295890d60bc832955e940889e4575)
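The MD5 checksums listed above can be used to confirm that both files were downloaded intact. The following is a minimal sketch, assuming the two files sit in the current R working directory; the helper name verify_md5 is hypothetical and is not part of the supplement. It relies only on tools::md5sum, which ships with base R.

```r
# Expected checksums, copied from the file list above.
expected_md5 <- c(
  "dreissena_in_lakes_of_belarus.csv" = "3dc2d2f89af3064223358983c785771d",
  "r_script_random_forest.R"          = "af1295890d60bc832955e940889e4575"
)

# Hypothetical helper: compare the MD5 checksum of each file in `dir`
# with its expected value. Returns a named logical vector in which
# NA means that the file could not be found or read.
verify_md5 <- function(expected, dir = ".") {
  paths  <- file.path(dir, names(expected))
  actual <- unname(tools::md5sum(paths))
  setNames(actual == unname(expected), names(expected))
}

print(verify_md5(expected_md5))
```

A FALSE result does not necessarily mean the content is wrong: a download tool that converts line endings changes the checksum of a text file without altering its data.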


This Supplementary material contains two files necessary to fully reproduce the results obtained using the Random Forest classifier. The first of these files, dreissena_in_lakes_of_belarus.csv, is a plain text table that has 553 records, each described with the following variables:

1. Lake_Code: numeric code uniquely identifying each lake (for reference only; not used explicitly in the analysis).

2. ZMpresence: indicator of whether a lake is infested with zebra mussels (0 = non-infested, 1 = infested).

3. LAREA: lake area

4. LVOL: lake volume

5. MAXD: maximal depth

6. AVED: average depth

7. SPECWATSHED: specific watershed (i.e., drainage area)

8. TRANSP: Secchi depth

9. COLOR: water color

10. pH: water pH

11. HCO3: HCO3 content

12. SO4: SO4 content

13. Cl: Cl content

14. Ca: Ca content

15. Mg: Mg content

16. TDS: total dissolved solids

17. Fe: Fe content

18. Si: Si content

19. NH4: NH4 content

20. NO2: NO2 content

21. PO4: PO4 content

22. PermOx: permanganate oxidizability

23. N: latitude (decimal degrees)

24. E: longitude (decimal degrees)

Missing values in the data set are denoted as NA.
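The variable list above can be turned into a simple check when the table is read into R. The sketch below is not taken from r_script_random_forest.R. It assumes that the file is comma-separated and that its header row uses exactly the variable names listed above; the helper name load_lakes is hypothetical.

```r
# Variable names as listed in the description above (assumed to match
# the header row of the CSV file exactly).
expected_columns <- c(
  "Lake_Code", "ZMpresence", "LAREA", "LVOL", "MAXD", "AVED",
  "SPECWATSHED", "TRANSP", "COLOR", "pH", "HCO3", "SO4", "Cl", "Ca",
  "Mg", "TDS", "Fe", "Si", "NH4", "NO2", "PO4", "PermOx", "N", "E"
)

# Hypothetical helper: read the table, treat "NA" as missing, and stop
# with an informative error if any expected variable is absent.
load_lakes <- function(path) {
  lakes <- read.csv(path, na.strings = "NA", stringsAsFactors = FALSE)
  absent <- setdiff(expected_columns, names(lakes))
  if (length(absent) > 0) {
    stop("Missing columns: ", paste(absent, collapse = ", "))
  }
  # Recode the 0/1 indicator as a factor so that a classifier treats
  # the response as a class label rather than as a number.
  lakes$ZMpresence <- factor(lakes$ZMpresence, levels = c(0, 1),
                             labels = c("non_infested", "infested"))
  lakes
}

data_file <- "dreissena_in_lakes_of_belarus.csv"
if (file.exists(data_file)) {
  lakes <- load_lakes(data_file)
  print(dim(lakes))            # rows and columns actually read
  print(colSums(is.na(lakes))) # number of missing values per variable
}
```

If the file matches the description, the printed dimensions should show 553 rows and 24 columns.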

The second file, r_script_random_forest.R, loads the data into R (assuming that the file dreissena_in_lakes_of_belarus.csv is stored in the current R working directory), fits the Random Forest model, and plots the results. The analysis relies on four add-on packages: caret, geosphere, randomForest, and ggplot2. All of these packages are assumed to be already installed on the user's computer. If they are not, they can be downloaded free of charge from the Comprehensive R Archive Network (CRAN) or installed directly from within R using the following command: install.packages(c("caret", "geosphere", "randomForest", "ggplot2")).
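Before running the script, it may help to check which of the required packages are already available. The sketch below is one way to do this and is not part of the supplement; the helper name missing_packages is hypothetical. The install.packages call is left commented out so that nothing is downloaded without the user's intent.

```r
# Packages named in the description above.
required <- c("caret", "geosphere", "randomForest", "ggplot2")

# Hypothetical helper: return the subset of `pkgs` that cannot be
# loaded in the current R installation.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  unname(pkgs[!installed])
}

to_install <- missing_packages(required)
if (length(to_install) > 0) {
  message("Not installed: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install from CRAN
} else {
  message("All required packages are installed.")
}
```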