EcoSim 5.0 Help System; Species Diversity

Species Diversity

Topics

Introduction to Species Diversity
Data Format
Defaults
Options
Output
Caveats
Species Diversity Tutorial
Literature Cited

1. Introduction to Species Diversity

Species diversity remains a central object of study in both basic and applied community ecology. Two questions arise in any study of species diversity. First, how can we quantify the diversity of an assemblage? Second, how can we statistically compare the diversity of two different assemblages?

The data for a study of species diversity constitute a sample of individuals that are classified into different species (or other OTUs). Although this module will emphasize the study of ecological species diversity, the principles apply to any hierarchical classification of diversity. Thus, we could study generic diversity by analyzing a set of species that are classified into different genera. An extensive literature on taxonomic diversity indices such as the species/genus (S/G) ratio has developed (Järvinen 1982), but the principles are exactly the same as for the study of species diversity at the individual level.

Species Richness and Species Evenness

We can decompose species diversity into two components: species richness, which is the number of species in the assemblage, and species evenness, which is the relative distribution of individuals among species. For example, imagine two communities, each with 10 species and 100 individuals. The first community has maximum evenness, with each species represented by 10 individuals. The second community has minimum evenness, with 91 individuals of 1 species and 1 individual each of the remaining 9 species. Most ecologists would say that the first community is more diverse because it is more even.
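As a quick illustration, the two hypothetical communities in the example above can be written as abundance vectors (one entry per species). This minimal Python sketch shows that they have identical richness and total abundance but very different shares held by the most common species; the function names are illustrative only and are not part of EcoSim.

```python
# Abundance vectors for the two hypothetical communities described above.
even_community = [10] * 10          # 10 species with 10 individuals each
uneven_community = [91] + [1] * 9   # 1 species with 91, nine species with 1 each

def richness(abundances):
    """Number of species with at least one individual."""
    return sum(1 for count in abundances if count > 0)

def proportions(abundances):
    """Relative abundance p(i) of each species."""
    total = sum(abundances)
    return [count / total for count in abundances]

for name, community in [("even", even_community), ("uneven", uneven_community)]:
    p = proportions(community)
    # Same richness and total, very different share for the commonest species.
    print(name, sum(community), richness(community), max(p))
```

This prints "even 100 10 0.1" and "uneven 100 10 0.91": richness alone cannot distinguish the two communities, which is why evenness is treated as a separate component.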

Assumptions of Diversity Indices

All species diversity indices carry with them the following implicit assumptions (Peet 1974):

1) All species are equally important. In other words, species diversity indices cannot be used to recognize different species combinations or the presence or absence of particular keystone species.

2) All individuals assigned to a species are equal. Species diversity indices do not distinguish between different age classes or different life history stages of a species.

3) Species diversity has been measured in the appropriate units. Typically diversity is measured by collections of discrete individuals, but data might also be collected in the form of biomass or percentage cover. Because the diversity module of EcoSim is a statistical sampling model, diversity must be recorded as counts of discrete individuals (or colonies).

Sampling Effects

If you had sampled an entire community exhaustively, it would be easy to determine its species richness and to describe its evenness. Unfortunately, we are rarely in a position to have collected all the organisms in a given community, in part because it is so difficult to define the spatial and taxonomic boundaries of a community!

As more individuals are sampled from an assemblage, species richness rises until it reaches an asymptote, signifying that all of the species in the assemblage have been collected. Unfortunately, when we sample a community, even with multiple collections, we don't know precisely where we sit on this sampling curve. Consequently, it becomes very difficult to compare the species diversity of different communities.

Do differences in species richness or evenness represent biological differences between communities? Or are they just sampling differences that might disappear if we collected more thoroughly? Most species diversity indices are sensitive to the number of individuals collected, making it difficult to compare species diversity in collections of different size.

The Rarefaction Solution

Sanders (1968) was interested in comparing the species richness of different marine assemblages. He reasoned that the most appropriate comparisons would be those that controlled for differences in abundance. In other words, he "rarefied" his samples down to a common abundance level and then compared species richness.

For example, suppose Community A has 500 individuals and 40 species. Community B has 250 individuals and 30 species. Rarefaction would tell you the expected number of species to be found in a sample of 250 individuals drawn randomly from Community A. You could then see if this number was greater or less than the 30 species that were found in Community B.

Rarefaction uses probability theory to derive expressions for the expectation and variance of species richness for a sample of a given size (Hurlbert 1971, Heck et al. 1975). This module of EcoSim implements rarefaction by computer sampling: a specified number of individuals is randomly drawn from a community sample, and the process is repeated many times to generate a mean and a variance of species diversity. EcoSim calculates mean species richness, but it also allows you to construct these sampling curves for other richness and evenness indices.
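The computer-sampling approach described above can be sketched in a few lines of Python. This is not EcoSim's code: `rarefy_richness` and `expected_richness` are hypothetical names, and the seed only makes this sketch repeatable (it will not reproduce EcoSim's random number stream). For comparison, the sketch also evaluates the exact expectation from Hurlbert (1971), E(S_n) = sum over species of [1 - C(N - N_i, n) / C(N, n)], where N_i is the abundance of species i and N the total number of individuals.

```python
import random
from math import comb
from statistics import mean, variance

def expand(abundances):
    """Turn an abundance vector into a list of individuals labelled by species index."""
    return [i for i, count in enumerate(abundances) for _ in range(count)]

def rarefy_richness(abundances, n, iterations=1000, seed=None):
    """Monte Carlo rarefaction: repeatedly draw n individuals without
    replacement and record how many species each draw contains."""
    pool = expand(abundances)
    if not 1 <= n <= len(pool):
        raise ValueError("n must lie between 1 and the total number of individuals")
    rng = random.Random(seed)
    counts = [len(set(rng.sample(pool, n))) for _ in range(iterations)]
    return mean(counts), variance(counts)

def expected_richness(abundances, n):
    """Exact expected richness in a subsample of n individuals:
    sum over species of 1 - C(N - N_i, n) / C(N, n)  (Hurlbert 1971)."""
    total = sum(abundances)
    if not 1 <= n <= total:
        raise ValueError("n must lie between 1 and the total number of individuals")
    return sum(1 - comb(total - a, n) / comb(total, n) for a in abundances if a > 0)

community = [50, 20, 10, 5, 5, 3, 2, 2, 1, 1, 1]   # made-up abundances, N = 100
mc_mean, mc_var = rarefy_richness(community, 20, iterations=5000, seed=10)
exact = expected_richness(community, 20)
print(f"simulated mean {mc_mean:.2f}, exact expectation {exact:.2f}")
```

With enough iterations the simulated mean converges on the exact expectation; the simulation has the advantage that the same resampling loop works for any diversity index, not just richness.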

2. Data Format

The input for the species diversity module is a vector of abundances, represented by a single column of data. Each row in the vector is the abundance of a particular species. The entries must be non-negative integers. EcoSim allows you to specify which column in a matrix of data is to be analyzed, so that diversity in several assemblages can be easily compared in the same data set.
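To make the expected input concrete, here is a minimal Python sketch that pulls one abundance column out of a data matrix and enforces the non-negative integer rule. EcoSim's own file format is not described here, so the comma-separated layout with a header row is an assumption, and `read_abundance_column` is a hypothetical helper. Column numbers count from 1, with column 1 holding species labels, so column 2 is the first data column.

```python
import csv
import io

def read_abundance_column(text, column):
    """Return (labels, abundances) for one data column of a comma-separated matrix.

    Assumptions for this sketch: the first line is a header row, column 1
    holds species labels, and `column` is counted from 1.
    """
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    if not 2 <= column <= len(header):
        raise ValueError("column must be a data column, 2 or higher")
    labels, abundances = [], []
    for row in body:
        count = int(row[column - 1])   # int("2.5") raises ValueError
        if count < 0:
            raise ValueError(f"negative abundance for {row[0]!r}")
        labels.append(row[0])
        abundances.append(count)
    return labels, abundances

data = "Species,Site A,Site B\nsp1,12,0\nsp2,3,5\nsp3,0,1\n"   # made-up matrix
labels, counts = read_abundance_column(data, 2)
print(labels, counts)
```

Zeros are allowed (a species may be absent from one assemblage in the matrix); percentages or other non-integer values are rejected.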

3. Defaults

The default is 1000 iterations. Each simulation requires many calculations, so you can speed it up by reducing the number of abundance levels or by reducing the number of iterations to 100; results are very similar for 100 and 1000 iterations. The default species diversity index is species richness, and the default sampling algorithm is independent sampling. The default data are those in the second column of the matrix (the first column contains species labels).

4. Options

Indices

Over the years, ecologists have used a plethora of diversity indices (Washington 1984, Magurran 1988). Most of these are highly correlated with one another and have similar statistical properties. EcoSim gives you a choice among the four most popular and useful diversity indices:

1) Species richness. The number of species is the most natural measure of species richness in an assemblage. EcoSim calculates species richness by counting the number of non-zero rows in the input vector and in the rarefied samples.

2) PIE. Hurlbert's (1971) index calculates the probability of an interspecific encounter (PIE). In other words, this index gives the probability that two individuals sampled at random (without replacement) from the assemblage represent two different species. Let N equal the total number of individuals in the assemblage, let S equal the number of species, and let p(i) represent the proportion of the entire sample represented by species i. PIE is calculated as:

$$\mathrm{PIE} = \frac{N}{N-1}\left(1 - \sum_{i=1}^{S} p_i^{2}\right)$$

PIE has several advantages as a simple index of evenness. First, the index is easily interpreted as a probability. Second, it is one of the few indices that is unbiased by sample size, although its variance increases at small N. Third, PIE has an important analog in population genetics: it is equivalent to the calculation of heterozygosity (H), the probability that two alleles are not identical by descent.

3) Dominance. Dominance is simply the fraction of the collection that is represented by the most common species. Dominance can be a useful index of resource monopolization by a superior competitor, particularly in communities that have been invaded by exotic species (e.g., Porter and Savignano 1990). Like species richness, dominance is sensitive to sample size. In the extreme case of a collection of only 1 individual, dominance would always equal 1.0.

4) Shannon diversity index. The Shannon-Wiener diversity index is calculated as:

$$H' = -\sum_{i=1}^{S} p_i \ln p_i$$

where p(i) is the proportion of the sample represented by species i, S is the number of species, and ln is the natural logarithm. This index has a long history in ecology and became a "magic bullet" for ecologists in the 1960s and 1970s. Its tenuous theoretical justification came from information theory (Margalef 1958), but the idea that H' is a measure of entropy is no longer warranted (Hurlbert 1971, Goodman 1975). Nevertheless, the index has been widely used in pollution studies and continues to be taught to students in college laboratory exercises.

The problem with H' (and with most diversity indices) is that it confounds species richness and evenness in a single number that cannot be interpreted biologically or statistically. If two communities differ in H', we can't be sure whether this reflects differences in species richness, species evenness, or simply sampling differences.

EcoSim will at least address the problem of sampling differences. It will allow you to compare the Shannon index for two communities based on equal sample sizes. We include this index for historical purposes, though we don't recommend it as a useful index of species diversity.
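The four indices above translate directly into code. This minimal Python sketch follows the formulas as written in this section, with p(i) computed as each species' count divided by the total number of individuals N; the function names are hypothetical, and EcoSim's internal implementation may differ in details such as the handling of empty samples.

```python
from math import log

def species_richness(abundances):
    """Number of non-zero rows in the abundance vector."""
    return sum(1 for a in abundances if a > 0)

def hurlbert_pie(abundances):
    """Probability of an interspecific encounter: N/(N-1) * (1 - sum p(i)^2)."""
    n = sum(abundances)
    if n < 2:
        raise ValueError("PIE needs at least two individuals")
    sum_sq = sum((a / n) ** 2 for a in abundances)
    return n / (n - 1) * (1 - sum_sq)

def dominance(abundances):
    """Fraction of the collection belonging to the most common species."""
    return max(abundances) / sum(abundances)

def shannon(abundances):
    """Shannon index H' = -sum p(i) ln p(i); absent species contribute nothing."""
    n = sum(abundances)
    return -sum((a / n) * log(a / n) for a in abundances if a > 0)

even = [10] * 10
uneven = [91] + [1] * 9
for name, c in [("even", even), ("uneven", uneven)]:
    print(name, species_richness(c), round(hurlbert_pie(c), 3),
          dominance(c), round(shannon(c), 3))
```

The N/(N-1) factor makes PIE exactly equal to the chance that two individuals drawn without replacement belong to different species, which is the source of its lack of bias with respect to sample size.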

Sampling Algorithm

EcoSim gives you two options for how the samples are drawn from your collection.

1) Independent sampling. Suppose you specified random samples with abundance levels of 10, 20, and 40 individuals. With independent sampling, EcoSim would draw 10 individuals, calculate species diversity, and then replace those 10 individuals before drawing a random sample of 20. These 20 would also be replaced before drawing a random sample of 40. In other words, independent sampling ensures that the estimates of diversity at the different abundance levels are independent of one another. This is nearly always the sampling algorithm you will want to use.

2) Accumulation curves. Suppose you specified accumulation curves with abundance levels of 10, 20, and 40 individuals. EcoSim would first draw 10 randomly chosen individuals and calculate species diversity. Next, it would draw an additional 10 individuals, add those to the first sample, and calculate species diversity for the 20 individuals. Finally, it would add 20 more randomly chosen individuals and calculate diversity for the combined set of 40 individuals.

An accumulation curve might be important if one of the samples you are comparing is nested within another, larger sample. Data of this sort may be common in nested vegetation surveys. For most purposes, however, you will want to use the independent sampling algorithm.
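The difference between the two sampling options can be sketched in Python as follows. This is an illustration under stated assumptions, not EcoSim's code: `independent_sampling` and `accumulation_curve` are hypothetical names, each level's diversity is measured here as species richness, and the accumulation curve is built by shuffling the individuals once and taking ever-longer prefixes, which is equivalent to adding new individuals drawn without replacement from those not yet chosen.

```python
import random

def expand(abundances):
    """List of individuals, each labelled by its species index."""
    return [i for i, count in enumerate(abundances) for _ in range(count)]

def independent_sampling(abundances, levels, rng):
    """Fresh random draw (without replacement) for every level; the pool is
    restored between levels, so the estimates are independent."""
    pool = expand(abundances)
    return [len(set(rng.sample(pool, n))) for n in levels]

def accumulation_curve(abundances, levels, rng):
    """Nested samples: each larger sample keeps the previous individuals and
    adds new ones drawn from those not yet chosen."""
    pool = expand(abundances)
    rng.shuffle(pool)  # one random order of draws, without replacement
    # The first n individuals of the shuffled list form a random sample of
    # size n, and every sample contains all of the smaller ones.
    return [len(set(pool[:n])) for n in sorted(levels)]

rng = random.Random(10)
community = [5, 3, 1, 1]   # made-up abundances
print(independent_sampling(community, [1, 2, 5, 10], rng))
print(accumulation_curve(community, [1, 2, 5, 10], rng))
```

Because its samples are nested, a single accumulation curve can never decrease, whereas independently drawn levels can occasionally show a smaller richness at a larger abundance level within one iteration.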

Abundance Levels

In addition to providing EcoSim with the input column, you must specify which abundance levels are to be used in the sampling. You can either let EcoSim provide a set of default levels that are tailored to your data set, or you can ask EcoSim to take random samples for particular abundance values.

1) Default. EcoSim will set the abundance levels necessary for you to construct a diversity curve. Suppose your input data have a total of N individuals and S species. EcoSim will then create S + 3 abundance levels, up to a maximum of 42 abundance levels. The smallest abundance level will be 1 and the largest will be N. The remaining S + 1 levels will be evenly spaced between these boundaries (Tipper 1979).

2) User-defined. If you check this option, an edit window pops up that lets you enter specific abundance levels, one abundance level per line. To be valid input, abundance levels must be integers between 1 and N (inclusive). Once this option has been checked, an edit button appears in the preferences window so you can change the abundance levels that you have specified. You can set a maximum of 100 abundance levels with the user-defined option.
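The rules for both kinds of abundance levels can be sketched in Python. The default rule follows the description above (S + 3 levels, at most 42, evenly spaced from 1 to N), but how EcoSim turns evenly spaced values into integers is not stated, so rounding to the nearest integer and dropping any duplicates is an assumption of this sketch. `default_levels` and `check_user_levels` are hypothetical names.

```python
def default_levels(n_individuals, n_species, max_levels=42):
    """S + 3 levels (at most 42) running evenly from 1 to N.

    Rounding the evenly spaced values to the nearest integer, and dropping
    any duplicates that rounding creates, are assumptions of this sketch.
    """
    k = min(n_species + 3, max_levels)
    step = (n_individuals - 1) / (k - 1)
    return sorted({round(1 + j * step) for j in range(k)})

def check_user_levels(levels, n_individuals, max_levels=100):
    """Validate user-defined levels: integers from 1 to N, at most 100 of them."""
    if len(levels) > max_levels:
        raise ValueError(f"at most {max_levels} abundance levels are allowed")
    for level in levels:
        if not isinstance(level, int) or not 1 <= level <= n_individuals:
            raise ValueError(f"invalid abundance level: {level!r}")
    return levels

print(default_levels(243, 31))
print(check_user_levels([63], 243))
```

Note that these bounds are also what makes rarefaction an interpolation method: no level may exceed N, the number of individuals actually collected.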

Input Column

You must specify which column of data you want to analyze in the species diversity module. One column of data is analyzed at a time. The default is column 2, the first column of data in the matrix (remember that the first column in your data set always contains the species labels).

5. Output

Input Column Tab

This tab shows you the input column that you selected in the preferences window.

Simulation Tab

This tab shows you the abundances in the most recently simulated community. Don't be alarmed if these data appear identical to those in the Input Column tab. If you selected the default abundance levels, EcoSim simulates the maximum abundance, which is the same as the observed sample. If you want to see what a smaller assemblage looks like, specify a lower abundance with the User-defined option under Abundance Levels and then return to this tab.

Diversity Curve Tab

This tab gives you the results of the simulation. Each row represents a different abundance level. The first row gives the observed diversity of the original sample. EcoSim reports the mean and median of the diversity index, its variance, and the lower and upper bounds of a 95% confidence interval. This interval is calculated as:

mean ± 1.96[sqrt(variance)]

Subsequent rows give these results for the different abundance levels (default or user defined).
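Each row of the Diversity Curve tab can be reproduced from the simulated values at one abundance level. In this minimal Python sketch, `summarize_level` and `interval` are hypothetical names, and the use of the sample variance (n - 1 denominator) is an assumption, since the help text does not say which variance EcoSim computes. The interval is the normal approximation given above.

```python
from math import sqrt
from statistics import mean, median, variance

def summarize_level(values):
    """Summary for one abundance level: mean, median, variance and the
    95% interval mean +/- 1.96 * sqrt(variance)."""
    m = mean(values)
    v = variance(values) if len(values) > 1 else 0.0
    half_width = 1.96 * sqrt(v)
    return {"mean": m, "median": median(values), "variance": v,
            "low": m - half_width, "high": m + half_width}

def interval(mean_value, variance_value):
    """The same interval from an already-computed mean and variance."""
    half_width = 1.96 * sqrt(variance_value)
    return mean_value - half_width, mean_value + half_width

# Mean and variance reported in the tutorial for samples of 63 individuals
low, high = interval(19.71, 3.82414)
print(round(low, 2), round(high, 2))   # 15.88 23.54
```

Plugging in the tutorial's mean (19.71) and variance (3.82414) recovers its reported interval of 15.88 to 23.54 species.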

Summary Tab

The Summary tab is split into an upper and a lower window. The upper window gives the simulation conditions, including the name of the input file, the diversity index, the randomization algorithm, the number of iterations, the abundance levels, and the random number seed. The lower window gives a spreadsheet that contains all of the information in the Diversity Curve tab. The output from either window can then be saved ("Save to File" or "Save Diversity Curve") or discarded ("Close"). There is also a small clock in the lower right-hand corner that shows how long your simulation took.

6. Caveats

Hypothesis Testing

You may notice that this module does not directly test hypotheses about species diversity. Instead, it simply estimates species diversity at different abundance levels. However, the mean and variance of diversity from these simulations can be used to construct simple hypothesis tests. To decide whether diversity is equivalent in two communities, use EcoSim to rarefy the larger community down to the abundance level of the smaller one. Then check whether the observed diversity of the smaller community falls within the 95% confidence interval. Alternatively, you could calculate a standardized deviate as:

(observed diversity - mean diversity)/sqrt(variance)

This deviate can then be compared to a standard normal distribution to obtain a tail probability. In many cases, simply plotting the diversity curves (and their 95% confidence intervals) for several assemblages is sufficient to show whether they are significantly different from one another or not.
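The standardized deviate and its tail probability take only a few lines of Python. This sketch uses the numbers from the tutorial below (9 species observed in 63 individuals from the old plantation, versus a rarefied mean of 19.71 and variance of 3.82414 for the young plantation). The function names are hypothetical, and the comparison to a standard normal is an approximation, since rarefied richness is a discrete count.

```python
from math import erfc, sqrt

def standardized_deviate(observed, mean_value, variance_value):
    """(observed diversity - mean diversity) / sqrt(variance)."""
    if variance_value <= 0:
        raise ValueError("variance must be positive to standardize")
    return (observed - mean_value) / sqrt(variance_value)

def two_tailed_p(z):
    """Two-tailed tail probability of z under a standard normal distribution."""
    return erfc(abs(z) / sqrt(2))

# Old plantation: 9 species in 63 individuals; young plantation rarefied
# to 63 individuals: mean 19.71, variance 3.82414 (tutorial values).
z = standardized_deviate(9, 19.71, 3.82414)
print(round(z, 2), two_tailed_p(z))   # z is about -5.48
```

A deviate this far below zero corresponds to a very small tail probability, which agrees with the confidence-interval conclusion reached in the tutorial.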

Assumptions of Rarefaction

The "rarefaction methodology" carries with it a number of specific assumptions (Tipper 1979, Gotelli and Graves 1996):

1) Sampling has been sufficient to guarantee an adequate characterization of the parent distribution.

2) The spatial distribution of individuals is random.

3) The samples to be compared are taxonomically "similar" and are drawn from the "same" community type.

4) Standardized sampling techniques are used for all collections.

5) Rarefaction can be used to interpolate to a smaller sample size, but not to extrapolate to a larger sample size (EcoSim won't even let you try this!). Colwell and Coddington (1994) is the key reference for extrapolation estimates of species richness.

Assumption #4 is the most critical, because there is no meaningful way to compare communities that have not been sampled with comparable methods.

Another caution is that the rarefaction methodology is appropriate only for questions of species richness (or evenness): what is the expected number of species for a given number of individuals? However, many investigators are actually interested in species density: the expected number of species per unit area (or per unit of sampling effort). Species density depends on both species richness and the density of individuals (number of organisms per unit area). See James and Wamer (1982) for a thorough discussion of these issues.

7. Tutorial

Use the "Open" command in the "File" menu to load the file "Pitfall carabids". These data represent pitfall trap collections of carabid beetles reported by Niemelä et al. (1988). The traps were placed in young (< 20 years) and old (20-60 years) pine plantations in northern Europe. Each row of the data set represents a different beetle species. If you wish, use the mouse to drag the edges of the column labels wider so you can read the species names.

The first column shows the data for the old plantations and the second column shows the data for the young plantations. Each entry is the number of individuals of a particular species collected in that community. For this module, the data must be non-negative integers that represent counts of individuals. Percentages, biomass, or coverage data cannot be analyzed with the algorithms in this module.

In the young plantations, the pitfall traps yielded 243 individuals and 31 species. In the old plantations, the traps yielded only 63 individuals and 9 species. Is species richness (and other measures of diversity) really higher in the young plantations? It is difficult to say from these data.

Almost 4 times as many individuals were collected from the young plantations, so it isn't surprising that more species were discovered there. Moreover, every species found in the old plantations was also found in the young plantations. This suggests that if the old plantations were sampled more intensively, they might yield the same diversity patterns.

How can we use EcoSim to help us explore this problem?

Use the "Species Diversity" option under the "Analyze" menu to compare the two communities.

Set the random number seed to the value 10. Choose "Species richness" as the diversity index and choose "Young Plantation" as the column to be analyzed.

Choose "User-defined" for the abundance level. This will pop up an edit window, in which you should enter the single value 63. You are instructing EcoSim to randomly subsample exactly 63 individuals from the young plantation data set.

Now run the simulation, which should take only a second for 100 iterations of a single abundance level.

The Input Column tab shows the single column of original data for the young plantations. The Simulation tab shows the results of a single random draw of 63 individuals from the input column.

Although there are 48 individuals of Calathus micropterus in the young plantation data (first row), only 9 individuals are present in the random sample. Three individuals of Notiophilus biguttatus were found in the young plantation data (third row), but none of these individuals were chosen in this particular random sample, so that species is not present.

The Diversity Curve tab summarizes the simulation results. The columns give the abundance level, the mean and median of species richness, the variance, and the lower and upper bounds of a 95% confidence interval.

The first row (shaded in gray) gives these numbers for the entire young plantation data set, which had 243 individuals and 31 species. If all 243 individuals are drawn, every sample contains exactly the same 31 species, so the mean and median both equal 31 and the variance is zero.

The next row gives the results for the abundance level that you specified in the edit dialog box. For random samples of 63 individuals, there was an average of 19.71 species represented, with a median of 20 species and a variance of 3.82414.

The last two columns give us a confidence interval that allows us to answer the question of which of the two assemblages is more diverse. The confidence interval is from 15.88 to 23.54 species. In other words, 95% of the time that a random sample of 63 individuals is drawn from the young plantation assemblage, we expect to find between approximately 16 and 24 species.

However, the 63 individuals collected from the old plantation represent only 9 species, well below the lower bound of this interval. We can conclude that species richness is substantially higher in the young plantation, even after adjusting for sampling differences.

The Summary tab is split into two screens. The upper screen (which can be edited or annotated) displays all the options that you chose for your simulation. The lower screen is a spreadsheet that gives the diversity statistics for each level of abundance.

You can save both screens separately to disk files by clicking the "Save summary to file" or "Save diversity curve" buttons.

Close the output window and rerun the simulation, this time using the default abundance levels rather than the user-defined level. This simulation will take about 5 minutes to run, so this is a good time to get a cup of coffee.

When you look at the "Diversity Curve" tab, you will now see that there are 33 rows of abundance levels, evenly spaced between a minimum of 1 and a maximum of 243 individuals. Notice that at these extremes, the variance of the simulations is zero. When only 1 individual is drawn, exactly 1 species will be represented.

Conversely, if all 243 individuals are drawn, exactly 31 species will always be represented. Between those extremes, the number of species will vary from one run to the next for a particular abundance level, and this variability is reflected in the variance and confidence intervals.

In the future, we hope to implement a graphics option to plot these data. For now, you can import these numbers into a graphics program to produce your own diversity curves. See Figures 2.1 and 2.8 in Gotelli and Graves (1996) for plots of these data.

You can use EcoSim to compare diversity with other indices besides species richness. These indices are described elsewhere in this Help text, and they can be selected from the "Species diversity index" box when you are selecting simulation options.

8. Literature Cited

Colwell, R.K. and J.A. Coddington. 1994. Estimating terrestrial biodiversity through extrapolation. Philosophical Transactions of the Royal Society of London B 345: 101-118.

Goodman, D. 1975. The theory of diversity-stability relationships in ecology. The Quarterly Review of Biology 50: 237-266.

Gotelli, N.J. and G.R. Graves. 1996. Null models in ecology. Smithsonian Institution Press, Washington, DC.

Heck, K.L., Jr., G. van Belle and D. Simberloff. 1975. Explicit calculation of the rarefaction diversity measurement and the determination of sufficient sample size. Ecology 56: 1459-1461.

Hurlbert, S.H. 1971. The nonconcept of species diversity: a critique and alternative parameters. Ecology 52: 577-585.

James, F.C. and N.O. Wamer. 1982. Relationships between temperate forest bird communities and vegetation structure. Ecology 63: 159-171.

Järvinen, O. 1982. Species-to-genus ratios in biogeography: a historical note. Journal of Biogeography 9: 363-370.

Magurran, A.E. 1988. Ecological diversity and its measurement. Princeton University Press, Princeton.

Margalef, R. 1958. Information theory in ecology. General Systems 3: 36-71.

Niemelä, J., Y. Haila, E. Halme, T. Lahti, T. Pajunen and P. Punttila. 1988. The distribution of carabid beetles in fragments of old coniferous taiga and adjacent managed forest. Annales Zoologici Fennici 25: 107-119.

Peet, R.K. 1974. The measurement of species diversity. Annual Review of Ecology and Systematics 5: 285-307.

Porter, S.D. and D.A. Savignano. 1990. Invasion of polygyne fire ants decimates native ants and disrupts arthropod community. Ecology 71: 2095-2106.

Sanders, H.L. 1968. Marine benthic diversity: a comparative study. The American Naturalist 102: 243-282.

Tipper, J.C. 1979. Rarefaction and rarefiction: the use and abuse of a method in paleoecology. Paleobiology 5: 423-434.

Washington, H.G. 1984. Diversity, biotic and similarity indices. Water Research 18: 653-694.