Ecological Archives A025-014-A1

Anna Rigosi, Paul Hanson, David P. Hamilton, Matthew Hipsey, James A. Rusak, Julie Bois, Karin Sparber, Ingrid Chorus, Andrew J. Watkinson, Boqiang Qin, Bomchul Kim, and Justin D. Brookes. 2015. Determining the probability of cyanobacterial blooms: the application of Bayesian networks in multiple lake systems. Ecological Applications 25:186–199. http://dx.doi.org/10.1890/13-1677.1

Appendix A. Selecting Bayesian network structure: number of states and threshold values.

It was deemed critical that the Bayesian network be kept as simple as possible. Increasing the number of parents for each node increases the complexity and size of the Conditional Probability Tables (CPTs) associated with that node, and also increases the number of cases needed to populate the CPTs. Three networks of different complexity were adopted in the study. Here, the effects of changing the number of states and the thresholds were analysed.
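As an illustration of why CPT size matters, the following minimal sketch counts the entries of a CPT for a child node, assuming a fully specified discrete table with one probability per combination of child and parent states. The function name `cpt_size` is hypothetical and is not part of the study.

```python
from math import prod


def cpt_size(child_states, parent_states):
    """Entries in a discrete CPT: child states times the product of parent state counts."""
    return child_states * prod(parent_states)


# Three-node network: a 3-state child with two parents (WT and TP),
# each parent varied from 2 to 5 states as in the experiments described here.
for n_states in range(2, 6):
    print(n_states, cpt_size(3, [n_states, n_states]))
# prints 12, 27, 48 and 75 entries for 2, 3, 4 and 5 parent states
```

Each additional parent multiplies the table size again, so the number of cases needed to populate every cell grows quickly with network complexity.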

Selection of number of states - The simplest network adopted consisted of 3 nodes: 2 parents and 1 child (see Fig. 3). The endpoint node thresholds, which allocate abundance into categories of hazard, were established following Chorus and Bartram (1999), and three states were fixed. The number of states for TP and WT was varied from 2 to 5. Each network was evaluated using three different case files (80% of the data, which included 1614 cases) and test files (20% of the data) randomly selected from the complete database as described in the methods. For each test the following outputs were recorded: the error rate of the three-node network, the percentage of 'good' predictions, the sensitivity of cyanobacteria to WT and TP, and the probabilities of low, moderate, and high cyanobacterial hazard [P(L), P(M), P(H)] (Table A1). With any of case files 1, 2, or 3, the networks with 3 or 4 states were more powerful than the networks with 2 or 5 states (lower error rates and higher sensitivities). Whether 3 or 4 states performed better depended on the case file used. Differences in performance (percentage of good predictions) were calculated between experiments that adopted different numbers of states and also between experiments that used different case files. Changing from 3 to 4 states improved the prediction by about 0.6%, while adopting different case files (randomly selected from all the cases included in the database) changed the prediction by about 3%. Therefore, the 3-state model was adopted for the network, because model performance changed more when different case files were used to develop it than when shifting from 3 to 4 states.
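The testing procedure described above (random 80/20 split, then error rate and percentage of good predictions on the test file) can be sketched as follows. This is a simplified stand-in, not the software used in the study: the CPT is learned by plain counting, the prediction is the most frequent hazard state for each parent combination, and all function names and the fallback state `"Low"` are assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict


def split_cases(cases, train_fraction=0.8, seed=0):
    """Randomly split cases into a case (training) file and a test file."""
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


def learn_cpt(train):
    """Count hazard states for every (WT state, TP state) parent combination."""
    counts = defaultdict(Counter)
    for wt_state, tp_state, hazard in train:
        counts[(wt_state, tp_state)][hazard] += 1
    return counts


def predict(cpt, wt_state, tp_state, default="Low"):
    """Most frequent hazard state for the parents; `default` if never observed."""
    if (wt_state, tp_state) not in cpt:
        return default
    return cpt[(wt_state, tp_state)].most_common(1)[0][0]


def error_rate(cpt, test):
    """Percentage of test cases whose predicted hazard differs from the observed one."""
    wrong = sum(predict(cpt, wt, tp) != hazard for wt, tp, hazard in test)
    return 100.0 * wrong / len(test)


def good_predictions(cpt, test):
    """Percentage of good predictions (GP), the complement of the error rate."""
    return 100.0 - error_rate(cpt, test)
```

Each case is a tuple `(wt_state, tp_state, hazard)` of already discretized values. Calling `split_cases` with three different seeds mimics the three case files, and repeating the evaluation for 2 to 5 states per parent gives a comparison of the kind reported in Table A1.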

Selection of thresholds - A specific study to find the best combination of thresholds was conducted for the simplest network with three nodes (Fig. 3a). Networks with three states were tested using different thresholds, following the same testing procedure adopted to establish the number of states. First, TP concentration thresholds were fixed (0 to 0.035; 0.035 to 0.1; 0.1 to infinity mg/L) and WT thresholds were varied using different combinations of 15, 18, 20, or 22 °C as the first threshold and 24, 26, 28, or 30 °C as the second threshold (total of 16 combinations). Each test was repeated with the three case files. The combination of thresholds with the minimum error rate was selected. Once the WT thresholds were fixed, the TP thresholds were modified using combinations of 0.01, 0.02, or 0.035 mg/L as the first threshold and 0.045, 0.05, 0.07, 0.1, or 0.15 mg/L as the second threshold (total of 15 combinations). Error rates ranged from a maximum of 43.2% to a minimum of 32.6%. The combination with the minimum error rate had 20 and 24 °C as first and second thresholds for WT, and 0.02 and 0.1 mg/L as first and second thresholds for TP (as in Fig. 3a). The thresholds for the remaining nodes were established as specified in Table A2, following literature references and/or analysis of histograms representing the data distribution of each variable. Further analyses would likely be needed to establish the influence of thresholds and number of states for networks with more than four nodes.
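The two-stage threshold search described above can be sketched as follows. The candidate values and the initial TP thresholds are taken from the text; the callback `error_rate(wt_pair, tp_pair)` is a hypothetical placeholder that would train and test the three-node network (for example, averaged over the three case files) and return its error rate, and the function name `two_stage_search` is likewise an assumption.

```python
from itertools import product

# Candidate thresholds from the text (degrees C for WT, mg/L for TP).
WT_FIRST = [15, 18, 20, 22]
WT_SECOND = [24, 26, 28, 30]
TP_FIRST = [0.01, 0.02, 0.035]
TP_SECOND = [0.045, 0.05, 0.07, 0.1, 0.15]

WT_GRID = list(product(WT_FIRST, WT_SECOND))  # 16 combinations
TP_GRID = list(product(TP_FIRST, TP_SECOND))  # 15 combinations


def two_stage_search(error_rate, tp_start=(0.035, 0.1)):
    """Vary WT thresholds with TP fixed, then vary TP with the best WT fixed.

    `error_rate` is a hypothetical callback taking (wt_pair, tp_pair) and
    returning the network error rate for that choice of thresholds.
    """
    best_wt = min(WT_GRID, key=lambda wt: error_rate(wt, tp_start))
    best_tp = min(TP_GRID, key=lambda tp: error_rate(best_wt, tp))
    return best_wt, best_tp
```

Searching the two variables in sequence requires 16 + 15 = 31 evaluations per case file, rather than the 16 × 15 = 240 that a full joint grid would need, at the cost of not exploring interactions between the WT and TP thresholds.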

Table A1. Results of the experiments for the selection of the number of states to use in the Bayesian network nodes. Different case files, each created from a combination of cases randomly selected from the complete database, were used. Results expressed as percentage of good predictions (GP) show that the network is always more powerful using 3 or 4 states instead of 2 or 5 states.

 

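| | Case File 1, 2 states | Case File 1, 3 states | Case File 1, 4 states | Case File 1, 5 states | Case File 2, 2 states | Case File 2, 3 states | Case File 2, 4 states | Case File 2, 5 states | Case File 3, 2 states | Case File 3, 3 states | Case File 3, 4 states | Case File 3, 5 states |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Error rate % | 37.6 | 32.6 | 32 | 37 | 40 | 35.7 | 35.4 | 36.6 | 39.7 | 37.3 | 36.7 | 37.6 |
| Good predictions (GP) % | 62.4 | 67.4 | 68 | 63 | 60 | 64.3 | 64.6 | 63.4 | 60.3 | 62.7 | 63.3 | 62.4 |
| Sensitivity to WT % | 13.1 | 17.5 | 19.9 | 17.5 | 13.3 | 16.8 | 18.7 | 16.8 | 12.9 | 17.5 | 19.1 | 17.4 |
| Sensitivity to TP % | 1.08 | 0.71 | 0.89 | 0.76 | 0.58 | 0.66 | 0.94 | 0.87 | 1.13 | 1.2 | 1.29 | 1.37 |
| CyanoHazard P(L) % | 60.6 | 59.2 | 60 | 58 | 60.3 | 59.6 | 60.4 | 58.6 | 60.6 | 59.4 | 60.2 | 58.5 |
| CyanoHazard P(M) % | 24.4 | 24.9 | 24.2 | 25.5 | 23.4 | 24 | 23.4 | 24.5 | 23.3 | 23.9 | 23.4 | 24.5 |
| CyanoHazard P(H) % | 15.6 | 16 | 15.8 | 16.6 | 16.4 | 16.4 | 16.2 | 16.9 | 15.1 | 16.6 | 16.3 | 17 |
| Δ (GP_state(N) - GP_state(N-1)) | | 5 | 0.6 | -5 | | 4.3 | 0.3 | -1.2 | | 2.4 | 0.6 | -0.9 |
| Δ (GP_CaseFile1 - GP_CaseFileN) | | | | | 2.4 | 3.1 | 3.4 | -0.4 | 2.1 | 4.7 | 4.7 | 0.6 |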
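The two Δ rows of Table A1 follow directly from the GP row. As a check, the short sketch below recomputes them from the GP values given in the table; the variable and function names are illustrative only.

```python
# Good predictions (GP, %) from Table A1 for 2, 3, 4 and 5 states.
GP = {
    1: [62.4, 67.4, 68.0, 63.0],  # Case File 1
    2: [60.0, 64.3, 64.6, 63.4],  # Case File 2
    3: [60.3, 62.7, 63.3, 62.4],  # Case File 3
}


def delta_states(gp_row):
    """GP with N states minus GP with N-1 states, for N = 3, 4, 5."""
    return [round(curr - prev, 1) for prev, curr in zip(gp_row, gp_row[1:])]


def delta_case_files(gp_ref, gp_other):
    """GP of Case File 1 minus GP of another case file, state by state."""
    return [round(ref - other, 1) for ref, other in zip(gp_ref, gp_other)]


for case_file, gp_row in GP.items():
    print("Case File", case_file, delta_states(gp_row))
# Case File 1 [5.0, 0.6, -5.0]
# Case File 2 [4.3, 0.3, -1.2]
# Case File 3 [2.4, 0.6, -0.9]

print(delta_case_files(GP[1], GP[2]))  # [2.4, 3.1, 3.4, -0.4]
print(delta_case_files(GP[1], GP[3]))  # [2.1, 4.7, 4.7, 0.6]
```

The gain from moving from 3 to 4 states (0.3 to 0.6 percentage points) is smaller than most of the differences between case files, which is the comparison used in the text to justify the 3-state model.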

 

Table A2. Names and definitions of the nodes incorporated in the structure of the Bayesian network. The values used for all environmental variables were averaged over the week before the cyanobacteria sampling date. TP data were taken on the same date as the cyanobacteria sample or, depending on their availability, during the week before the cyanobacteria sampling date.

| Node | Definition | Units | Thresholds | References |
|---|---|---|---|---|
| Cyano Hazard | Number of cyanobacteria cells | Cells/mL | High >100000; 20000≤Moderate<100000; Low <20000 | (Chorus and Bartram 1999) |
| WT | Surface temperature | °C | WT<20; 20≤WT<24; WT≥24 | (Reynolds 2006); (Chorus and Bartram 1999); Database histograms*; Thresholds experiments** |
| TP | Total phosphorus at surface | mg/L | 0<TP<0.02; 0.02≤TP<0.1; TP≥0.1 | (Reynolds 2006); Database histograms*; Thresholds experiments** |
| WS | Wind speed over water surface | m/s | <4; ≥4 | (Webster and Hutchinson 1994); Database histograms* |
| AirT | Air temperature | °C | -30<AirT<20; ≥20 | Database histograms* |
| PAR | Photosynthetically active radiation (c. 45% of short wave radiation) | W/m² | <100; ≥100 | (Lee and Rhee 1999); (Krause-Jensen and Sand-Jensen 1998); Database histograms* |
| Zmix | Mixing depth (dT/dz < 0.2 °C/m) | m | 0<zmix<2; 2≤zmix<5; zmix≥5 | (Read et al. 2011); Database histograms* |
| Zeu | Euphotic depth | m | 0<zeu<5; 5≤zeu<10; zeu≥10 | (Martin and McCutcheon 1999); Database histograms* |
| ZmixZeu | Ratio between mixing depth and euphotic depth | - | 0<zmixzeu<0.5; 0.5≤zmixzeu<1; zmixzeu>1 | (Reynolds 2006); (Oliver et al. 2010); (Humphries and Lyne 1988); (Oliver and Ganf 2000); Database histograms* |
| Depth | Maximum depth of the reservoir | m | <5 Shallow; ≥5 Deep | (Jeppesen et al. 2005); Database histograms* |
| Latitude | Latitude of the lakes | ° | <35 inter-tropical; ≥35 subtropical | Database histograms* |

*Thresholds were selected analysing histograms plotted for each variable;

** Experiments were conducted for variables directly connected to the endpoint node, see text.
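The thresholds in Table A2 turn each continuous variable into a discrete state. A minimal sketch of that discretization is given below, using only the upper cut points from the table. It assumes that a value exactly on a cut point belongs to the upper state, which matches most rows of the table but is an assumption for Cyano Hazard (100000 cells/mL) and ZmixZeu (1); lower bounds such as 0<TP or -30<AirT are not checked. The names `THRESHOLDS`, `state_index`, and `discretize_case`, and the example values, are hypothetical.

```python
from bisect import bisect_right

# Cut points per node, taken from Table A2 (units as in the table).
THRESHOLDS = {
    "CyanoHazard": [20000, 100000],
    "WT": [20, 24],
    "TP": [0.02, 0.1],
    "WS": [4],
    "AirT": [20],
    "PAR": [100],
    "Zmix": [2, 5],
    "Zeu": [5, 10],
    "ZmixZeu": [0.5, 1],
    "Depth": [5],
    "Latitude": [35],
}


def state_index(node, value):
    """State index: 0 below the first cut point, increasing by one per cut point reached."""
    return bisect_right(THRESHOLDS[node], value)


def discretize_case(case):
    """Map a dict of node values to a dict of state indices."""
    return {node: state_index(node, value) for node, value in case.items()}


# Illustrative values only, not taken from the database.
example = {"WT": 22.5, "TP": 0.05, "Zmix": 3.0, "Zeu": 4.0}
example["ZmixZeu"] = example["Zmix"] / example["Zeu"]
print(discretize_case(example))
# {'WT': 1, 'TP': 1, 'Zmix': 1, 'Zeu': 0, 'ZmixZeu': 1}
```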

Literature Cited

Chorus, I., and J. Bartram. 1999. Toxic Cyanobacteria in Water, A guide to their public health consequences, monitoring and management. World Health Organization, London, UK.

Humphries, E., and V. D. Lyne. 1988. Cyanophyte blooms: the role of cell buoyancy. Limnology and Oceanography 33:79–91.

Jeppesen, E., M. Søndergaard, J. P. Jensen, K. E. Havens, O. Anneville, L. Carvalho, M. F. Coveney, R. Deneke, M. T. Dokulil, B. Foy, D. Gerdeaux, S. E. Hampton, S. Hilt, K. Kangur, J. Köhler, E. Lammens, T. Lauridsen, M. Manca, M. Miracle, B. Moss, T. Noges, G. Persson, G. Phillips, R. Portielje, S. Romo, C. Schelske, D. Straile, I. Tatrai, E. Willen, and M. Winder. 2005. Lake responses to reduced nutrient loading - an analysis of contemporary long-term data from 35 case studies. Freshwater Biology 50:1747–1771.

Krause-Jensen, D., and K. Sand-Jensen. 1998. Light attenuation and photosynthesis of aquatic plant communities. Limnology and Oceanography 43:396–407.

Lee, D. Y., and G. Y. Rhee. 1999. Kinetics of growth and death in Anabaena flos-aquae (cyanobacteria) under light limitation and supersaturation. Journal of Phycology 35:700–709.

Martin, J. L., and S. C. McCutcheon. 1999. Hydrodynamics and transport for water quality modelling. Lewis Publishers, Washington DC.

Oliver, R. L., and G. G. Ganf. 2000. Freshwater blooms. Pages 149–186 in B. A. Whitton and M. Potts, editors. The ecology of cyanobacteria. Kluwer Academic Publishers, The Netherlands.

Oliver, R. L., S. M. Mitrovic, and C. Rees. 2010. Influence of salinity on light conditions and phytoplankton growth in a turbid river. River Research and Applications 26:894–903.

Read, J. S., D. P. Hamilton, I. D. Jones, K. Muraoka, L. A. Winslow, R. Kroiss, C. H. Wu, and E. Gaiser. 2011. Derivation of lake mixing and stratification indices from high-resolution lake buoy data. Environmental Modelling and Software 26:1325–1336.

Reynolds, C. S. 2006. The ecology of phytoplankton. Cambridge University Press, Cambridge, UK.

Webster, I. T., and P. A. Hutchinson. 1994. Effect of wind on the distribution of phytoplankton cells in lakes revisited. Limnology and Oceanography 39:365–373.

