Supplement 1: DISTPCOA: Software for Principal Coordinate Analysis with corrections for negative eigenvalues.
Ecological Archives M069-001-S1
Authors
Pierre Legendre
Département de sciences biologiques, Université de Montréal, C.P. 6128, succursale Centre-ville, Montréal, Québec H3C 3J7, Canada
Legendre@ere.umontreal.ca
Marti J. Anderson
Centre for Research on Ecological Impacts of Coastal Cities and School of Biological Sciences, Marine Ecology Laboratories, A11, University of Sydney, Sydney, NSW 2006, Australia
MJAnders@bio.usyd.edu.au
File list
1. distpcoa.exe: executable file (PC-DOS/Windows)
2. distpcoa.for: Fortran source code for distpcoa.exe
3. test7x3.txt: example input file
4. distpcoa.hqx: self-extracting BinHex archive containing all files in Apple Macintosh format
What does program DISTPCOA do?
This program performs Principal Coordinate Analysis (PCoA; Gower 1966) with the option of correcting for negative eigenvalues. This procedure is used as part of the distance-based redundancy analysis method (db-RDA) proposed by Legendre & Anderson (1998a). It may also be used in any other case where one wishes to obtain a full Euclidean representation of a distance matrix. If negative eigenvalues are produced, the correction methods available in this program allow one to obtain a full Euclidean representation in all cases.
The program can either read in a pre-computed distance matrix, or calculate a distance matrix from a raw data table. Five distance functions are available within the program: Bray-Curtis, square root of Bray-Curtis, chi-square, Hellinger, and Euclidean. Descriptions of these distances can be found in Legendre & Legendre (1998), among other texts.
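The five distance functions can be sketched in Python with NumPy as follows. This is a minimal illustration, not the DISTPCOA Fortran code: it assumes every object (row) has a positive total, it uses the chi-square distance formula of Legendre & Legendre (1998) (row profiles weighted by the inverse column totals and scaled by the square root of the grand total), and the function name `distance_matrix` and its method labels are hypothetical.

```python
import numpy as np

# The 7 x 3 "test, 7x3" table of the Appendix (rows = sites, columns = species)
TEST_7X3 = np.array([[3, 4, 5], [3, 2, 5], [3, 6, 4], [7, 5, 7],
                     [6, 8, 9], [3, 6, 3], [4, 5, 7]], dtype=float)


def distance_matrix(y, method="bray-curtis"):
    """Return the n x n distance matrix among the rows (objects) of y.

    Hypothetical helper; assumes every row of y has a positive total.
    """
    y = np.asarray(y, dtype=float)
    row_tot = y.sum(axis=1)
    if method in ("bray-curtis", "sqrt-bray-curtis"):
        # D = sum_j |y1j - y2j| / sum_j (y1j + y2j)
        num = np.abs(y[:, None, :] - y[None, :, :]).sum(axis=2)
        d = num / (row_tot[:, None] + row_tot[None, :])
        return np.sqrt(d) if method == "sqrt-bray-curtis" else d
    if method == "chi-square":
        # Row profiles, each species weighted by 1/sqrt(column total),
        # then scaled by sqrt(grand total)
        z = (y / row_tot[:, None]) / np.sqrt(y.sum(axis=0)) * np.sqrt(y.sum())
    elif method == "hellinger":
        # Square roots of the row profiles
        z = np.sqrt(y / row_tot[:, None])
    elif method == "euclidean":
        z = y
    else:
        raise ValueError(f"unknown method: {method!r}")
    # Euclidean distance between the rows of the (transformed) table z
    diff = z[:, None, :] - z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))


for name in ("bray-curtis", "sqrt-bray-curtis", "chi-square", "hellinger", "euclidean"):
    print(name, round(distance_matrix(TEST_7X3, name)[0, 1], 5))
```

Chi-square and Hellinger distances are written here as Euclidean distances on transformed tables, which is one reason they never produce negative eigenvalues in PCoA.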
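The program uses Householder reduction to tridiagonal form, followed by the QL algorithm with implicit shifts, to find the eigenvalues and eigenvectors of the square, symmetric Gower-centred matrix derived from the distance matrix. The subroutines (TRED2, TQLI) are from Chapter 11 of Numerical Recipes (Press et al. 1986).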
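The eigen-analysis step can be sketched as follows, assuming NumPy. The sketch forms the Gower-centred matrix G = -1/2 C D^2 C, where D^2 holds the element-wise squared distances and C = I - 11'/n is the centring matrix, and extracts its eigenvalues and eigenvectors. It uses NumPy's symmetric eigensolver `numpy.linalg.eigh` (LAPACK) in place of TRED2 and TQLI, so it is a stand-in for the program's routine rather than a copy of it; the helper names `bray_curtis`, `gower_centred` and `pcoa_eigen` are hypothetical.

```python
import numpy as np


def bray_curtis(y):
    """Bray-Curtis distances among the rows of y (hypothetical helper)."""
    y = np.asarray(y, dtype=float)
    num = np.abs(y[:, None, :] - y[None, :, :]).sum(axis=2)
    tot = y.sum(axis=1)
    return num / (tot[:, None] + tot[None, :])


def gower_centred(d):
    """G = -1/2 * C @ D**2 @ C, with C = I - 11'/n (D**2 taken element-wise)."""
    d = np.asarray(d, dtype=float)
    n = d.shape[0]
    c = np.eye(n) - np.ones((n, n)) / n
    return -0.5 * c @ (d ** 2) @ c


def pcoa_eigen(d):
    """Eigenvalues in decreasing order, with matching unit eigenvectors as columns."""
    g = gower_centred(d)
    values, vectors = np.linalg.eigh(g)  # eigh returns ascending eigenvalues
    order = np.argsort(values)[::-1]
    return g, values[order], vectors[:, order]


y = [[3, 4, 5], [3, 2, 5], [3, 6, 4], [7, 5, 7], [6, 8, 9], [3, 6, 3], [4, 5, 7]]
g, values, vectors = pcoa_eigen(bray_curtis(y))
print("Trace of Gower-centred matrix =", round(np.trace(g), 5))
print("PCoA eigenvalues", np.round(values, 5))
```

On the 7 x 3 test table of the Appendix, the trace and eigenvalues should agree with the first part of the dialogue output to the printed precision. The trace also equals the sum of all squared distances divided by 2n, and the eigenvalues always include one trivial zero associated with the centring.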
Negative eigenvalues may be generated during the principal coordinate analysis of semimetric or nonmetric distance measures. For descriptions and comparisons of the properties of various distance measures, see Gower & Legendre (1986) and Legendre & Legendre (1998). For example, the Bray-Curtis distance, which is widely used in ecology with species abundance data and is offered by the program, is a semimetric. Negative eigenvalues may also be produced during the analysis of some metric distances that do not guarantee a full Euclidean representation, as shown by Gower & Legendre (1986); see also Legendre & Legendre (1998, Table 7.2). The problem with negative eigenvalues is that the corresponding ordination axes are imaginary: each principal axis has a length equal to the square root of its eigenvalue, which is imaginary when the eigenvalue is negative. Corrections for negative eigenvalues may be obtained using two methods:
1. Method 1 (Lingoes): d'(i,j) = sqrt(d(i,j)^2 + 2*c1) for i ≠ j, where c1 is the absolute value of the largest negative eigenvalue obtained from the original distance matrix.
2. Method 2 (Cailliez): d'(i,j) = d(i,j) + c2 for i ≠ j, where c2 is the largest eigenvalue of a special matrix computed from the original distances.
The metric measures offered by the program, namely square-root-transformed Bray-Curtis, chi-square, Hellinger and Euclidean distances, also have the Euclidean property: they produce no negative eigenvalues in PCoA, so no correction is needed.
Methods 1 and 2 are described in Gower & Legendre (1986, theorem 7), in Legendre & Anderson (1998a), and in Legendre & Legendre (1998). The fact that square-root-transformed Bray-Curtis distances produce no negative eigenvalues in PCoA is substantiated in Legendre & Anderson (1998a).
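Both corrections can be sketched with NumPy as follows. This is an illustration, not the program's code. Here c1 is taken as the absolute value of the most negative eigenvalue of the Gower-centred matrix, and c2 as the largest real eigenvalue of the 2n x 2n matrix [[0, 2*G1], [-I, -4*G2]], where G1 and G2 are the centred matrices of -d^2/2 and -d/2; this is the usual construction for the Cailliez constant and is assumed, not confirmed, to be what the program calls the "Special matrix". The helper names `centre`, `lingoes` and `cailliez` are hypothetical. A distance matrix has a full Euclidean representation exactly when its Gower-centred matrix has no negative eigenvalues, which is what the sketch inspects after each correction.

```python
import numpy as np


def centre(a):
    """Double-centre a square matrix: C @ a @ C with C = I - 11'/n."""
    n = a.shape[0]
    c = np.eye(n) - np.ones((n, n)) / n
    return c @ a @ c


def lingoes(d):
    """Method 1: d' = sqrt(d**2 + 2*c1) for i != j; returns (d', c1)."""
    d = np.asarray(d, dtype=float)
    c1 = max(0.0, -np.linalg.eigvalsh(centre(-0.5 * d ** 2)).min())
    off_diag = 1.0 - np.eye(d.shape[0])  # keeps the diagonal at zero
    return np.sqrt(d ** 2 + 2.0 * c1 * off_diag), c1


def cailliez(d, imag_tol=1e-10):
    """Method 2: d' = d + c2 for i != j; returns (d', c2)."""
    d = np.asarray(d, dtype=float)
    n = d.shape[0]
    special = np.block([
        [np.zeros((n, n)), 2.0 * centre(-0.5 * d ** 2)],
        [-np.eye(n), -4.0 * centre(-0.5 * d)],
    ])
    ev = np.linalg.eigvals(special)  # non-symmetric matrix: possibly complex output
    c2 = ev.real[np.abs(ev.imag) < imag_tol].max()  # largest real eigenvalue
    return d + c2 * (1.0 - np.eye(n)), c2


# Bray-Curtis distances for the 7 x 3 test table of the Appendix
y = np.array([[3, 4, 5], [3, 2, 5], [3, 6, 4], [7, 5, 7],
              [6, 8, 9], [3, 6, 3], [4, 5, 7]], dtype=float)
tot = y.sum(axis=1)
bc = np.abs(y[:, None, :] - y[None, :, :]).sum(axis=2) / (tot[:, None] + tot[None, :])

for name, (d_corr, const) in (("Lingoes", lingoes(bc)), ("Cailliez", cailliez(bc))):
    g = centre(-0.5 * d_corr ** 2)
    print(name, "constant =", round(const, 10),
          "trace =", round(np.trace(g), 5),
          "smallest eigenvalue =", np.linalg.eigvalsh(g).min())
```

On the test table, the constants and traces should match the values printed in the Appendix runs for methods 1 and 2. Method 1 adds the same constant to every off-diagonal squared distance, so it shifts every non-trivial eigenvalue by c1 and leaves the eigenvectors unchanged; method 2 changes the eigenvectors as well.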
For use with db-RDA, Legendre & Anderson (1998a) have shown that correction method 1 does not affect the permutation test of the analysis-of-variance statistic. They therefore recommend correction method 1 in that context.
Input files
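The input data file is an ASCII text file containing either a square distance (or similarity) matrix or a raw data table in which rows are objects (e.g. sites) and columns are variables (e.g. species). The example file test7x3.txt is a raw data table.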
Options of the program
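The following choices are offered by the program:
1. Input: a square distance or similarity matrix, or a raw data table (type -1 or -2 instead of 1 or 2 to have intermediate matrices printed).
2. Transformation of the raw data before distances are computed: none, y^0.5, y^0.25, ln(y), ln(y + 1), log10(y) or log10(y + 1).
3. Distance function computed from the raw data: Bray-Curtis, square root of Bray-Curtis, chi-square, Hellinger or Euclidean.
4. Correction for negative eigenvalues: method 1 (Lingoes), method 2 (Cailliez), or no correction.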
Regarding the correction for negative eigenvalues: if a metric distance measure has been chosen, no correction is necessary, and the program proceeds without correction regardless of the choice made. Choosing "No correction" only makes a difference when negative eigenvalues are produced (for example, with Bray-Curtis distances). In that case, the eigenvectors corresponding to the negative eigenvalues are ignored, and only the coordinates corresponding to the positive eigenvalues are written to the output file.
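The "No correction" option can be sketched as follows, assuming NumPy. Eigenvalues at or below a small tolerance (here `rel_tol` times the largest eigenvalue, an arbitrary choice) are treated as zero or negative and their axes are dropped; the remaining unit eigenvectors are scaled to lengths equal to the square roots of their eigenvalues to give principal coordinates. The helper name `positive_coordinates` is hypothetical. Because each negative eigenvalue subtracts from the squared distances, distances recomputed from the retained axes are never smaller than the original ones.

```python
import numpy as np


def positive_coordinates(d, rel_tol=1e-8):
    """Principal coordinates on the axes with positive eigenvalues only.

    Hypothetical helper: eigenvalues <= rel_tol * largest are dropped, which
    removes the trivial zero eigenvalue and any negative ones.
    """
    d = np.asarray(d, dtype=float)
    n = d.shape[0]
    c = np.eye(n) - np.ones((n, n)) / n
    values, vectors = np.linalg.eigh(-0.5 * c @ (d ** 2) @ c)
    order = np.argsort(values)[::-1]
    values, vectors = values[order], vectors[:, order]
    keep = values > rel_tol * values[0]
    # Scale each unit eigenvector to length sqrt(eigenvalue)
    return values[keep], vectors[:, keep] * np.sqrt(values[keep])


y = np.array([[3, 4, 5], [3, 2, 5], [3, 6, 4], [7, 5, 7],
              [6, 8, 9], [3, 6, 3], [4, 5, 7]], dtype=float)
tot = y.sum(axis=1)
bc = np.abs(y[:, None, :] - y[None, :, :]).sum(axis=2) / (tot[:, None] + tot[None, :])

values, x = positive_coordinates(bc)
# Distances recomputed from the retained axes
dx = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=2))
print("Number of positive eigenvalues:", len(values))
print("Largest inflation of a distance:", (dx - bc).max())
```

Eigenvector signs are arbitrary, so a column may come out with all its signs reversed relative to another program's output without changing any distance.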
Output files
The run dialogue, as well as the uncorrected and corrected eigenvalues, is shown in the dialogue window. The principal coordinates, that is, the eigenvectors scaled so that their lengths equal the square roots of the corresponding eigenvalues, are written to a separate output file called PCOORD.TXT. If a correction for negative eigenvalues has been applied, the coordinates in the output file are those obtained from the corrected distance matrix. The rows of this file correspond to the objects and the columns are the coordinates (i.e. variables) in the new system of axes. This file can be used directly as input to other data analysis programs. In particular, users of the db-RDA procedure can use it as the "Species" matrix of a redundancy analysis run with the CANOCO program.
Disclaimer
This program is provided without any explicit or implicit warranty of correct functioning. It has been developed as part of a university-based research program. If, however, you should encounter problems with this program, the authors will be happy to help solve them. Researchers may use this program for scientific purposes, but the source code remains the property of Pierre Legendre and Marti J. Anderson. Publications should give proper credit to the method by referring to the Legendre & Anderson (1998a) paper. Users of the program may refer to the present user's manual as follows:
Legendre, P. & M. J. Anderson. 1998b. Program DISTPCOA. Département de sciences biologiques, Université de Montréal. 10 pages.
Technical notes
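The program is distributed in the following forms:
1. distpcoa.exe, an executable file for PC-DOS/Windows;
2. distpcoa.for, the Fortran source code;
3. distpcoa.hqx, a self-extracting BinHex archive containing all files in Apple Macintosh format.
As reported in the dialogue window, the program handles matrices of up to 400 objects and descriptors.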
References
Gower, J. C. 1966. Some distance properties of latent root and vector methods used in multivariate analysis. Biometrika 53:325-338.
Gower, J. C. & P. Legendre. 1986. Metric and Euclidean properties of dissimilarity coefficients. Journal of Classification 3:5-48.
Legendre, P. & M. J. Anderson. 1998a. Distance-based redundancy analysis: testing multi-species responses in multi-factorial ecological experiments. Ecological Monographs (accepted).
Legendre, P. & L. Legendre. 1998. Numerical ecology, 2nd English edition. Elsevier Science BV, Amsterdam. xv + 853 pages.
Press, W. H., B. P. Flannery, S. A. Teukolsky & W. T. Vetterling. 1986. Numerical recipes: the art of scientific computing. Cambridge University Press, Cambridge. xx + 818 pages.
Appendix: Test runs
Consider the following input data matrix, called "test, 7x3". It has 7 rows (sites) and 3 columns (species):
3 4 5
3 2 5
3 6 4
7 5 7
6 8 9
3 6 3
4 5 7
The output in the dialogue window is the following, using the Bray-Curtis distance and correction method 1.
Principal coordinate analysis
with correction for negative eigenvalues, if any.
Maximum size of matrix: 400 objects and descriptors
Do you have a file with (1) a square Distance or Similarity matrix, or
(2) raw data ?
(Type -1 or -2 to get intermediate matrices printed.)
2
Name of input file with raw data?
(in which columns are variables and rows are replicates)
Input file name (raw data): test,7x3
How many objects?
7
How many variables?
3
Transform the raw data before computing distances?
(0) No transformation
(1) y' = sqrt(y), i.e. y' = y^0.5
(2) y' = double sqrt(y), i.e. y' = y^0.25
(3) y' = ln(y)
(4) y' = ln(y + 1)
(5) y' = log10(y)
(6) y' = log10(y + 1)
0
Options: (1) Bray-Curtis distance
(2) sqrt(Bray-Curtis)
(3) Chi-square distance
(4) Hellinger distance
(5) Euclidean distance
1
Correction for negative eigenvalues, if any:
1) Method 1 (Lingoes): d'(i,j) = sqrt(d(i,j)**2 + 2*c1)
2) Method 2 (Cailliez): d'(i,j) = d(i,j) + c2
3) No correction: yields coordinates corresponding
to positive eigenvalues only
1
18:02:17
*** Results of PCoA on the original distance matrix ***
Trace of Gower-centred matrix = 0.15814
PCoA eigenvalues
0.10936 0.04657 0.00673 0.00017 0.00000 -0.00152 -0.00318
The largest negative eigenvalue is -0.0031792355
Sum of computed eigenvalues = 0.15814
*** Results of PcoA on corrected distance matrix ***
Trace of Gower-centred matrix = 0.17721
PCoA eigenvalues
0.11254 0.04975 0.00991 0.00335 0.00166 0.00000 0.00000
Sum of computed eigenvalues = 0.17721
The number of non-zero eigenvalues is: 5
Non-zero Principal coordinates
have been written to output file: "Pcoord.txt"
18:02:18
Real time spent: 0.13 seconds
End of program.
File PCOORD.TXT contains the new coordinates of the 7 sites in 5 dimensions:
-0.09732 0.03677 0.01757 -0.00996 -0.02045
-0.16516 0.11596 -0.03876 0.00110 0.00089
-0.06308 -0.08861 -0.02175 -0.01839 0.02695
0.13589 0.06345 0.05297 -0.02800 0.00535
0.21189 -0.02103 -0.05983 0.00077 -0.01204
-0.07513 -0.14534 0.02514 0.00929 -0.01342
0.05291 0.03880 0.02464 0.04518 0.01272
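The relationships in this output can be checked with a few lines of NumPy, using only the numbers printed above. With correction method 1, every non-trivial eigenvalue is shifted by c1 = 0.0031792355 (so -0.00318 becomes 0 and -0.00152 becomes 0.00166), each column of PCOORD.TXT has a sum of squares equal to its corrected eigenvalue, and each column has a mean of zero. Agreement can only be expected to the five printed decimals, so the tolerances in this sketch are loose.

```python
import numpy as np

# Values copied from the dialogue window and PCOORD.TXT above
c1 = 0.0031792355
original = np.array([0.10936, 0.04657, 0.00673, 0.00017, -0.00152, -0.00318])
corrected = np.array([0.11254, 0.04975, 0.00991, 0.00335, 0.00166, 0.0])
x = np.array([
    [-0.09732,  0.03677,  0.01757, -0.00996, -0.02045],
    [-0.16516,  0.11596, -0.03876,  0.00110,  0.00089],
    [-0.06308, -0.08861, -0.02175, -0.01839,  0.02695],
    [ 0.13589,  0.06345,  0.05297, -0.02800,  0.00535],
    [ 0.21189, -0.02103, -0.05983,  0.00077, -0.01204],
    [-0.07513, -0.14534,  0.02514,  0.00929, -0.01342],
    [ 0.05291,  0.03880,  0.02464,  0.04518,  0.01272],
])

# Lingoes shifts each non-trivial eigenvalue by c1 (the trivial zero is not listed)
shift_error = np.abs(original + c1 - corrected).max()
# Each column's sum of squares equals its eigenvalue; columns are centred
ss_error = np.abs((x ** 2).sum(axis=0) - corrected[:5]).max()
mean_error = np.abs(x.mean(axis=0)).max()
print(shift_error, ss_error, mean_error)
```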
For Bray-Curtis distance and correction method 2, the output in the dialogue window is the following.
Principal coordinate analysis
with correction for negative eigenvalues, if any.
Maximum size of matrix: 400 objects and descriptors
Do you have a file with (1) a square Distance or Similarity matrix, or
(2) raw data ?
(Type -1 or -2 to get intermediate matrices printed.)
2
Name of input file with raw data?
(in which columns are variables and rows are replicates)
Input file name (raw data): test,7x3
How many objects?
7
How many variables?
3
Transform the raw data before computing distances?
(0) No transformation
(1) y' = sqrt(y), i.e. y' = y^0.5
(2) y' = double sqrt(y), i.e. y' = y^0.25
(3) y' = ln(y)
(4) y' = ln(y + 1)
(5) y' = log10(y)
(6) y' = log10(y + 1)
0
Options: (1) Bray-Curtis distance
(2) sqrt(Bray-Curtis)
(3) Chi-square distance
(4) Hellinger distance
(5) Euclidean distance
1
Correction for negative eigenvalues, if any:
1) Method 1 (Lingoes): d'(i,j) = sqrt(d(i,j)**2 + 2*c1)
2) Method 2 (Cailliez): d'(i,j) = d(i,j) + c2
3) No correction: yields coordinates corresponding
to positive eigenvalues only
2
18:10:21
*** Results of PCoA on the original distance matrix ***
Trace of Gower-centred matrix = 0.15814
PCoA eigenvalues
0.10936 0.04657 0.00673 0.00017 0.00000 -0.00152 -0.00318
Sum of computed eigenvalues = 0.15814
*** Create Special matrix and find its largest eigenvalue ***
The largest eigenvalue of the Special matrix is 0.0380438751
*** Results of PcoA on corrected distance matrix ***
Trace of Gower-centred matrix = 0.21088
PCoA eigenvalues
0.13191 0.06090 0.01325 0.00351 0.00131 0.00000 0.00000
Sum of computed eigenvalues = 0.21088
The number of non-zero eigenvalues is: 5
Non-zero Principal coordinates
have been written to output file: "Pcoord.txt"
18:10:21
Real time spent: 0.15 seconds
End of program.
File PCOORD.TXT contains the new coordinates of the 7 sites in 5 dimensions:
-0.10669 0.04391 -0.01393 0.01163 -0.02278
-0.17486 0.13057 0.04492 -0.00032 0.00661
-0.07177 -0.10046 0.01498 0.00893 0.02273
0.14993 0.06591 -0.05857 0.03115 0.00733
0.22728 -0.02391 0.07344 -0.00071 -0.00801
-0.08399 -0.15847 -0.02214 -0.00253 -0.00993
0.06009 0.04243 -0.03869 -0.04815 0.00405
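The defining property of correction method 2 can be checked against this output with NumPy: Euclidean distances between the rows of PCOORD.TXT should reproduce the Bray-Curtis distances plus c2 = 0.0380438751 for every pair of distinct sites, up to the rounding of the printed coordinates. In this sketch the Bray-Curtis distances are recomputed from the test table, and the tolerance of 1e-4 is an arbitrary choice.

```python
import numpy as np

y = np.array([[3, 4, 5], [3, 2, 5], [3, 6, 4], [7, 5, 7],
              [6, 8, 9], [3, 6, 3], [4, 5, 7]], dtype=float)
tot = y.sum(axis=1)
bc = np.abs(y[:, None, :] - y[None, :, :]).sum(axis=2) / (tot[:, None] + tot[None, :])

c2 = 0.0380438751  # largest eigenvalue of the Special matrix, as printed above
x = np.array([
    [-0.10669,  0.04391, -0.01393,  0.01163, -0.02278],
    [-0.17486,  0.13057,  0.04492, -0.00032,  0.00661],
    [-0.07177, -0.10046,  0.01498,  0.00893,  0.02273],
    [ 0.14993,  0.06591, -0.05857,  0.03115,  0.00733],
    [ 0.22728, -0.02391,  0.07344, -0.00071, -0.00801],
    [-0.08399, -0.15847, -0.02214, -0.00253, -0.00993],
    [ 0.06009,  0.04243, -0.03869, -0.04815,  0.00405],
])

# Distances between sites in the 5-dimensional principal coordinate space
dx = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=2))
target = bc + c2 * (1.0 - np.eye(len(y)))  # d' = d + c2 off the diagonal
print("Largest discrepancy:", np.abs(dx - target).max())
```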
For Bray-Curtis distance without any correction for negative eigenvalues, the output in the dialogue window is the following.
Principal coordinate analysis
with correction for negative eigenvalues, if any.
Maximum size of matrix: 400 objects and descriptors
Do you have a file with (1) a square Distance or Similarity matrix, or
(2) raw data ?
(Type -1 or -2 to get intermediate matrices printed.)
2
Name of input file with raw data?
(in which columns are variables and rows are replicates)
Input file name (raw data): test,7x3
How many objects?
7
How many variables?
3
Transform the raw data before computing distances?
(0) No transformation
(1) y' = sqrt(y), i.e. y' = y^0.5
(2) y' = double sqrt(y), i.e. y' = y^0.25
(3) y' = ln(y)
(4) y' = ln(y + 1)
(5) y' = log10(y)
(6) y' = log10(y + 1)
0
Options: (1) Bray-Curtis distance
(2) sqrt(Bray-Curtis)
(3) Chi-square distance
(4) Hellinger distance
(5) Euclidean distance
1
Correction for negative eigenvalues, if any:
1) Method 1 (Lingoes): d'(i,j) = sqrt(d(i,j)**2 + 2*c1)
2) Method 2 (Cailliez): d'(i,j) = d(i,j) + c2
3) No correction: yields coordinates corresponding
to positive eigenvalues only
3
18:13:09
*** Results of PCoA on the original distance matrix ***
Trace of Gower-centred matrix = 0.15814
PCoA eigenvalues
0.10936 0.04657 0.00673 0.00017 0.00000 -0.00152 -0.00318
The negative eigenvalues, if any,
are being ignored in this analysis.
Sum of computed eigenvalues = 0.15814
The number of positive eigenvalues is: 4
Principal coordinates corresponding
to positive eigenvalues only
have been written to output file: "Pcoord.txt"
18:13:09
Real time spent: 0.08 seconds
End of program.
File PCOORD.TXT contains the new coordinates of the 7 sites in the 4 dimensions corresponding to the positive eigenvalues:
-0.09594 0.03558 -0.01448 0.00225
-0.16281 0.11219 0.03194 -0.00025
-0.06219 -0.08574 0.01792 0.00416
0.13396 0.06139 -0.04366 0.00633
0.20888 -0.02034 0.04930 -0.00017
-0.07406 -0.14062 -0.02072 -0.00210
0.05216 0.03754 -0.02031 -0.01022
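A short NumPy check on these printed numbers shows what is lost without a correction. Each column's sum of squares matches one of the four positive eigenvalues, but because the eigenvalues sum to the trace of the Gower-centred matrix, the retained axes together carry more than that trace: the excess equals the sum of the absolute values of the discarded negative eigenvalues, to within rounding. The tolerances in this sketch are loose because the inputs are rounded to five decimals.

```python
import numpy as np

# Values copied from the dialogue window and PCOORD.TXT above
trace = 0.15814
positive = np.array([0.10936, 0.04657, 0.00673, 0.00017])
negative = np.array([-0.00152, -0.00318])
x = np.array([
    [-0.09594,  0.03558, -0.01448,  0.00225],
    [-0.16281,  0.11219,  0.03194, -0.00025],
    [-0.06219, -0.08574,  0.01792,  0.00416],
    [ 0.13396,  0.06139, -0.04366,  0.00633],
    [ 0.20888, -0.02034,  0.04930, -0.00017],
    [-0.07406, -0.14062, -0.02072, -0.00210],
    [ 0.05216,  0.03754, -0.02031, -0.01022],
])

col_ss = (x ** 2).sum(axis=0)  # one sum of squares per retained axis
total = col_ss.sum()           # variance carried by the retained axes
excess = total - trace         # compare with the sum of |negative eigenvalues|
print(np.round(col_ss, 5), round(total, 5), round(excess, 5), -negative.sum())
```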