Annotated codes for performing spatial regression and autocorrelation analyses in S-PLUS. This document provides S-PLUS (MathSoft 1999) codes to perform the analyses in Lichstein et al. (2002). Many of the codes require the S-PLUS Spatial Module, which may be activated with the pull-down menu File/Load Module. To run the codes, copy and paste them into a "Script" window in S-PLUS, then hit the "Run" button on the Toolbar or hit the "F10" key. For clarity, instructions are separated from the codes by a line with a number sign (#). I do not provide codes pertaining to variable selection in multiple regression models. In Lichstein et al. (0000), non-significant habitat (environmental) variables were eliminated sequentially by hand (corresponding to backward elimination), and non-significant trend surface terms were eliminated by stepwise selection. Stepwise selection may be performed in S-PLUS using the pull-down menu Statistics/Regression/Stepwise. You may need to modify the default value for the maximum allowed number of bytes for an S-PLUS 'object'. If you have a large sample size, the default object size will likely be too small for the spatial analyses, and you will get an error message such as this one: "Error: Cannot allocate 11082640 bytes: options("object.size") is 500000: see options help file." For the data of Lichstein et al. (0000), with n = 1177, an object size of 12 megabytes was sufficient. # options(object.size = 12000000) The codes assume the following: (1) The response and explanatory variables are measured at geographic locations defined by the variables 'x' and 'y', which are distances measured from some arbitrary origin. Latitude/longitude (decimal-degree) data must be projected to a plane. (2) The name of the response variable is 'response'. (3) There is one continuous environmental variable named 'env1'. (4) There is one categorical environmental variable with three levels, 'env2a', 'env2b', and 'env2c', coded as 0/1 dummy variables; 'env2c' is omitted from the models to avoid a linear dependency in the design matrix (Legendre and Legendre 1998:46). (5) The trend surface consists of the following terms: x, y, and x^2. To fit a more complex trend surface, simply add the appropriate terms to the model. The data set need only include x and y. We will create x^2. (6) The variables are contained in columns (one row for each observation, or location) in an S-PLUS .sdd file named 'mydata' (.sdd files can be created using the pull-down menu File/Import Data). Many of the computations in the codes require the sample size, n. The line of code below extracts n from 'mydata'. If any rows of 'mydata' are not used in the analysis (e.g., due to missing data for an explanatory variable), the extracted sample size will not be correct; in this case, you should enter a number (the correct sample size) on the right side of the expression below (e.g., for a sample size of 150, the code would be "n <- 150"). This line of code is repeated several times throughout the document to ensure that the value of n is current. In all cases, you should modify the code as explained above if the number of rows of 'mydata' is not the correct sample size. # n <- nrow(mydata) For both ordinary least squares (OLS) and conditional autoregressive (CAR) models, we will use the likelihood ratio (LR) test to determine the significance of the variables in the models. The LR test requires as input the log-likelihoods (LLs) of the appropriate full and reduced models. The LR test allows for consistency between the OLS and CAR models in testing the significance of explanatory variables, and allows us to test the significance of the spatial parameter, rho, in CAR models. We will also need log-likelihoods to compute R-squareds using the formula of Nagelkerke (1991). For the OLS models, R-squareds obtained by this method are identical to those calculated from sums of squares. I. Ordinary least squares (OLS) models I. A. OLS environment models Use the 'lm' (linear model) function to fit an OLS model containing all environmental variables (full model). Then save the residuals in a new column in 'mydata' named 'ols.env.resid'. # ols.env.full <- lm(response ~ env1 + env2a + env2b, data=mydata) mydata$ols.env.resid <- residuals(ols.env.full) Use the 'summary' function (first line of code) to see the model results. Naming the object (second line of code) will also return some output. P-values are given for t-tests for each individual coefficient; later, we will use the LR test to determine the overall significance of the categorical variable 'env2'. # summary(ols.env.full) ols.env.full Compute a reduced model lacking 'env1'. # ols.env.red1 <- lm(response ~ env2a + env2b, data=mydata) Compute a reduced model lacking 'env2'. # ols.env.red2 <- lm(response ~ env1, data=mydata) Use the 'logLik' function to compute log-likelihoods of the full and reduced models. # LL.ols.env.full <- logLik(ols.env.full) LL.ols.env.red1 <- logLik(ols.env.red1) LL.ols.env.red2 <- logLik(ols.env.red2) Compute the likelihood ratio (LR) statistic for the variables 'env1' and 'env2'. # LR.env1 <- -2*(LL.ols.env.red1 - LL.ols.env.full) LR.env2 <- -2*(LL.ols.env.red2 - LL.ols.env.full) Use the 'pchisq' function to compute P-values for 'env1' (1 degree of freedom) and 'env2' (2 degrees of freedom). 'pchisq' returns the cumulative probability of the Chi-squared distribution; one minus this probability yields the desired P-value. The P-value for 'env1' should be nearly identical to that of the t-test in the 'summary' output for 'ols.env.full'. # 1 - pchisq(LR.env1, 1) 1 - pchisq(LR.env2, 2) To compute the model R-squared using the formula of Nagelkerke (1991), we need the log-likelihood of the null model, which contains only an intercept. Use 'lm' to fit the null model, then use 'logLik' to compute its log-likelihood. # ols.null <- lm(response ~ 1, data=mydata) LL.ols.null <- logLik(ols.null) Compute the R-squared of the full model using the formula of Nagelkerke (1991). The returned R-squared value should be identical to that given in the 'summary' output for ols.env.full above. The correct sample should have previously been assigned to the variable n (see above). # rsq.ols.env <- 1 - exp((-2/n)*(LL.ols.env.full - LL.ols.null)) rsq.ols.env I. B. OLS trend surface models Before fitting the trend surface, you may want to standardize the x and y coordinates to zero mean and unit variance. This will reduce collinearity between first order and higher order terms (Legendre and Legendre 1998:527), and is especially wise if the coordinates are large numbers with little variation. For example, UTM coordinates have six or seven digits, and the first several digits may be the same for all locations in the study area. Matrices with such numbers are difficult to invert, and the model output may not be reliable. To standardize a variable to zero mean and unit variance, first subtract the mean from each value, then divide by the standard deviation (or square-root of the variance, as in the code below). The following will create standardized versions of x and y ('x.std' and 'y.std') as new columns in 'mydata'. mydata$x.std <- (mydata$x - mean(mydata$x))/sqrt(var(mydata$x)) mydata$y.std <- (mydata$y - mean(mydata$y))/sqrt(var(mydata$y)) Create the variable 'x.sq' (x-squared) from 'x.std'. If desired, other polynomial trend surface terms may be computed in a similar manner. Do not standardize 'x.sq', as this will alter its quadratic relationship with 'x.std'. # mydata$x.sq <- (mydata$x.std)^2 Now, use 'lm' to compute a trend surface with the standardized coordinates 'x.std' and 'y.std' and the quadratic term 'x.sq'. # ols.trend.full <- lm(response ~ x.std + y.std + x.sq, data=mydata) summary(ols.trend.full) Perform LR tests for the trend surface terms. First, fit the reduced models. # ols.trend.redx <- lm(response ~ y.std + x.sq, data=mydata) ols.trend.redy <- lm(response ~ x.std + x.sq, data=mydata) ols.trend.redxsq <- lm(response ~ x.std + y.std, data=mydata) Next, compute log-likelihoods of the full and reduced models. # LL.ols.trend.full <- logLik(ols.trend.full) LL.ols.trend.redx <- logLik(ols.trend.redx) LL.ols.trend.redy <- logLik(ols.trend.redy) LL.ols.trend.redxsq <- logLik(ols.trend.redxsq) Compute the likelihood ratio (LR) statistics. Then use the LRs to calculate P-values for the trend surface terms, each with 1 degree of freedom. The P-values should be nearly identical to those in the summary output above for 'ols.trend.full'. # LR.x <- -2*(LL.ols.trend.redx - LL.ols.trend.full) LR.y <- -2*(LL.ols.trend.redy - LL.ols.trend.full) LR.xsq <- -2*(LL.ols.trend.redxsq - LL.ols.trend.full) 1 - pchisq(LR.x, 1) 1 - pchisq(LR.y, 1) 1 - pchisq(LR.xsq, 1) Compute the model R-squared using the formula of Nagelkerke (1991). The returned R-squared value should be identical to that from the summary output above for 'ols.trend.full'. The sample size should have previously been assigned to the variable n (see above). # rsq.ols.trend <- 1 - exp((-2/n)*(LL.ols.trend.full - LL.ols.null)) rsq.ols.trend I. C. OLS trend/environment models Use 'lm' to compute a model with both environmental variables and trend surface terms. Save the residuals in a new column in 'mydata' named 'ols.trend.env.resid'. # ols.trend.env.full <- lm(response ~ env1 + env2a + env2b + x.std + y.std + x.sq, data=mydata) mydata$ols.trend.env.resid <- residuals(ols.trend.env.full) summary(ols.trend.env.full) Perform LR tests for the explanatory variables. Below, we compute LR tests for 'env2' and 'y.std'. The other variables can be tested in a similar manner. First, fit the reduced models. # ols.trend.env.red2 <- lm(response ~ env1 + x.std + y.std + x.sq, data=mydata) ols.trend.env.redy <- lm(response ~ env1 + env2a + env2b + x.std + x.sq, data=mydata) Next, compute log-likelihoods of the full and reduced models. # LL.ols.trend.env.full <- logLik(ols.trend.env.full) LL.ols.trend.env.red2 <- logLik(ols.trend.env.red2) LL.ols.trend.env.redy <- logLik(ols.trend.env.redy) Compute the likelihood ratio (LR) statistic for 'env2' and 'y.std'. Then use the LRs to calculate P-values for 'env2' (2 degrees of freedom) and 'y.std' (1 degree of freedom). # LR.trend.env.env2 <- -2*(LL.ols.trend.env.red2 - LL.ols.trend.env.full) LR.trend.env.y <- -2*(LL.ols.trend.env.redy - LL.ols.trend.env.full) 1 - pchisq(LR.trend.env.env2, 2) 1 - pchisq(LR.trend.env.y, 1) Compute the model R-squared using the formula of Nagelkerke (1991). The returned R-squared value should be identical to that from the summary output above for 'ols.trend.env.full'. # rsq.ols.trend.env <- 1 - exp((-2/n)*(LL.ols.trend.env.full - LL.ols.null)) rsq.ols.trend.env I. D. Partition the variation explained by the OLS models (Borcard et al. 1992). Partition the proportion of explained variation into four components: (1) environment (independent of trend), (2) trend (independent of environment), (3) trend/environment (variation in the response that can be explained by either the environmental or the trend surface variables), and (4) unexplained variation. There are several ways to achieve the identical partitioning. We will use the simplest method, which is to subtract the appropriate R-squareds. See Legendre and Legendre (1998:528) for details. (1) environment # rsq.ols.trend.env - rsq.ols.trend (2) trend # rsq.ols.trend.env - rsq.ols.env (3) trend/environment # rsq.ols.trend + rsq.ols.env - rsq.ols.trend.env (4) unexplained # 1 - rsq.ols.trend.env II. Correlograms and semivariograms: raw data and OLS residuals Activate the S-PLUS Spatial Module. # module(spatial) At this point, you should check if the OLS residuals are autocorrelated, which would suggest that the OLS assumption of independent errors is not valid. As a point of reference, we will also look at the spatial pattern in the raw (unmodeled) data. The built-in S-PLUS functions 'variogram,' 'correlogram,' or 'covariogram' are easy to implement and are useful to check for anisotropy (directionality) in the data (Rossi et al. 1992, Legendre and Legendre 1998:731). The correlogram, which is based on the sample correlation coefficient, is commonly used in geostatistics (Rossi et al. 1992). This function is related to, but is not identical to, the Moran's I correlogram. I use the term "correlogram" to refer to the geostatistical function, and the terms "Moran's I" and "Geary's c" to distinguish those special types of correlograms. To test for significant autocorrelation, we will generate Moran's I correlograms and then perform a permutation test of Moran's I for each lag distance class. The codes may easily be modified to use Geary's c in place of Moran's I. The choice of functions for checking anisotropy vs. significance tests is a practical matter: with S-PLUS, it is easier to check for anisotropy using the built-in semivariogram, covariogram, or correlogram functions, while significance tests are easier with Moran's I or Geary's c. In principle, any of these structure functions could be used to check for anisotropy or to perform significance tests. The significance tests for Moran's I and Geary's c assume stationarity (Legendre and Legendre 1998:718). Raw data often violate stationarity assumptions; in this case, a significant value of I or c may reflect broad scale trend rather than fine scale autocorrelation. The test procedures for raw data and residuals are distinct (see Appendix of Lichstein et al. 0000). For raw data, the appropriate null hypothesis is spatial independence (randomness). In contrast, OLS residuals are not independent, even if the true errors are; here, the appropriate null hypothesis is the spatial pattern expected in the residuals if the OLS model of interest truly had independent errors. Similarly, the appropriate null hypothesis to test for autocorrelation in CAR residuals is the spatial pattern in the residuals expected from a CAR model fit to spatially independent data (see section III.E). In this section (II) and the following (III), we will use the original coordinate values ('x' and 'y') rather than the standardized versions ('x.std' and 'y.std') we created in section I.B. This lets us interpret the structure functions and spatial autoregressive models on the distance scale at which the data were collected. It should be no problem to use the non-standardized coordinates here (even if they are large numbers, such as UTM coordinates), because the analyses use the distances between locations, not the absolute coordinate values as in trend surface analysis. II. A. Built-in S-PLUS functions (semivariograms, covariograms, and correlograms) The easiest way to check (visually) for spatial pattern in the residuals is to use one of the built-in S-PLUS functions, described in Chapter 4 of Kaluzny et al. (1998). There is no built-in significance test for autocorrelation for these functions in S-PLUS. The following code generates an omnidirectional empirical semivariogram for the residuals from the OLS environment model. Substitute the word "response" for "ols.env.resid" to generate a semivariogram of the raw data. Name the object in the second line of code to see the output. # ols.env.resid.vgm <- variogram(ols.env.resid ~ loc(x,y), data=mydata) ols.env.resid.vgm The 'lag', 'nlag', 'maxdist', and 'minpairs' options allow the user to control certain features of the semivariogram (see 'variogram' in S-PLUS HELP files for details). S-PLUS sets these options for you if you omit them from the code. If you wish to use these options, you will probably need to change the numbers in the code below to suit your data. # ols.env.resid.vgm <- variogram(ols.env.resid ~ loc(x,y), data=mydata, lag=10, nlag=20, maxdist=200, minpairs=100) Plot the semivariogram and give it a title. The 'ylim' option allows you to control the limits of the y-axis. You may need to change the numbers for 'ylim', or simply omit this part of the code if you wish, in which case the code would read "plot(ols.env.resid.vgm)". # plot(ols.env.resid.vgm, ylim=c(0,2)) title(main = "OLS environment residuals, semivariogram") Now check for anisotropy by generating and plotting directional semivariograms. The following code generates directional semivariograms where site-pairs are searched for along each of four azimuths (0, 45, 90, and 135 degrees from north). The angular tolerance is 22.5 degrees, giving a 45 degree (= 2*22.5) search along each azimuth; the search is bi-directional, e.g., the 90 degree azimuth also searches along the 270 degree azimuth. You may change any of the numbers in the code below if desired. You can also specify more, or fewer, azimuths. # ols.env.resid.vgm.dir <- variogram(ols.env.resid ~ loc(x,y), data = mydata, azimuth=c(0,45,90,135), tol.azimuth=22.5) plot(ols.env.resid.vgm.dir) As with the omnidirectional semivariogram, you can control certain features of the directional semivariograms using the options 'lag', 'nlag', 'maxdist', and 'minpairs'. To generate covariograms or correlograms instead of semivariograms, substitute the word 'covariogram' or 'correlogram' for 'variogram' in the codes above. You should also change the name of the object (the name to the left of "<-"), e.g., you could use 'ols.env.resid.cgm.dir' for an object containing directional correlograms. II. B. Testing for autocorrelation: Moran's I correlograms In S-PLUS, you must supply a spatial weights matrix (W) to calculate Moran's I or Geary's c. The weights (wij) in the matrix W determine which site-pairs (ij) will be used to calculate the numerator of Moran's I or Geary's c, and how strongly each site-pair will impact the numerator (Legendre and Legendre 1998:715). To generate a correlogram, you will need to create a different W for each of a series of lag distance classes. Site-pairs (i.e., location-pairs) separated by a distance not included in the distance interval for a given lag class will have wij = 0 in the W matrix for that lag. All site-pairs whose inter-site distances fall within the lag class are typically assigned wij = 1 for that lag, so all site-pairs in the lag class are weighted equally. Alternate weight definitions (e.g., wij = 1/distanceij) could be used (Legendre and Legendre 1998:715), but we will assume wij = 0 or wij = 1 in the correlograms below. Section III.A provides code for alternate weight definitions if they are desired. The codes are for Moran's I correlograms. They could easily be modified to produce Geary's c correlograms. II. B. 1. Creating weight matrices. See sections 5.1.3-5.1.4 in Kaluzny et al. (1998) for further explanation of the codes. First, make sure the current value of n is the correct sample size. If any rows of "mydata" are not used in the analysis (e.g., due to missing data), the extracted sample size will not be correct; in this case, you should replace the right side of the expression with a number (the correct sample size). # n <- nrow(mydata) Use the 'cbind' function to create a matrix named 'location' with two columns containing the x and y coordinates of the data points. Then use the 'quad.tree' function to create a sorted matrix named 'quad' from the 'location' data. # location <- cbind(mydata$x, mydata$y) quad <- quad.tree(location) Now use the 'find.neighbor' function to find all site-pairs separated by a distance (dij) less than 'max.dist'. In the codes below, we will assume lag distance intervals of 10 units (meters, kilometers, etc.). So the first lag class should contain all site-pairs with dij<=10, the second lag class all site-pairs with 1010,] W2 <- spatial.neighbor(row.id=nbr2[,1], col.id=nbr2[,2], nregion=n) Third lag: all site-pairs with 2020,] W3 <- spatial.neighbor(row.id=nbr3[,1], col.id=nbr3[,2], nregion=n) Fourth lag: all site-pairs with 3030,] W4 <- spatial.neighbor(row.id=nbr4[,1], col.id=nbr4[,2], nregion=n) Fifth lag: all site-pairs with 4040,] W5 <- spatial.neighbor(row.id=nbr5[,1], col.id=nbr5[,2], nregion=n) Create as many additional weight matrices (one for each lag distance class) as necessary using similar code. To see what the site-pairs look like in space, use the 'plot.spatial.neighbor' function. For example, the following line of code produces a map of all site-pairs in the third lag class. # plot.spatial.neighbor(W3, xcoord=mydata$x, ycoord=mydata$y) To see the information in the weight matrices, name the matrix. For example, naming 'W5' in the first line of code below will display all site-pairs in W5 (weight matrix for fifth lag class). Alternatively, if you have a lot of locations, you may wish to display only a few rows of each matrix at a time. For example, the second line of code will display the first ten rows of W5. # W5 W5[1:10, ] The returned output is not an n by n matrix, but S-PLUS uses the information as if it were an n by n matrix. The first two columns ('row.id' and 'col.id') give the labels for the site-pairs in the distance class. The third column ('weights') should all be ones, which are the default weights. As is typically done, we will use weights of one to compute Moran's I or Geary's c correlograms below. If desired, you could use the code in section III.A to re-define the weights as 1/dij or 1/dij^2. The fourth column ('matrix') should all be ones, indicating that all of the site-pairs are in the same matrix. See Chapter 5 of Kaluzny et al. (1998) for applications requiring multiple weight matrices. II. B. 2. a. Moran's I correlogram for raw data. A program is provided to generate a Moran's I correlogram for raw data. Copy and paste the entire program into a "Script" window in S-PLUS. The program uses the built-in S-PLUS function 'spatial.cor' to calculate Moran's I (raw data formula) and perform the raw data permutation test described in the Appendix of Lichstein et al. (0000). The P-value reported by the built-in permutation test in 'spatial.cor' does not include the reference (observed) value of I in the permutation distribution. The program below adds the reference value (Legendre and Legendre 1998:22) and re-calculates the P-value. The program assumes 20 lag distance classes, with weight matrices named W1, W2,..., W20 (see section II.B.1 above). If you have more than 20 lags you will need to add additional lines of code where indicated in the comments below. (Anything to the right of a number sign # is a comment.) The sample size (n) is extracted in the first line of the program from 'mydata'. As explained previously, you will need to change this line of code if the number of rows in 'mydata' is not the correct sample size. Both one-tailed (positive autocorrelation) and two-tailed tests for individual lags are performed with the progressive Bonferroni correction (Legendre and Legendre 1998:671, 721-3), and a global test for the overall significance of the correlogram is performed at the Bonferroni corrected alpha level of alpha divided by the number of lags in the correlogram (Legendre and Legendre 1998:721). The calculations for Imax (maximum attainable value of I; see Appendix of Lichstein et al. 0000) may be slow for large n. The current lag is printed as the program runs to inform you how much computing time remains. Comments in CAPs inform you where you may need to alter the program to suit your data. Be sure you read through all comments in CAPs before running the program. # n <- nrow(mydata) # REPLACE RIGHT SIDE OF EXPRESSION WITH YOUR SAMPLE # SIZE IF NUMBER OF ROWS IN MYDATA IS NOT CORRECT SAMPLE SIZE. alpha <- .05 # ALPHA LEVEL FOR SIGNIFICANCE TESTS. n.perm <- 999 # NUMBER OF PERMUTATIONS TO GENERATE NULL DISTRIBUTION. n.lags <- 20 # CHANGE THIS NUMBER IF YOU HAVE MORE OR LESS THAN 20 LAGS. variable <- mydata$response # VARIABLE OF INTEREST # Create structure for output data. # First, create a vector of distances (e.g., lag mid-points) for # x-axis of correlogram. # THE CODE ASSUMES 20 LAGS. CHANGE THE NUMBER OF LAGS AND # THE VALUES FOR THE LAG MID-POINTS TO SUIT YOUR DATA. dist <- c(5,15,25,35,45,55,65,75,85,95,105,115,125,135,145,155, 165,175,185,195) # Next, create a vector of length n.lags named "vec"; default # values are zeros. vec <- vector(mode="numeric", length=n.lags) dim(vec) <- c(n.lags,1) # Now, create a data.sheet that will contain the output. # Terms to the left of "=" are column labels. cgm.raw <- data.sheet(dist=dist,Iraw=vec,Imax=vec,Istd=vec, extlarge=vec,Plarge=vec,extsmall=vec,Psmaller=vec, Pcrit1=vec,Pcat1=vec,Pcrit2=vec,Pcat2=vec) lag <- 1 # the program cycles through one lag at a time while(lag <= n.lags) { print(lag) # The code assumes 20 lag classes. IF YOU HAVE >20 LAGS, YOU # WILL NEED TO ADD ADDITIONAL "IF" STATEMENTS. If you have # <20 lags, you need not remove the extra "if" statements. if (lag == 1) {invisible(W <- spatial.weights(W1)) invisible(W.sparse <- W1)} if (lag == 2) {invisible(W <- spatial.weights(W2)) invisible(W.sparse <- W2)} if (lag == 3) {invisible(W <- spatial.weights(W3)) invisible(W.sparse <- W3)} if (lag == 4) {invisible(W <- spatial.weights(W4)) invisible(W.sparse <- W4)} if (lag == 5) {invisible(W <- spatial.weights(W5)) invisible(W.sparse <- W5)} if (lag == 6) {invisible(W <- spatial.weights(W6)) invisible(W.sparse <- W6)} if (lag == 7) {invisible(W <- spatial.weights(W7)) invisible(W.sparse <- W7)} if (lag == 8) {invisible(W <- spatial.weights(W8)) invisible(W.sparse <- W8)} if (lag == 9) {invisible(W <- spatial.weights(W9)) invisible(W.sparse <- W9)} if (lag == 10) {invisible(W <- spatial.weights(W10)) invisible(W.sparse <- W10)} if (lag == 11) {invisible(W <- spatial.weights(W11)) invisible(W.sparse <- W11)} if (lag == 12) {invisible(W <- spatial.weights(W12)) invisible(W.sparse <- W12)} if (lag == 13) {invisible(W <- spatial.weights(W13)) invisible(W.sparse <- W13)} if (lag == 14) {invisible(W <- spatial.weights(W14)) invisible(W.sparse <- W14)} if (lag == 15) {invisible(W <- spatial.weights(W15)) invisible(W.sparse <- W15)} if (lag == 16) {invisible(W <- spatial.weights(W16)) invisible(W.sparse <- W16)} if (lag == 17) {invisible(W <- spatial.weights(W17)) invisible(W.sparse <- W17)} if (lag == 18) {invisible(W <- spatial.weights(W18)) invisible(W.sparse <- W18)} if (lag == 19) {invisible(W <- spatial.weights(W19)) invisible(W.sparse <- W19)} if (lag == 20) {invisible(W <- spatial.weights(W20)) invisible(W.sparse <- W20)} # Use the built-in S-PLUS function 'spatial.cor' to calculate Moran's # I for raw data and perform a permutation test. Name the object # 'sp.cor'. Then extract Iraw from 'sp.cor'. Substitute the word # 'geary' for 'moran' for a Geary's c correlogram. sp.cor <- spatial.cor(variable, neighbor=W.sparse, statistic = "moran", sampling="nonfree", npermutes=n.perm) Iraw <- sp.cor$correlation # Moran's I (raw data formula) # Need S to compute Imax (below). S = number of site pairs in W, # where both ij and ji are counted as pairs. rowsums <- apply(W,1,sum) S <- sum(rowsums) S # show value of S # Compute maximum attainable value for Moran's I. # See Haining (1990:234-5) or Appendix of Lichstein et al. (0000). numerator <- 0 var.cent <- variable - mean(variable) # mean-centered value j <- 1 while(j <= n) { numerator <- (W[j,]%*%var.cent)^2 + numerator j <- (j + 1) } denominator <- t(var.cent)%*%var.cent Imax <- (n/S)*sqrt(numerator/denominator) # Divide Iraw by Imax to standardize Iraw between -1 and +1. Istd <- Iraw/Imax # The significance tests require the number of extreme values (i.e., # values generated under the null hypothesis of no autocorrelation that # are greater or less than the observed value of I). The following # lines of code calculate the number of permutation values as large as # and as small as the reference value. The permutation P-value from # 'spatial.cor' is called "perm.p.value", and is the probability of # observing a larger value of I. Add one to extreme values to # include the reference value in the permutation distribution # (Legendre and Legendre 1998:22-5). extr.large <- n.perm*sp.cor$perm.p.value + 1 extr.small <- n.perm - n.perm*sp.cor$perm.p.value + 1 # Divide number of extreme values by number of permutations to get P- # values for permutation test (probabilities, under null hypothesis of # no autocorrelation, of observing a value of I as large or as small # as the reference value (observed value of I). # Add one to denominator to include the reference value. Plarge <- (extr.large)/(n.perm + 1) Psmall <- (extr.small)/(n.perm + 1) # Divide alpha by lag integer to set critical P-value for one-tailed # (positive autocorrelation) progressive Bonferroni test (Legendre and # Legendre 1998: pages 671, 721-3). Pcrit1 <- alpha/lag # Label one-tailed test as significant (1) or not significant (0). if (Plarge <= Pcrit1) Pcat1 <- 1 else Pcat1 <- 0 # Divide alpha/2 by lag integer to set critical P-value for two-tailed # progressive Bonferroni test (Legendre and Legendre 1998: # pages 671, 721-3). Pcrit2 <- (alpha/2)/lag # Label two-tailed test as significant (1) or not significant (0). if (Plarge <= Pcrit2 || Psmall <= Pcrit2) Pcat2 <- 1 else Pcat2 <- 0 # Assign values to output. cgm.raw[lag,2] <- Iraw # Moran's I for raw data cgm.raw[lag,3] <- Imax # max possible value of Ires cgm.raw[lag,4] <- Istd # Ires standardized to +/- 1 cgm.raw[lag,5] <- extr.large # number of extreme large values in perm # distribution (includes reference value) cgm.raw[lag,6] <- Plarge # probability of as large a value of I cgm.raw[lag,7] <- extr.small # number of extreme small values in perm # distribution (includes reference value) cgm.raw[lag,8] <- Psmall # probability of as small a value of I cgm.raw[lag,9] <- Pcrit1 # critical P for one-tailed test progressive # Bonferroni test cgm.raw[lag,10] <- Pcat1 # one-tailed test (1=signif, 0=not signif) cgm.raw[lag,11] <- Pcrit2 # critical P for two-tailed progressive # Bonferroni test cgm.raw[lag,12] <- Pcat2 # two-tailed test (1=signif, 0=not signif) lag <- (lag + 1) # go to beginning of loop and start next lag } # Global Bonferroni test (one-tailed): print('Lowest P-value out of all lags =') min(cgm.raw[,6]) print('One-tailed Bonferroni corrected critical P-value =') alpha/n.lags if (min(cgm.raw[,6]) < alpha/n.lags) {print('One-tailed global Bonferroni test is significant')} else {print('One-tailed global Bonferroni test not significant')} # Global Bonferroni test (two-tailed): two.tails <- c(cgm.raw[,6],cgm.raw[,8]) # combine right and # left tail values two.tails # print the values print('Lowest P-value out of all lags =') min(two.tails) print('Two-tailed Bonferroni corrected critical P-value =') (alpha/2)/n.lags if (min(two.tails) < (alpha/2)/n.lags) {print('Two-tailed global Bonferroni test is significant')} else {print('Two-tailed global Bonferroni test not significant')} cgm.raw # print output # Plot the correlograms (one- and two-tailed tests): par(mfrow=c(2,2)) # plots arrayed on screen in 2 rows, 2 columns symbol <- vector(mode="numeric", length=n.lags) # empty vector to dim(symbol) <- c(n.lags,1) # contain plotting symbols # Structure for one-tailed test correlogram. 'ylim' sets limits # for y-axis. plot(x=cgm.raw$dist, y=cgm.raw$Istd, type="n", xlab="lag distance", ylab="Istd", ylim=c(-.25,1)) title(main='raw data, one-tailed test') for (i in 1:n.lags) {if (cgm.raw[i,10] == 1) {symbol[i] <- 15} # set plotting symbols: else {symbol[i] <- 0} # signif = closed square; not signif = open # Plot values of Istd. 'mkh' is height of symbol in inches; #'col' is symbol color points(x=cgm.raw$dist[i], y=cgm.raw$Istd[i], pch=symbol[i], mkh=.1, col=1) next } # Structure for two-tailed test correlogram. 'ylim' sets limits # for y-axis. plot(x=cgm.raw$dist, y=cgm.raw$Istd, type="n", xlab="lag distance", ylab="Istd", ylim=c(-.25,1)) title(main='raw data, two-tailed test') for (i in 1:n.lags) {if (cgm.raw[i,12] == 1) {symbol[i] <- 15} # set plotting symbols: else {symbol[i] <- 0} # signif = closed square; not signif = open # Plot values of Istd. 'mkh' is height of symbol in inches; #'col' is symbol color points(x=cgm.raw$dist[i], y=cgm.raw$Istd[i], pch=symbol[i], mkh=.1, col=1) next } ################### end of program ################### II. B. 2. b. Moran's I correlogram for OLS residuals. A program is provided to generate a Moran's I correlogram for OLS residuals. Copy and paste the entire program into a "Script" window in S-PLUS. There is no built-in function in S-PLUS to calculate Ires (Moran's I for residuals). The program uses matrix algebra to compute Ires; alternatively, you could compute Ires by (1) using the function 'spatial.cor' to compute Iraw and (2) multiplying Iraw by S/n (see Appendix of Lichstein et al. 0000). The choice of Ires vs. Iraw is trivial after standardization to Istd, because Ires and Iraw yield the identical standardized value of I. The program for OLS residuals is considerably slower than the raw data program above. The current lag is printed as the program runs to inform you how much computing time remains. The program has the same structure as the raw data program; as before, you will need to modify the program (see comments in CAPs) to suit your data. Significance tests as explained above for raw data program. The null distribution for Ires is based on the pattern expected in the residuals if the OLS model of interest truly had independent errors. You must specify the model of interest in the program. The program below assumes that the model of interest is the OLS trend/environment model (section I.C). To test for autocorrelation in residuals from a different model, you will need to change the model in two places: (1) the first lines of the program, where the residuals from the model of interest are assigned to the variable 'res', and (2) about mid-way down the program, in the line beginning "res.norm <- resid...", where the random normal data are regressed on the model of interest to generate residuals for the permutation test. Comments in CAPs inform you where you may need to alter the program to suit your data. Be sure you read through all comments in CAPs before running the program. # res <- residuals(lm(response ~ env1 + env2a + env2b + x.std + y.std + x.sq, data=mydata)) # MORAN'S I IS CALCULATED FOR # RESIDUALS FROM THIS MODEL. MAKE SURE THIS IS THE MODEL OF INTEREST. n <- nrow(mydata) # REPLACE RIGHT SIDE OF EXPRESSION WITH YOUR SAMPLE # SIZE IF NUMBER OF ROWS IN MYDATA IS NOT CORRECT SAMPLE SIZE. alpha <- .05 # ALPHA LEVEL FOR SIGNIFICANCE TESTS. n.perm <- 999 # NUMBER OF PERMUTATIONS TO GENERATE NULL DISTRIBUTION. n.lags <- 20 # CHANGE THIS NUMBER IF YOU HAVE MORE OR LESS THAN 20 LAGS. # Create structure for output data. # First, create vector of distances (e.g., lag mid-points) for # x-axis of correlogram. # THE CODE ASSUMES 20 LAGS. CHANGE THE NUMBER OF LAGS AND THE # VALUES FOR THE LAG MID-POINTS TO SUIT YOUR DATA. dist <- c(5,15,25,35,45,55,65,75,85,95,105,115,125,135,145,155, 165,175,185,195) # Next, create vector of length n.lags named "vec"; default values # are zeros. vec <- vector(mode="numeric", length=n.lags) dim(vec) <- c(n.lags,1) # Now, create a data.sheet that will contain the output. Terms to the # left of "=" are column labels. cgm.ols <- data.sheet(dist=dist,Iraw=vec,Ires=vec,Imax=vec,Istd=vec, extlarge=vec,Plarge=vec,extsmall=vec,Psmall=vec, Pcrit1=vec,Pcat1=vec,Pcrit2=vec,Pcat2=vec) # Create structure for matrix of permutation values. nullI <- vector(mode="numeric", length=n.perm*n.lags) dim(nullI) <- c(n.perm,n.lags) lag <- 1 # the program cycles through one lag at a time while(lag <= n.lags) { print(lag) # The code assumes 20 lag classes. IF YOU HAVE >20 LAGS, YOU # WILL NEED TO ADD ADDITIONAL "IF" STATEMENTS. If you have # <20 lags, you need not remove the extra "if" statements. if (lag == 1) {invisible(W <- spatial.weights(W1))} if (lag == 2) {invisible(W <- spatial.weights(W2))} if (lag == 3) {invisible(W <- spatial.weights(W3))} if (lag == 4) {invisible(W <- spatial.weights(W4))} if (lag == 5) {invisible(W <- spatial.weights(W5))} if (lag == 6) {invisible(W <- spatial.weights(W6))} if (lag == 7) {invisible(W <- spatial.weights(W7))} if (lag == 8) {invisible(W <- spatial.weights(W8))} if (lag == 9) {invisible(W <- spatial.weights(W9))} if (lag == 10) {invisible(W <- spatial.weights(W10))} if (lag == 11) {invisible(W <- spatial.weights(W11))} if (lag == 12) {invisible(W <- spatial.weights(W12))} if (lag == 13) {invisible(W <- spatial.weights(W13))} if (lag == 14) {invisible(W <- spatial.weights(W14))} if (lag == 15) {invisible(W <- spatial.weights(W15))} if (lag == 16) {invisible(W <- spatial.weights(W16))} if (lag == 17) {invisible(W <- spatial.weights(W17))} if (lag == 18) {invisible(W <- spatial.weights(W18))} if (lag == 19) {invisible(W <- spatial.weights(W19))} if (lag == 20) {invisible(W <- spatial.weights(W20))} Ires <- (t(res)%*%W%*%res)/(t(res)%*%res) # Moran's I for residuals # Need S to compute Imax (below). S = number of site pairs # in W, where both ij and ji are counted as pairs. rowsums <- apply(W,1,sum) S <- sum(rowsums) S # show value of S # Max possible value for Moran's I (Haining 1990:234-5). See also # Appendix of Lichstein et al. (0000). numer <- 0 j <- 1 while(j <= n) { numer <- (W[j,]%*%res)^2 + numer j <- (j + 1) } denom <- t(res)%*%res Imax <- sqrt(numer/denom) # omit n/S from Haining's # formula because we have Ires, not Iraw. # Divide Ires by Imax to standardize Ires between -1 and +1. Istd <- Ires/Imax extr.large <- 0 # re-set number of extreme positive and extr.small <- 0 # negative values to zero. # Generate permutation distribution for Ires (see Appendix of # Lichstein et al. 0000): (1) Generate n random normal variates, # (2) regress the random data on the model of interest, (3) # calculate Ires from residuals of step 2, (4) repeat n.perm times. for (i in 1:n.perm) { # Generate n random normal variates and assign to variable "norm". mydata$norm <- rnorm(n, mean=0, sd=1) # NAME OF YOUR DATA- # SET MUST BE ON LEFT SIDE OF "$" # Regress random data on model of interest and save the residuals. res.norm <- resid(lm(norm ~ env1 + env2a + env2b + x.std + y.std + x.sq, data=mydata)) # MAKE SURE THIS IS THE # MODEL OF INTEREST. IT SHOULD BE THE SAME AS THE MODEL # IN THE FIRST LINE OF THE PROGRAM. # Calculate Ires from residuals of random data regression. Ires.norm <- (t(res.norm)%*%W%*%res.norm)/(t(res.norm)%*%res.norm) # Tabulate permutation values in matrix "nullI" nullI[i,lag] <- Ires.norm # If Ires.norm is >= to reference value, add one to # extr.large (number of perm values as large as reference value). if (Ires.norm >= Ires) extr.large <- (extr.large + 1) # If Ires.norm is <= to reference value, add one to # extr.small (number of perm values as small as reference value). if (Ires.norm <= Ires) extr.small <- (extr.small + 1) next # repeat for next permutation } # Add one to number of extreme values to include reference value # (Legendre and Legendre 1998:22-5). Then divide number of extreme # values by number of permutations (plus one for reference value) to # get P-values for permutation test. extr.large <- extr.large + 1 Plarge <- (extr.large)/(n.perm + 1) extr.small <- extr.small + 1 Psmall <- (extr.small)/(n.perm + 1) # Divide alpha by lag integer to set critical P-value for one-tailed # (positive autocorrelation) progressive Bonferroni test (Legendre and # Legendre 1998: pages 671, 721-3). Pcrit1 <- alpha/lag # Label one-tailed test as significant (1) or not significant (0). if (Plarge <= Pcrit1) Pcat1 <- 1 else Pcat1 <- 0 # Divide alpha/2 by lag integer to set critical P-value for two-tailed # progressive Bonferroni test. Pcrit2 <- (alpha/2)/lag # Label two-tailed test as significant (1) or not significant (0). if (Plarge <= Pcrit2 || Psmall <= Pcrit2) Pcat2 <- 1 else Pcat2 <- 0 # Assign values to output. cgm.ols[lag,2] <- (n/S)*Ires # Moran's I for raw data; should get identical # value from function 'spatial.cor' cgm.ols[lag,3] <- Ires # Moran's I for residuals cgm.ols[lag,4] <- Imax # max possible value of Ires cgm.ols[lag,5] <- Istd # Ires standardized to +/- 1; note that Istd # is same for Iraw or Ires formulas cgm.ols[lag,6] <- extr.large # number of extreme large values in perm # distribution (includes reference value) cgm.ols[lag,7] <- Plarge # probability of as large a value of I cgm.ols[lag,8] <- extr.small # number of extreme small values in perm # distribution (includes reference value) cgm.ols[lag,9] <- Psmall # probability of as small a value of I cgm.ols[lag,10] <- Pcrit1 # critical P for one-tailed test progressive # Bonferroni test cgm.ols[lag,11] <- Pcat1 # one-tailed test (1=signif, 0=not signif) cgm.ols[lag,12] <- Pcrit2 # critical P for two-tailed progressive # Bonferroni test cgm.ols[lag,13] <- Pcat2 # two-tailed test (1=signif, 0=not signif) lag <- (lag + 1) # go to beginning of loop and start next lag } # nullI # Remove "#" to left of "nullI" to print permutation values. # Global Bonferroni test (one-tailed): print('Lowest P-value out of all lags =') min(cgm.ols[,7]) print('One-tailed Bonferroni corrected critical P-value =') alpha/n.lags if (min(cgm.ols[,7]) < alpha/n.lags) {print('One-tailed global Bonferroni test is significant')} else {print('One-tailed global Bonferroni test not significant')} # Global Bonferroni test (two-tailed): two.tails <- c(cgm.ols[,7],cgm.ols[,9]) # combine the right and # left tail values two.tails # print the values print('Lowest P-value out of all lags =') min(two.tails) print('Two-tailed Bonferroni corrected critical P-value =') (alpha/2)/n.lags if (min(two.tails) < (alpha/2)/n.lags) {print('Two-tailed global Bonferroni test is significant')} else {print('Two-tailed global Bonferroni test not significant')} cgm.ols # print output # Plot correlograms (one- and two-tailed tests): par(mfrow=c(2,2)) # plots arrayed on screen in 2 rows, 2 columns symbol <- vector(mode="numeric", length=n.lags) # empty vector to dim(symbol) <- c(n.lags,1) # contain plotting symbols # Structure for one-tailed test correlogram. 'ylim' sets limits # for y-axis. plot(x=cgm.ols$dist, y=cgm.ols$Istd, type="n", xlab="lag distance", ylab="Istd", ylim=c(-.25,1)) title(main='OLS residuals, one-tailed test') for (i in 1:n.lags) {if (cgm.ols[i,11] == 1) {symbol[i] <- 15} # set plotting symbols: else {symbol[i] <- 0} # signif = closed square; not signif = open # Plot values of Istd. 'mkh' is height of symbol in inches; #'col' is symbol color points(x=cgm.ols$dist[i], y=cgm.ols$Istd[i], pch=symbol[i], mkh=.1, col=1) next } # Structure for two-tailed test correlogram. 'ylim' sets limits # for y-axis. plot(x=cgm.ols$dist, y=cgm.ols$Istd, type="n", xlab="lag distance", ylab="Istd", ylim=c(-.25,1)) title(main='OLS residuals, two-tailed test') for (i in 1:n.lags) {if (cgm.ols[i,13] == 1) {symbol[i] <- 15} # set plotting symbols: else {symbol[i] <- 0} # signif = closed square; not signif = open # Plot values of Istd. 'mkh' is height of symbol in inches; #'col' is symbol color points(x=cgm.ols$dist[i], y=cgm.ols$Istd[i], pch=symbol[i], mkh=.1, col=1) next } ################### end of program ################### III. Conditional spatial autoregressive (CAR) models (Haining 1990, Cressie 1993). Make sure the spatial module is turned on. # module(spatial) III. A. Create the spatial neighbor matrix ('W.CAR') containing the weights ('wij') that will determine the relative impact of location j on i in the CAR model. Sections 5.1.3-5.1.4 in Kaluzny et al. (1998) provide details for the codes in section III.A. Use the 'cbind' function to create a matrix named 'location' with two columns containing the x and y coordinates of the data points. # location <- cbind(mydata$x, mydata$y) Use the 'quad.tree' function to create a sorted matrix named 'quad' from the 'location' matrix. # quad <- quad.tree(location) Now use the function 'find.neighbor' to search for spatial neighbors in 'quad'. You must enter a value for 'max.dist', the distance beyond which locations are considered outside of each other's spatial neighborhood (zone of influence). Spatial weights (wij) will be zero for site-pairs that are not neighbors (i.e., points separated by a distance greater than 'max.dist'). An appropriate value for 'max.dist' is the distance beyond which residuals from an OLS model are not autocorrelated. This distance could be estimated from a semivariogram or correlogram of OLS residuals (see section II, above). The following code assumes a neighbor radius of 50 units (meters, kilometers, etc.). # nbr.CAR <- find.neighbor(x = location, quadtree = quad, max.dist = 50) Display the first 10 rows of 'nbr.CAR'. # nbr.CAR[1:10,] 'nbr.CAR' contains three data columns: indices for the two locations in a neighbor-pair and the distance separating them. By default, 'find.neighbor' includes locations as neighbors of themselves; remove the self-neighbors from 'nbr.CAR' by deleting columns where the inter-neighbor distance (column 3) is zero. # nbr.CAR <- nbr.CAR[nbr.CAR[,3] !=0,] Use the 'spatial.neighbor' function to create the matrix of spatial weights 'W.CAR' from the information in 'nbr.CAR'. The first line of code assigns the sample size to the variable n. The 'nregion' option in the 'spatial.neighbor' function ensures that W.CAR is an n by n matrix, even if some points have no neighbors. This will prevent error messages later on. # n <- nrow(mydata) W.CAR <- spatial.neighbor(row.id = nbr.CAR[,1], col.id = nbr.CAR[,2], nregion=n) Make sure the matrix 'W.CAR' is symmetric (the response 'T' should be returned). The first line of code enables the library of 'Matrix' functions. The second line of code tests for symmetry. # library(Matrix) is.Hermitian(spatial.weights(W.CAR), tol=0) If the value 'F' is returned, it may be due to rounding error. Try the following code. # is.Hermitian(spatial.weights(W.CAR), tol=0.000001) The weights matrix 'W.CAR' should be a symmetric n by n matrix with ones in the entries for neighbor pairs and zeros elsewhere. Because all the neighbor weights are equal to one, all neighbors exert the same influence on each other. You may want to re-define the weights so that closer neighbors will exert a stronger influence on each other than more distant neighbors. Lichstein et al. (0000) use 1/dij and 1/dij^2 as weights (where dij is the distance between locations i and j), and codes for these options are given below. Kaluzny et al. (1998, section 5.1.4) provide code for a more complicated weight function used by Cressie (1993). The following code defines the weights as 1/dij. # dij <- nbr.CAR[,3] W.CAR$weights <- 1/dij The following defines the weights as 1/dij^2. # dij <- nbr.CAR[,3] W.CAR$weights <- 1/dij^2 After modifying the weights matrix, you might want to check that it is still symmetric. It should be symmetric if you used one of the weight definitions given above (wij = 1/dij, or wij = 1/dij^2), or any other symmetric weight definition (i.e., wij = wji). # is.Hermitian(spatial.weights(W.CAR), tol=0) III. B. CAR trend/environment models. See Haining (1990), Cressie (1993), and Lichstein et al. (0000) for explanation of the CAR model. Kaluzny et al. (1998, section 5.3) give details for some of the codes. Use the 'slm' (spatial linear models) function to fit a CAR model containing all of the environmental variables and trend surface terms ('CAR trend/environment model' of Lichstein et al. 0000). Fine-scale autocorrelation in the response is modeled by the spatial neighborhood effect using the spatial weights matrix 'W.CAR' and the spatial parameter, rho, which 'slm' will estimate. # car.trend.env <- slm(response ~ env1 + env2a + env2b + x.std + y.std + x.sq, cov.family=CAR, data=mydata, spatial.arglist=list(neighbor=W.CAR)) View the output for the CAR model. The following two lines of code return partially redundant output. # car.trend.env summary(car.trend.env) Save the CAR residuals in 'mydata' in a column named 'car.trend.env.resid'. # mydata$car.trend.env.resid <- residuals(car.trend.env) Assign the log-likelihood of the CAR trend/environment model above to a variable named 'LL.car.trend.env'. 'slm' refers to the negative log-likelihood of the model as "objective", so we want the negative of this value. Then name the variable in the second line of code to see its value. This should be identical to the value of "Log-likelihood" in the model output above. We will need this value later to compute R-squareds for the CAR model. # LL.car.trend.env <- -(car.trend.env$objective) LL.car.trend.env Use the 'lrt.slm' (likelihood ratio test for spatial linear models) function to perform LR tests for the environmental and trend surface variables. Although significance tests are reported in the 'slm' output, there is no test for the overall significance of the categorical variable. We can easily obtain this test using 'lrt.slm'. Note that instead of using 'lrt.slm', we could fit reduced models and compute LR tests as we did above for the OLS models. But the 'lrt.slm' function computes the same tests with less code. 'lrt.slm' performs LR tests by setting the specified coefficient(s) to zero (or any other hypothesized value), and then comparing the log-likelihood of this restricted model to the log-likelihood of a model where the coefficient is free to vary. The following lines of code return LR tests for the categorical variable 'env2' and the trend surface term 'y.std'. # lrt.slm(car.trend.env, coefficients = c("env2a" = 0, "env2b" = 0)) lrt.slm(car.trend.env, coefficients = c("y.std" = 0)) Use the 'lrt.slm' function to test the significance of rho, the spatial parameter in the CAR model. If rho is significant, this means that the CAR model fits the data significantly better than the corresponding OLS model. In the syntax of 'lrt.slm', rho is a "parameter" (named 'param1' by default), whereas the explanatory variables (above) are "coefficients". # lrt.slm(car.trend.env, parameters = c("param1" = 0)) III. C. CAR intercept/rho models Fit a CAR model with only an intercept and rho. The response is modeled by fine-scale autocorrelation alone, without any environmental or broad-scale spatial variables (trend surface terms) in the model. (Note that we determined the neighbor radius for W.CAR based on the range of autocorrelation in the OLS trend/environment residuals. In an intercept-only model, the range of autocorrelation may be different, so W.CAR may not be the ideal weights matrix for a pure autocorrelation model.) View the output with the 'summary' function. Naming the object (last line of code below) will also return some output. # car.int.rho <- slm(response ~ 1, cov.family=CAR, data=mydata, spatial.arglist=list(neighbor=W.CAR)) summary(car.int.rho) car.int.rho Assign the log-likelihood of the CAR intercept/rho model above to a variable named 'LL.car.int.rho'. 'slm' refers to the negative log-likelihood of the model as "objective", so we want the negative of this value. Name the variable in the second line of code to see its value. This should be identical to the value of "Log-likelihood" in the model output above. We will need this value later to compute R-squareds for the CAR models. # LL.car.int.rho <- -(car.int.rho$objective) LL.car.int.rho Perform an LR test for rho from the CAR intercept/rho model above, and name the output object 'lrt.car.int.rho'. View the LR test statistic by naming the object in the second line of code. Then extract the LR test statistic (which 'lrt.slm' calls "chisquared") from the object 'lrt.car.int.rho' and assign the value to a variable named 'LR.car.int.rho'. We will need this value later to compute R-squareds for the CAR models. # lrt.car.int.rho <- lrt.slm(car.int.rho, parameters = c("param1" = 0)) lrt.car.int.rho LR.car.int.rho <- lrt.car.int.rho$chisquared III. D. Calculate R-squareds for the CAR models To compute model R-squareds using the formula of Nagelkerke (1991), we need the LL of the null model. The null model should contain only an intercept, with all other parameters (the environmental variables, the trend surface terms, and the spatial parameter rho) set to zero. We can calculate the CAR null model LL from the definition of the LR test statistic: LR = -2(LLreduced - LLfull). For example, consider the CAR model with only an intercept and rho ('car.int.rho' above) as the full model, and the CAR null model as the reduced model (intercept only). Since we know the LR statistic that compares these two models ('LR.car.int.rho') and we know LLfull ('LL.car.int.rho'), we can solve for LLreduced (i.e., the CAR null model LL) by arithmetic: LLreduced = -0.5*LR + LLfull. # LL.car.null.a <- -0.5*LR.car.int.rho + LL.car.int.rho LL.car.null.a Note that we could compute the identical value for the CAR null model LL by considering the CAR trend/environment model as the full model. In this case, the appropriate LR statistic would be for a simultaneous test of all of the parameters in the CAR trend/environment model. We assign this LR statistic to the variable 'LR.car.trend.env.all' in the last line of the following code. # lrt.car.trend.env.all <- lrt.slm(car.trend.env, parameters = c("param1" = 0), coefficients = c("env1" = 0, "env2a" = 0, "env2b" = 0, "x.std" = 0, "y.std" = 0, "x.sq" = 0)) lrt.car.trend.env.all LR.car.trend.env.all <- lrt.car.trend.env.all$chisquared Now, calculate the CAR null model LL by arithmetic just as we did above using the log-likelihoods from the CAR intercept/rho model. Recall that the log-likelihood of the CAR trend/environment model is 'LL.car.trend.env'. The returned value of 'LL.car.null.b' should be identical to that of 'LL.car.null.a' above. # LL.car.null.b <- -0.5*LR.car.trend.env.all + LL.car.trend.env LL.car.null.b We are now ready to calculate the R-squareds using the formula of Nagelkerke (1991). First, compute the R-squared of the CAR trend/environment model. This is the proportion of variation in the response that can be explained by the environmental variables, the trend surface terms (broad-scale spatial trend), and the spatial parameter rho (fine-scale autocorrelation) combined. The variable n must previously have been defined to be the correct sample size. # rsq.car.trend.env <- 1 - exp((-2/n)*(LL.car.trend.env - LL.car.null.a)) rsq.car.trend.env Next, compute the R-squared of the CAR intercept/rho model. This is the proportion of variation in the response that can be explained by fine-scale autocorrelation alone, without any environmental variables or trend surface terms in the model. # rsq.car.int.rho <- 1 - exp((-2/n)*(LL.car.int.rho - LL.car.null.a)) rsq.car.int.rho Finally, you may want to compute the partial R-squared for rho, i.e., the proportion of variation in the response that rho explains after controlling for the variation explained by the environmental variables and the trend surface terms. The easiest way to calculate the partial R-squared for rho is to subtract the OLS trend/environment R-squared from the CAR trend/environment R-squared. # rsq.rho.partial <- rsq.car.trend.env - rsq.ols.trend.env rsq.rho.partial If desired, you could also calculate partial R-squareds for any of the explanatory variables in the CAR or OLS models by subtracting the R-squared of a reduced model (lacking the variable of interest) from the full model containing that variable. In general, the full and reduced models for these calculations would either both be OLS models or both be CAR models. ########################################################## Literature cited: Borcard, D., P. Legendre, and P. Drapeau. 1992. Partialling out the spatial component of ecological variation. Ecology 73:1045-1055. Cressie, N. A. 1993. Statistics for spatial data, revised edition. John Wiley and Sons, New York, New York, USA. Haining, R. 1990. Spatial data analysis in the social and environmental sciences. Cambridge University Press, Cambridge, UK. Kaluzny, S. P., S. C. Vega, T. P. Cardoso, and A. A. Shelly. 1998. S+Spatial Stats user's manual for Windows and Unix. Springer-Verlag, New York, New York, USA. Legendre, P., and L. Legendre. 1998. Numerical ecology, second English edition. Elsevier Science, Amsterdam, The Netherlands. Lichstein, J. W., T. R. Simons, S. A. Shriner, and K. E. Franzreb. 2002. Spatial autocorrelation and autoregressive models in ecology. Ecological Monographs 72:445-463. MathSoft. 1999. S-Plus 2000 guide to statistics, vol. 1. Data Analysis Products Division, MathSoft, Seattle, Washington, USA. Nagelkerke, N. J. D. 1991. A note on a general definition of the coefficient of determination. Biometrika 78:691-692. Rossi, R. E., D. J. Mulla, A. G. Journel, and E. H. Franz. 1992. Geostatistical tools for modeling and interpreting ecological spatial dependence. Ecological Monographs 62:277-314.