Spatial Autocorrelation and Pseudoreplication in Fire Ecology

Fire ecologists face many challenges regarding the statistical analyses of their studies. Hurlbert (1984) brought the problem of pseudoreplication to the scientific community’s attention in the mid 1980’s. Now, there is a new issue in the form of spatial autocorrelation. Spatial autocorrelation, if present, violates the traditional statistical assumption of observational independence. What, if anything, can the fire ecology community do about this new problem? An understanding of spatial autocorrelation, and knowledge of available methods used to reduce the effect of spatial autocorrelation and pseudoreplication will greatly assist fire ecology researchers.


INTRODUCTION
Experimental designs and analyses in fire ecology are based on statistical assumptions that are often violated in experimental ecology.
Violation of statistical assumptions may result in rejecting or failing to reject null hypotheses at criterion levels greater or smaller than those intended for the analysis, which may lead to conclusions that are not consistent with the natural phenomena under study (Cliff and Ord 1975, Hurlbert 1984, Day and Quinn 1989. In 1984, Hurlbert attracted the scientific community's attention with his paper in which he defined Bataineh et al. Fire Ecology Vol.2 No.2 pseudoreplication as "the use of inferential statistics to test for treatment effects with data from experiments where either treatments are not replicated (though samples may be) or replicates are not statistically independent." Hurlbert pointed out that the problem with a replication-less study is that it will lack the estimate of error needed to judge the significance of a comparison. More recently, criticisms involve the underlying statistical assumption of independence (Robertson 1987, Legendre 1993, Lennon 2000, Legendre et al. 2002. The assumption of independence states that an observation of one sample is not influenced by the observation of another sample (Helberg 1996). Hurlbert (1984) stated that a lack of independence causes the alpha level to be unknown which in turn causes the interpretation of statistical analyses to be subjective. According to many authors, including but not limited to Cliff and Ord (1973), Sokal and Oden (1978b), and Legendre (1993), the assumption of independence is often violated because of spatial autocorrelation.
Techniques to determine if spatial autocorrelation exists have been used to determine the effect of these gradients on measurable parameters in the field, but also may reduce the number of experimental units (N's) that are useful for statistical analysis.
Spatial autocorrelation is the similarity between two observations of a measured variable based upon their spatial location (Griffith 1992, Legendre 1993, Lennon 2000. Positive spatial autocorrelation occurs when the similarity is greatest for close objects and least for objects spaced farther apart (Sokal and Oden 1978a, Robertson 1987, Diniz-Filho et al. 2003. Positive spatial autocorrelation is important for two reasons (Cliff and Ord 1973). The first reason concerns surface interpolation, which uses the values of variables at known locations to estimate the value of a variable nearby based on the assumption that objects closer together are more similar than objects farther away ( O'Sullivan and Unwin 2003). The other reason is a concern to researchers because the assumption of independence of observations for traditional statistical tests does not hold true (Cliff and Ord 1973, Legendre et al. 2002. Non-independence of observations negatively affects statistical tests by underestimating standard errors and inflating Type I errors (incorrectly rejecting a true H 0 ) (Cliff and Ord 1975, Diniz-Filho et. al. 2003.
The purpose of this paper is to explore the concept of spatial autocorrelation and how it may affect the statistical analyses, due to the assumption of independence, for fire ecology studies.

Spatial Autocorrelation: A History and How It Applies to Ecological Studies
Spatial autocorrelation was first recognized as a problem in 1914 by Student (Cliff andOrd 1975, Griffith 1992), who acknowledged that observed correlation for geo-referenced data series is only attributable to the geographic location of the data. Student suggested 'trend surface analysis' (the removal of trends by regression methods) as a technique to account for spatial autocorrelation. Stephan (1934) warned social researchers gathering census tract data that data of geographic units are not independent "like balls in an urn", but dependent "like bunches of grapes".
Stephan also stated that contiguity in time and/or space does not itself indicate non-independence, but characteristics of social data are by virtue interrelated.
Stephan's cluster of grapes conception led to sampling designs that neutralized spatial autocorrelation (Griffith 1992). Moran (1948) and Geary (1954) both developed indices (Moran's I and Geary's G, respectively) to measure spatial autocorrelation. Matheron (1963) discovered that the value for a variable at an unsampled location could be estimated using the observed values of the neighboring variables based on their spatial dependence structure. Tobler (1970) created the first law of geography, which states "everything is related to everything else, but near things are more related than distant things." According to Griffith (1992), in 1970, Gould, a geographer, stated that spatial data series fail to meet the assumption of independent observations of traditional statistical tests. Legendre (1993) stated that ecological data (such as obtained in fire studies) are inherently spatially autocorrelated; for example, the species composition at one location is influenced by the species assemblage of the surrounding locations due to contagious biotic processes. Fortin and Jacquez (2000) attribute spatial autocorrelation in ecological studies to ecological processes that have a geographic element; for example, dispersal, allelopathy, and spatial competition for resources. According to Ver Hoef and Cressie (2001), all field experiments have a spatial component, in which the experimental units are positioned in one, two, or three-dimensional space. This spatial environment is generally heterogeneous (i.e. a mosaic of patches that exhibit varying degrees of spatial autocorrelation, both within and among the patches), which impedes the researchers' ability to find homogeneous areas to serve as experimental units. In addition, the problem of spatial heterogeneity can cause unequal plant responses to the experimental treatments (Fortin and Gurevitch 2001). However, these effects can be minimized through the use of proper experimental designs or improved statistical analyses, such as blocking, nearest neighbor analysis, or trend surface analysis (van Es and van Es 1993). In fact, a common way to try to account for the effects of spatial heterogeneity is to use a randomized block design. The random assignment of experimental units to treatments alone assures that the observations are independent, but does not ensure that neighboring units are spatially independent; thus the need for blocking (Fortin and Gurevitch 2001). If pre-fire measurements are possible for a fire study, then blocking using spatial autocorrelation technigues may aid in plot placement. Spatial autocorrelation techniques can be used to block, without identifying the specific cause of the autocorrelation. This use of spatial analysis may save time and money in determining plot placement for a field study. However, if the size of the block does not match the size of the spatial pattern of the plants or the plants spatial responses to the treatments, then the effects of spatial heterogeneity may still be a problem (Fortin and Gurevitch 2001).
In addition, it is suggested that researchers use at least 30 localities (plots) to measure for the presence of autocorrelation in their data (Cliff and Ord 1975;Fortin et al. 1989); however, due to the varied nature of fire on the 110 Bataineh et al.
Fire Ecology Vol.2 No.2 landscape, it is often difficult to position a minimum of 30 plots within an area that is perceived to be uniform in fire severity. The spacing of plots is often determined based on the size of the vegetation being evaluated (e.g. large plots for trees, smaller plots for herbaceous vegetation) and small plots are often nested within larger plots in order to evaluate relationships between overstory parameters and understory parameters as affected by a single fire event. As a result, plots may exhibit spatial autocorrelation for precisely the reason they were laid out; to capture fire effects within a relatively small homogeneous area of the landscape. So, even though it may be possible to observe, and therefore quantify, the spatial limits on the landscape for some preconceived condition (or as Mueller-Dombois and Ellenberg (1974) stated, "subjectively but without preconceived bias") for which we wish to block for, such as slope, aspect, soil type or vegetative community, we may end up incorporating the spatial autocorrelation condition that is suggested we avoid. The r apid growth of GIS-based software and applications has brought the disciplines of fire ecology, geography, and spatial sciences together in an effort to explain landscape-level heterogeneity associated with wildland fire. Work by a variety of authors, including but not limited to Chou, Getis, and Anselin (Anselin, Chou 1992, Chou et al. 1993a., Chou et al. 1993b., Getis and Franklin 1987, has shown the value of this collaborative effort. If one was to compare the presentations made at the three International Fire Ecology and Management Congresses (2000Congresses ( , 2003Congresses ( , 2006, the growth of this type of analysis is readily apparent. Because wildland fires are not replicated, it is important that the number of N's are maximized and blocking is applied within a designated treatment level (i.e. burn severity) so that statistical analysis can provide valuable data without the influence of pseudoreplication or of spatial autocorrelation.
The challenge is compounded by heterogeneity on the landscape. As Getis and Franklin (1987) point out, this heterogeneity is a function of the scale of analysis. Therefore, it is important that the fire researcher utilizes the largest number of N's possible to account for the variability recorded in the smallest plots (to measure smaller vegetation) within a treatment type to compensate for the lack of real replication (i.e. avoid pseudoreplication effects) without increasing the spatial autocorrelation potential at the larger plots.

Detecting Spatial Autocorrelation
The first step to detect autocorrelation of a dataset is to plot a variogram cloud and a variogram. When each observation has both spatial (e.g. coordinates) and attribute (e.g. biomass) components measured, a variogram cloud can be built by plotting positional distance ( x-coordinate) against absolute difference of attribute (y-coordinate) between a pair of observations.
To summarize the relationship, pairs of observation within a distance range are binned to calculate semivariance, and the variogram plotted. Equation 1 shows the calculation of semivariance, where Z(x) is the attribute value at x location and h is the lag distance, while N is the total number of pairs within the distance range.  Figure 1 shows an example of variogram constructed from a loblolly pine plantation dataset without considering directional effect.
The semivariance is in the unit of centimeter for dbh measurement, whereas the lag distance (distance between individual trees) is measured in meters.
A spherical model was fit to depict the relationship with 2.3 m as nugget (the variance at zero distance), 30.4 m as range (beyond which the semivariance is constant) and 4.6 meter as sill (the constant semivariance value beyond the range). It implies that trees within 30.4 m apart inherit autocorrelation, where difference in dbh increases as distance increases. If sample trees were selected within this range, the resulting statistics might be biased due to the violation of random sampling. On the other hand, if spatial interpolation is to apply based on autocorrelation being in existence, observed points surrounding an estimated location should not go beyond the distance of range.
Plotting a variogram is computationally intensive. VARIOWIN (Pannatier 1996)   quantitative approach to measure autocorrelation is Moran's I, which is applied where numeric data are available. Its' calculation is a translation of a nonspatial correlation measure to a spatial context (Equation 2). The w ij term is the weight between two observations ( x i and x j ) and can be calculated based on the distance (inverse distance) or a fixed bandwidth.
The G i does not include the target feature itself in calculation, whereas the G i * does. In GIS packages such as ArcGIS, G i * is available as a spatial statistics tool for mapping clusters. Including the target feature enhances in finding hot spots, since the observed value itself contributes to the occurrence of the cluster. A group of observations with high G i * values indicate a cluster of features with high attribute values, and vice versa. Both versions of G-statistics require an arbitrarily defined neighborhood. Anselin (1995) (Beaulieu 1975, Oswald, 1981, Oswald and Covington 1983, Oswald and Covington 1984, Rountree 2004, Bataineh et al. 2006). The fire itself and the initial publications occurred prior to Hurlbert's 1984 publication, and before spatial autocorrelation techniques were considered in field studies. Only the Rountree (2004) and the Bataineh et al. (2006) publications dealt with both issues.
The original study was established in 1972, post-fire on 3 sites (high severity, low severity, unburned): a second unburned site established in 1974, and the initial unburned site was prescribed burned in 1977. Beaulieu (1977) used a T-test to compare the two unburned sites and found no significant differences. As a result, the pre-2004 publications used the unburned sites as comparisons to the burned sites.
Within each site, 30 center points were located along transects running perpendicular to the long axis of each site. Four circular sampling plots (0.89 m 2 each) were established at 90 o angles 7.1 meters around each center point, two along each transect and the other two perpendicular to the transects. For each sampling year (1972,1974,1980,2003,2004), two sampling plots were randomly selected for sampling. Midsummer of each sampling year, the herbaceous stems within the selected sampling plots were identified to species, counted, clipped and dried to determine biomass production.
The mean of the two sampling plots was used in the analyses. In the earlier published manuscripts (Oswald andCovington 1983, Oswald andCovington 1984), Analysis of Variance (p= 0.05) was used to test the effect of passage of time (1972 to 1980) and severity of burn on biomass production.
The Student-Newman-Kuels Multiple Range test was used to test differences. The relatively large number of N's within each site allowed for statistical analysis even though treatments (burn severities) were not replicated.
In Bataineh et al. (2006), PROC Mixed was used for repeated measures, with Akaike's Information and Schwarz' Bayesian as the model fit criteria. Oneway ANOVA's was used when a significant treatment by year was found and Tukey's multiple comparison procedure used to separate significant treatment means. To incorporate species composition, multi-response permutation procedures (MRPP) was utilized after a Bonferrroni adjusted was made. Indicator Species Analysis was conducted to determine which species 114 Bataineh et al. Fire Ecology Vol.2 No.2 were driving the differences between sites. If significant differences among sites were found in 1972 or 2003 using MRPP, the Mantel test was used to determine if there was a relationship between overstory attributes and understory attributes. Detrended Correspondence Analysis (DCA) was used to summarized understory species composition and production data, with non-metric multi-dimensional scaling (NMDS) chosen as a complimentary technique.
To explore the possibility of whether any of the plots within a site for a given year studied may have been spatially autocorrelated, semivariograms were produced using PROC VARIOGRAM. The lag distance was set at 0.0002 with a maximum lag of 20 (SAS Institute Inc. 1999). Lags are the subdivisions of the distance axis into intervals.
Lag distances are plotted against the semivariances to produce the semivariograms. Semivariance is half the variance of the differences between all possible points at a constant distance apart (O'Sullivan and Unwin 2003). The semivariogram graphs were visually compared to spatial covariance models to determine if the response variable, production, was spatially correlated among plots within each site for a given year studied. None of the exploratory semivariogram graphs were similar to any of the covariance models. Therefore, it was assumed that the inference of the study's results would not be compromised by spatial autocorrelation.
This assumption was verified through the use of repeated measures procedure in PROC MIXED. This procedure required selection of a covariance structure.
Based on the Akaike's and Bayesian's fit criteria of the study's data, the compound symmetric covariance structure, which assumes homogeneous variances, was selected as the most suitable. If spatial autocorrelation existed in the data, then a different covariance structure would have been found most suitable to use in the analysis (Little et al. 1996).
In reference to the title of Legendre's paper about spatial autocorrelation (Legendre 1993), Fortin and Dale (2005) sustain that spatial autocorrelation represents both trouble and a new paradigm in ecology. Our current knowledge of natural systems indicates that spatial autocorrelation is a common issue in ecological studies and something that researchers will have to start addressing.
What can be done if spatial autocorrelation was found? One approach to the problems caused by spatial autocorrelation during statistical testing could be to adjust the Type I error to a more conservative value (Dale and Zbigniewicz 1997), for instance a=0.01 instead of a= 0.05. However, without the appropriate knowledge of the true autocorrelation structure, there is a risk for being too conservative (Fortin and Dale 2005).
Considering that spatial autocorrelation modifies the effective sample size n' (n'<n), the relation n'= n [(1-?)/(1+?)] could be used to estimate the effective sample size if the matrix of covariance among locations can be described by a first-order autoregressive correlation structure (Cressie 1991, Fortin andDale 2005). Fortin and Dale (2005) indicate that this approach can be used for one-and two-sample t-tests and for ANOVA comparisons among means. The same correction could be applied to paired sample t-tests .

December 2006
Spatial Autocorrelation and Pseudoreplication 115 Fortin and Dale (2005) advocate the use of a simple model of the spatial autocorrelation structure and Monte Carlo simulations. In this approach, a parametric model for the spatial autocorrelation of the form: where h is the lag, and i ε is N(o,s 2 ), is used to generate artificial data sets in a Monte Carlo simulation and to estimate confidence intervals for the test statistic (Manly 1997) (see Fortin and Dale 2005 for an example).

Based on
Hurlbert's pseudoreplication paper, t o use traditional statistical tests to analyze data from any study, there must be replications of the treatments.
The problem for fire ecologists is that this is a proven difficulty in wildfire studies. The few ways around this dilemma are tricky. One way is to employ a repeated measures design for the study in which time serves as the replication. Another way is to use non-traditional statistics such as ordination, cluster analysis, MANOVAs, Mantel Tests, etc.
Based on the new appreciation of spatial a utocorrelation, spatial autocorrelation should be tested for, or accounted for, either before the study is implemented or before the analysis of any study with a spatial component. The problem for fire ecologists is that the limited statistical analyses they have to choose from due to pseudoreplication have been further narrowed down to a few statistical analyses that deal with spatial autocorrelation as well.
The issues of pseudoreplication and spatial autocorrelation are real and valid concerns. They deal with statistical assumptions that if violated make inferences untrustworthy. Therefore, we have reached a crossroads. Either, there needs to be greater research invested into creating statistical analyses that are more valid for wildfire studies or there needs to be greater research invested into figuring out how to design and implement wildfire studies that will meet the assumptions of the pre-existing statistical analyses.
In summary, conducting a pilot study is advised to find out characteristics of the spatial autocorrelation such as behavior according to distance and whether it is anisotropic. Furthermore, based on the findings from two recent studies on the effects of spatial structure on the design and analysis of field experiments (Legendre et al. 2002(Legendre et al. , 2004, Fortin and Dale (2005) suggest the following lessons to be learned: a. if spatial autocorrelation is present, the use of blocks is recommended; b. in the presence of spatial autocorrelation, and given a certain number of experimental units, smaller blocks spread across the study area provides greater statistical power; c. short-range spatial autocorrelation, when compared to the size of experimental units and blocks, has a stronger effect on ANOVA tests than long-range spatial autocorrelation