Twenty years and counting with SADIE: Spatial Analysis by Distance Indices software and review of its adoption and use

SADIE (Spatial Analysis by Distance Indices) is designed specifically to quantify patterns in spatially-referenced count-based data. It was developed for dealing with data that can be considered ‘patchy’. Such distributions are commonly found, for example, in insect populations where discrete patches of individuals are often evident. The distributions of such populations have ‘hard edges’, with patches and gaps occurring spatially. In these cases variance of abundance does not vary smoothly, but discontinuously. In this paper we outline the use of SADIE and provide free access to the SADIE software suite, establishing Rethinking Ecology as its permanent home. Finally, we review the use of SADIE and demonstrate its use in a wide variety of sub-disciplines within the general field of ecology.


Introduction
It is over twenty years since Joe Perry introduced and developed the SADIE (Spatial Analysis by Distance Indices) methodology to study spatial patterns in count-based data where locations are specified (Perry and Hewitt 1991; Perry 1995; Perry 1998). SADIE is designed specifically for analysing counts of individuals at known locations, and was developed for dealing with ecological data that can be considered 'patchy'. Such distributions are commonly found, for example, in insect populations where discrete patches of individuals are often evident. SADIE reflects this patchiness, recognised by ecologists, of populations in heterogeneous environments. Such populations have 'hard edges', with patches and gaps where variance of abundance does not vary smoothly, but discontinuously. Other approaches available to ecologists for the analysis of spatially-referenced data either assume a smooth surface for abundance with gradual change, so that populations may be viewed as contour maps of abundance (methods such as geostatistics, variograms, and Ripley's K), or describe individuals (not counts) at specific locations through, for example, nearest-neighbour techniques or within area-based entities (such as maps of counties). We recommend that those reading this paper refer to the papers by Dungan et al. (2002) and  that provide both a useful general introduction to techniques that may be used to explore spatial pattern in ecological datasets, and describe the use of SADIE within this context. The main message of  is that one needs to tailor the choice of method to the available data type. The importance of tools to quantify and explore spatial pattern in ecological systems is recognised strongly in the wider literature (e.g. Campos-Herrera and Lacey 2018; Dungan et al. 2002; Fortin and Dale 2005; Madden et al. 2007; Samways and McGeogh 2010).
Traditionally, count-based spatial pattern has been explored using mean-variance relationships (e.g. Lloyd 1967; Taylor et al. 1978; Pacala and Hassell 1991). However, such approaches focus only on the frequency distribution of counts and not their location. Because they do not retain the relative position of each count in two-dimensional space, crucial information is lost (Figure 1a-c). This loss is important as it is known that the location of individuals in relation to one another mediates many ecological interactions (e.g. Hassell et al. 1991; Harwood et al. 2001).
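To make this loss of positional information concrete, the short Python sketch below uses the 25 counts listed in the Figure 1 caption (seven 0s, three 3s, three 5s, one 6, one 8, one 9 and nine 20s) and shows that their mean and variance are unchanged by any rearrangement among the grid cells. The shuffling step and the variable names are illustrative assumptions and are not part of the SADIE software.

```python
import random
import statistics

# The 25 counts from the Figure 1 caption: value -> number of occurrences.
frequency = {0: 7, 3: 3, 5: 3, 6: 1, 8: 1, 9: 1, 20: 9}
counts = [value for value, times in frequency.items() for _ in range(times)]

def mean_and_variance(values):
    """Return the sample mean and the sample variance (n - 1 denominator)."""
    return statistics.mean(values), statistics.variance(values)

# A hypothetical rearrangement of the same counts among the 25 cells.
rng = random.Random(1)
shuffled = counts[:]
rng.shuffle(shuffled)

original_summary = mean_and_variance(counts)
shuffled_summary = mean_and_variance(shuffled)

# Both summaries are identical: the mean is 9.08 and the variance about 75.9,
# whatever the spatial arrangement of the counts.
print(original_summary)
print(shuffled_summary)
```

Any statistic computed from the frequency distribution alone behaves in the same way, which is why a method that uses the coordinates is needed to separate arrangements a, b and c of Figure 1.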
Geographical Information Systems (GIS) approaches offer, in principle, tools to explore spatial solutions to this problem (Wegmann et al. 2016). However, GIS solutions are not well suited to ecological count-based data that often exhibit variances much greater than the mean (primarily due to the frequency of zero counts). GIS interpolation techniques, designed for smoothly changing gradients, do not deal well with discontinuities caused by aggregation of biological organisms into distinct populations with well-defined local boundaries, such as pollinators on flowerheads. Datasets of this type are common in ecology and particularly so in entomology where insect counts are often patchy. SADIE was developed in the entomology department of Rothamsted Research as a direct response to this problem. The method provides indices to detect and measure clustering into 'patches' and 'gaps' for count-based data that is spatially referenced in two dimensions. The term 'cluster' is an important one in SADIE methodology and refers to a neighbourhood of either relatively large counts (a patch) or small counts (a gap). In addition, graphical displays termed 'red-blue' plots may be used to visualise the extent of clustering, and the location of patches and gaps. SADIE has been extended to provide a means to formally test the degree of association between two sets of spatially-referenced counts (Perry and Dixon 2002). This is particularly useful when, for example, it is desired to compare the spatial pattern of two species, of two dates or with abiotic data such as soil moisture (Holland et al. 2007).

A brief overview of SADIE methodology
In this section we provide an overview of the key elements of the SADIE system. Full details of methods and key papers that explore approaches to analysing spatially-referenced data are summarised in Table 1. In particular, the papers by Dungan et al. (2002) and  provide a useful general introduction to analytical techniques that may be used to explore spatial pattern in ecological datasets. The SADIE approach has four key components: (i) an index of aggregation (I_a), quantifying the presence and degree of clustering; (ii) indices that quantify the presence of neighbourhoods of relatively high counts (V_i) or low counts (V_j), termed patches and gaps respectively; (iii) red-blue plots that provide a visual representation of the degree of clustering; and (iv) an association index (X), analogous to a correlation coefficient, that represents the association or dissociation between two datasets and may also be mapped to create plum-green plots.
(i) I_a, index of aggregation. The method equates the degree of spatial pattern in a set of counts (each with an x,y coordinate) to the minimum effort that would be required to move to a completely regular arrangement with counts equal at each location. This is equated using the minimum distance, D, provided by a transportation algorithm from the linear programming literature (Kennington and Helgason 1980). A unique solution is provided based on notional flows of individuals from 'donor' sample units (with greater than average abundance) to 'receiver' units (with less than average abundance). From this solution, an aggregation index (I_a) is calculated that quantifies the degree of clustering independently of the numeric values of the counts (Figure 1a-c). This is generated by division of the observed value of D by the mean value generated from simulated randomisations of the counts among the locations. Values of I_a = 1 indicate randomly arranged counts, whilst I_a > 1 indicates aggregation of counts into clusters and I_a < 1 indicates regularity. Comparison of the observed value of D with the tails of the distribution of corresponding values from these simulations provides a randomisation test (Besag and Diggle 1977) of the null hypothesis that the observed counts are arranged randomly, with a probability level, P_a.

Table 1. Key papers that introduce SADIE analytical tools or provide an overview of general methods. A citation count using SCOPUS is also provided.
| Paper | Scope | Citations |
| Perry 1995 | The pattern of spatial distribution of Dalbergia cearensis saplings and adults were investigated with spatial analysis by distance indices, using the software Sadie Shell, version 8.0. The aggregation index (I_a) was used to explore spatial pattern. | 205 |
| Perry 1998 | SADIE methodology is discussed in this paper and reviews the method's advantages over traditional approaches that measure only statistical variance heterogeneity. Two indices and associated tests are considered, one based on the total distance of the sample from a completely regular arrangement, the other from a completely crowded arrangement. A diagnostic plot is presented to aid interpretation. Methods are presented to estimate both 'typical' cluster size and inter-cluster distance. | 225 |
| Perry et al. 1999 | This paper introduces a new index and four new graphical displays, termed 'red-blue' plots. The index may be used to detect clusters in the form of patches, comprising several nearby large counts, and in the form of gaps, comprising several nearby small counts. The methods facilitate a comprehensive definition of the size and dimension of a cluster. | 266 |
| Dungan et al. 2002 | Case studies are used to explore the application of spatial methods. The influence of observational scale and the importance of carefully constructing both sampling design and analysis are reviewed. A set of considerations for sampling design to allow useful tests for specific scales of a phenomenon under study are provided. | 432 |
| Perry and Dixon 2002 | A method to assess spatial association between two sets of count data is described. This uses a measure of local association for counts, based on comparison of SADIE clustering indexes of the two sets at each sample unit. The mean of the measure is represented by the simple correlation coefficient between the clustering indices of the two sets. Spatial association may be mapped for count data, and clusters of units with positive association or negative dissociation may be identified. | 175 |
|  | This paper provides guidance to ecologists with limited experience in spatial analysis to help in their choice of techniques. A case-study approach is used to compare analytical approaches. Guidance is provided through a taxonomy of data types, a discussion of the effects of sampling, and consideration of transformations that may be used to convert data. Key spatial analysis techniques developed in plant ecology, animal ecology, landscape ecology, geo-statistics and applied statistics are briefly reviewed. Users are encouraged to initially use simple visualisation techniques, followed by methods appropriate for the data type. | 299 |
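The calculation of the index of aggregation described in (i) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the SADIE implementation: the transportation problem is solved with a generic successive-shortest-path minimum-cost flow rather than the algorithm of Kennington and Helgason (1980), the function names, the 4×4 grid and its counts are hypothetical, and the probability level uses the common (1 + number of randomisations at least as extreme) / (1 + number of randomisations) convention.

```python
import math
import random

def _min_cost_flow(n_nodes, arcs, source, sink):
    """Generic successive-shortest-path min-cost flow; returns the total cost."""
    graph = [[] for _ in range(n_nodes)]
    for u, v, cap, cost in arcs:
        graph[u].append([v, cap, cost, len(graph[v])])      # forward arc
        graph[v].append([u, 0, -cost, len(graph[u]) - 1])   # residual arc
    total_cost = 0.0
    while True:
        dist = [math.inf] * n_nodes
        prev = [None] * n_nodes
        dist[source] = 0.0
        for _ in range(n_nodes - 1):                         # Bellman-Ford passes
            changed = False
            for u in range(n_nodes):
                if dist[u] == math.inf:
                    continue
                for k, (v, cap, cost, _rev) in enumerate(graph[u]):
                    if cap > 0 and dist[u] + cost < dist[v] - 1e-12:
                        dist[v] = dist[u] + cost
                        prev[v] = (u, k)
                        changed = True
            if not changed:
                break
        if dist[sink] == math.inf:                           # all surplus has been moved
            return total_cost
        push, v = math.inf, sink
        while v != source:                                   # bottleneck along the path
            u, k = prev[v]
            push = min(push, graph[u][k][1])
            v = u
        v = sink
        while v != source:                                   # update residual capacities
            u, k = prev[v]
            graph[u][k][1] -= push
            graph[v][graph[u][k][3]][1] += push
            v = u
        total_cost += push * dist[sink]

def transport_distance(points, counts):
    """Minimum total distance D needed to make all counts equal to the mean."""
    n, total = len(counts), sum(counts)
    surplus = [n * c - total for c in counts]   # scaled by n so surpluses are integers
    source, sink = n, n + 1
    arcs = []
    for i, s in enumerate(surplus):
        if s > 0:                               # donor unit
            arcs.append((source, i, s, 0.0))
            for j, r in enumerate(surplus):
                if r < 0:
                    arcs.append((i, j, s, math.dist(points[i], points[j])))
        elif s < 0:                             # receiver unit
            arcs.append((i, sink, -s, 0.0))
    return _min_cost_flow(n + 2, arcs, source, sink) / n

def index_of_aggregation(points, counts, n_perm=99, seed=0):
    """Return (I_a, P_a) from randomisations of the counts among the locations."""
    rng = random.Random(seed)
    d_obs = transport_distance(points, counts)
    d_perm = []
    for _ in range(n_perm):
        shuffled = counts[:]
        rng.shuffle(shuffled)
        d_perm.append(transport_distance(points, shuffled))
    i_a = d_obs / (sum(d_perm) / n_perm)
    p_a = (1 + sum(d >= d_obs - 1e-9 for d in d_perm)) / (n_perm + 1)
    return i_a, p_a

# Hypothetical 4x4 grid: the same sixteen counts, clustered or alternating.
points = [(x, y) for y in range(4) for x in range(4)]
clustered = [10 if x < 2 else 0 for (x, y) in points]
alternating = [10 if (x + y) % 2 == 0 else 0 for (x, y) in points]
d_clustered = transport_distance(points, clustered)
d_alternating = transport_distance(points, alternating)
ia_clustered, pa_clustered = index_of_aggregation(points, clustered)
ia_alternating, pa_alternating = index_of_aggregation(points, alternating)
print(d_clustered, d_alternating, ia_clustered, ia_alternating)
```

In this toy example the clustered arrangement needs twice the movement of the alternating one to reach uniformity, so its index lies above 1 and that of the alternating arrangement lies below 1, although both share the same frequency distribution. The sketch does not guard against the degenerate case in which all counts are equal.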
(ii) Patch and gap cluster indices (V_i and V_j). These indices are part of the 'red-blue' analysis tool and provide a means to measure the presence of spatial pattern by identifying neighbourhoods of consistently high counts (patches) or low counts (gaps), respectively. In the raw data, each location has its designated x,y coordinate and a corresponding count (c). SADIE assigns each location an index of clustering using the mean count m; either a positive ν_i index for patch units with c_i > m, or a negative ν_j index for gap units with c_j < m. These cluster indices are then used to calculate overall cluster indices V_i and V_j for patches and gaps, respectively, with associated significance values. The indices indicate whether the dataset is characterised by the presence of measurable patches, gaps, or both (Figure 1d).
(iii) Red-blue plots. Red-blue plots provide a visual representation of the degree of clustering (Figure 1d). Following the calculation of the values of ν_i and ν_j, cluster indices may be mapped, interpolated and contoured. Clusters are defined as areas enclosed by contour levels of +1.5 or -1.5; these indicate clustering (ν_i > 1.5 or ν_j < -1.5) half as great again as expected by chance. Whilst arbitrary, the average magnitude of the indices within contours bounded by these contour levels has been found to approximate well to the 95th centiles (centiles divide the ordered set of values into 100 parts) of their respective randomisation distributions. Once identified, clusters may be measured for size, shape, location and proximity to other clusters. Red-blue plots may be easily generated using software such as SURFER (Golden Software, current version 15).

Figure 1. In maps a-c, the same 25 counts, with mean, m = 9.08 and variance, s^2 = 75.9, are arranged in a 5×5 grid. The variance greatly exceeds the mean, so these counts come from a heterogeneous distribution more highly-skewed than the Poisson distribution. However, the frequency distribution of the counts (0^7, 3^3, 5^3, 6, 8, 9, 20^9, where each superscript gives the number of occurrences of that value) discards any information regarding the spatial arrangement of the counts. The arrangements clearly differ. In a the counts were deliberately arranged so that the larger counts are relatively far away from other large counts (and the smallest counts are far from other small counts), producing a distribution of the counts that is more regular than random. In b the counts were distributed completely randomly amongst the 25 cells of the grid. In c the larger counts are placed close to one another (as are the smaller counts), to yield a distribution that is highly clustered and spatially aggregated. The SADIE technique exposes these differences in spatial distribution. In a the SADIE index of aggregation I_a is less than unity (I_a = 0.80) indicating regularity, with corresponding probability level P = 0.96 indicating a significantly regular pattern. In b I_a is close to unity (I_a = 0.95, P = 0.57) indicating a randomly allocated pattern. In c I_a greatly exceeds unity (I_a = 1.88, P < 0.0002) indicating a significantly clustered pattern of patches and gaps. In d the value of the local index of clustering is shown for each of the 25 grid units of map c after SADIE analysis. Positive values, shown in red, indicate potential patches and negative values, shown in blue, indicate potential gaps; the larger the value, the greater is the evidence for clustering locally. Significant clustering into a patch in the lower left of the grid, and a gap on its opposite edge, is shown where the units lie within the red (patch) or blue (gap) areas. In this example of a red-blue plot both patch (red, V_i = 1.9, P < 0.001) and gap (blue, V_j = -1.7, P < 0.001) clusters are evident.

(iv) Association index X. Two populations may be spatially positively associated, negatively dissociated, or not be associated at all. The SADIE suite of tools includes a measure of local spatial association, represented by the local index χ_k. This index is based on the similarity between the clustering indices of the two populations, measured locally at each location (Figure 2). Positive values of χ_k arise from coincidences of patches or of gaps in both populations; negative values from opposite cluster types. An overall index of spatial association (Χ) is calculated as the mean of local values, and is in fact equivalent to the simple correlation coefficient between the pairs of cluster indices. The significance of Χ may be tested by randomisations, after allowance for small-scale spatial autocorrelation from either population (Dutilleul 1993). It should be noted that measurable local association does not necessarily imply that larger scale spatial pattern is evident (a temporally stable but spatially random distribution would exhibit spatial association between consecutive sample dates). Plum-green plots provide an analogue of red-blue plots to visually assess areas of high association and/or dissociation.
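The statement that the overall association index is the mean of the local values, and equals the correlation coefficient between the paired cluster indices, can be illustrated with a short Python sketch. The standardisation used for the local index below (the product of the centred cluster indices, scaled so that the local values average to the correlation coefficient) is an assumption chosen to be consistent with that description. The cluster index values are hypothetical, and the randomisation test with the Dutilleul (1993) adjustment is not attempted.

```python
import math

def local_association(z1, z2):
    """Local association values from two lists of cluster indices (assumed form)."""
    n = len(z1)
    m1, m2 = sum(z1) / n, sum(z2) / n
    d1 = [a - m1 for a in z1]
    d2 = [b - m2 for b in z2]
    scale = math.sqrt(sum(a * a for a in d1) * sum(b * b for b in d2))
    # Scaling by n / scale makes the mean of the local values equal to Pearson's r.
    return [n * a * b / scale for a, b in zip(d1, d2)]

def overall_association(z1, z2):
    """Overall association: the mean of the local values."""
    chi = local_association(z1, z2)
    return sum(chi) / len(chi)

# Hypothetical cluster indices for two populations at six sample units.
clusters_a = [2.1, 1.7, 0.4, -0.3, -1.8, -2.2]
clusters_b = [1.9, 1.2, -0.2, 0.1, -1.5, -2.0]

chi_local = local_association(clusters_a, clusters_b)
x_overall = overall_association(clusters_a, clusters_b)
print([round(c, 2) for c in chi_local])
print(round(x_overall, 3))
```

Units where both populations show a patch, or both show a gap, contribute positive local values, whereas a patch in one population coinciding with a gap in the other contributes a negative value; mapping these local values is what a plum-green plot displays.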
Additionally, Xu and Madden (2005) proposed that a new index could be deployed following the SADIE approach. Li et al. (2012) introduced such an index that adjusts for the absolute location or the magnitude of the counts, and defines a new local clustering index.

Software availability
SADIE is available to users as a series of Windows-based programs that were developed by Kelvin Conrad and originally made available on his 'SADIE Re-heated website'. These programs have Help Files that provide both background on methods and instructions for use. Count-based data files (x, y, count) are required in simple text format. SADIE allows pairwise comparisons of distributions when association is being investigated. Other variables (e.g. continuous environmental variables) can also be incorporated into analyses, contingent on these data being spatially referenced. Continuous variables need to be integerized prior to analysis. In addition, Li et al. (2012) provide an R routine which calculates both the original SADIE indices and their alternative clustering index (https://www.rdocumentation.org/packages/epiphy/versions/0.3.4/topics/sadie), and the R package Epiphy (https://cran.r-project.org/web/packages/epiphy/vignettes/epiphy.html) includes SADIE analytical methods. The original SADIE program suite available for download with this paper includes:
• SADIEShell Version 2.0. Calculates the aggregation index I_a and red-blue indices V_i and V_j, with associated probabilities. Also generates the file "cluster.dat", which is required for association analysis;
• N_AShell Version 1.0. Calculates the association index X with associated probability for a pair of datasets;
• AssocBatchSetup01. Designed to perform all pairwise associations when there are more than two datasets; it operates on a list of SADIE spatial pattern results files. The resulting output comprises the association analysis results for the complete matrix (between any two sets) or a triangular matrix of comparisons, with each analysis held in a separate results folder;
• AssocExtractor060 Version 0.6. This program extracts the section of the file containing the local association indices. This is the equivalent of the cluster.dat file from SADIEShell;
• ClusterShell Version 1.0. A program to measure characteristics of patch and gap clusters created by a SADIE red-blue analysis. Its purpose is to identify all patch and gap clusters by assigning each unit either to a cluster or to a non-clustered area. The program also identifies units that are on the boundaries of a grid, defines cluster centroids and examines the minimum distances between the cluster centroids and between the edges of clusters;
• Red-Blue Batch Runner Version 1.0. This program is specifically designed to perform a large set of spatial analyses. The resultant output is placed in the designated input folders and many analyses can be performed sequentially without additional user intervention. It has stringent requirements for data file names and folder storage. If only a few SADIE analyses are required, then SADIEShell is easier to use;
• Results Collector Version 0.61. This program extracts the section of the file containing the summary statistics and confidence intervals.

Figure 2. Spatial association and dissociation are illustrated by map a, where there are ten red and ten green individuals, and map b, where the ten green individuals are in exactly the same positions and there are ten blue individuals. In fact, when viewed as single populations, the blues have a similar spatial pattern to the reds, their positions being merely a rotation and reflection of the reds. If counts were taken of the individuals in 25 cells using the dashed lines, then all three populations, red, green and blue, would each have identical frequency distributions: one count of 4, one of 3, one of 2, one of 1, and 21 zeroes. Furthermore, for both maps, only in the central cell do any of the non-zero counts coincide. For both the red-green counts and the blue-green counts, there are eighteen (0,0) pairs, one (2,0), one (0,2), one (3,0), one (0,3), one (4,0), one (0,4) and one (1,1) pair. The non-spatial correlation coefficient between the counts of the reds and greens, and between the blues and greens, is the relatively small value of -0.115, indicating no difference between the maps and neither strong similarity nor dissimilarity. However, in a the ten red and the ten green individuals occur in similar areas and appear positively spatially associated and, by contrast, in b the ten green individuals occupy different areas and appear negatively spatially dissociated from the ten blue individuals. SADIE analysis gives the following values of I_a (with corresponding probability levels, P_a) for the green individuals: 1.45 (0.0079); for the reds: 1.18 (0.13); for the blues: 1.03 (0.36). Using the correlation between the clustering indices of each individual population, the SADIE index of association X (with corresponding probability level P) was, for the red and green populations in map (a): 0.578 (0.01); for the blue and green populations in map (b): -0.117 (0.66). The SADIE association index is capable of distinguishing the strong positive association in map (a) from the (albeit somewhat weaker) negative dissociation in map (b). In a similar fashion to red-blue plots, which give a visual representation of the degree and location of the clustering indices arranged into patches and gaps, SADIE association analysis allows the construction of plots that show areas of strong association and dissociation. For example, the plot in map (c) is constructed from the association analysis of the clustering indices from the blue and red populations of map (b); it shows areas of significant positive association in plum and of dissociation in green.
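The non-spatial correlation coefficient quoted in the Figure 2 caption can be checked directly from the paired counts. The sketch below is a plain Pearson correlation written in Python with hypothetical variable names; it takes the 25-cell grid with 21 zeroes in each population, which leaves eighteen cells that are empty for both populations once the seven listed non-zero pairs are accounted for.

```python
import math

# Paired cell counts for two populations on the 25-cell grid of Figure 2:
# eighteen empty cells plus the seven pairs listed in the caption.
pairs = [(0, 0)] * 18 + [(2, 0), (0, 2), (3, 0), (0, 3), (4, 0), (0, 4), (1, 1)]

def pearson(paired_counts):
    """Pearson correlation coefficient for a list of (x, y) pairs."""
    n = len(paired_counts)
    mean_x = sum(x for x, _ in paired_counts) / n
    mean_y = sum(y for _, y in paired_counts) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in paired_counts)
    sxx = sum((x - mean_x) ** 2 for x, _ in paired_counts)
    syy = sum((y - mean_y) ** 2 for _, y in paired_counts)
    return sxy / math.sqrt(sxx * syy)

r = pearson(pairs)
print(round(r, 3))  # -0.115
```

Because the list of pairs is the same for the red-green and the blue-green comparisons, this coefficient cannot separate map a from map b; only an index that uses the positions of the cells, such as the SADIE association index, can do so.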

Review of studies using SADIE
We surveyed papers that included the phrase 'spatial analysis by distance indices' in the title or abstract using the SCOPUS database of peer-reviewed literature (www.scopus.com, accessed July 2018). This allowed us to identify studies that applied SADIE directly for analytical purposes, and provided a snapshot of the range of disciplines using the technique. In total, 130 papers were evaluated (Figure 3; Table 2), all of which were in the general field of ecology and spanned a wide range of sub-disciplines. Most studies (44) were entomological and were primarily concerned with pests or interactions between pests and their natural enemies. The fields of mycology and plant ecology were also strongly represented. Mycological studies (18) were primarily related to agricultural production, whilst plant ecology studies (18) focused more widely on natural ecosystems. We also reviewed papers that were 'highly cited', a threshold arbitrarily set at fifty SCOPUS citations and excluding SADIE methodology papers (Table 3). The ten papers spanned entomology (7), plant ecology (2) and mycology (1), demonstrating that the approach has been used successfully in a wide range of ecological disciplines. Examples of studies representing these three subject areas are:
• Maestre et al. (2003) investigated the role that soil surface properties play in semiarid ecosystems. The study used both SADIE and principal components analysis to quantify patterns of seedling survival, identifying clearly defined areas of both high and low survival. Seedling survival was coupled primarily with the variables bare soil cover, sand content, and soil compaction. The results help inform restoration practices in semiarid ecosystems;
• Winder et al. (2001) studied the spatio-temporal dynamics of two aphid species (Metopolophium dirhodum and Sitobion avenae) and a generalist predator (Pterostichus melanarius). Using SADIE, ephemeral spatial pattern at the field scale was demonstrated. Additionally, a positive, lagged beetle response to this aphid pattern was observed whilst aphids displayed a negative, lagged response to beetle spatial pattern. A strong response by the beetle population to aphid patches was reported, and the spatially coupled dynamics were sufficient for the predator to have a negative effect on the intrinsic rate of increase of its prey;
• Turechek and Madden (1999) studied the spatial pattern of strawberry leaf blight in perennial production systems. The study adopted an intensive sampling approach that evaluated infection at the level of leaflet, leaf, plant and field. Overall, results showed that the incidence of leaf blight was characterised mainly by small, loosely aggregated clusters of diseased leaflets.

Conclusions
The four key SADIE methods papers (Perry 1995; Perry 1998; Perry et al. 1999; Perry and Dixon 2002) have been cited 205, 225, 226 and 175 times, respectively, on the SCOPUS database (www.scopus.com, accessed September 2018). At least 130 studies have used the approach to analyse spatially-referenced data (Table 2). The method has therefore clearly been adopted as a useful technique for quantifying pattern in spatially-referenced count-based data. SADIE has allowed ecologists to use intuitive and easily comprehensible indices and graphical outputs (red-blue and plum-green plots) to compare spatial distributions either at an instant in time or along a time series. The software provides comprehensive tools to conduct SADIE analyses, but we also encourage others to fully implement the approach under the R platform. It is hoped that the establishment of a permanent home for the SADIE software suite through Rethinking Ecology will encourage its use and further development.

Author contribution
Designed and wrote the manuscript: LW, CA, GG, JH, CW, JP. Prepared the figures: JP, LW.

Table 3. Highly cited papers from the SCOPUS database (arbitrarily set at more than 50) utilising SADIE as an analytical tool.
| Paper | Title | Subject area | Citations |
| Winder et al. 2001 | Modelling the dynamic spatio-temporal response of predators to transient prey patches in the field | Entomology | 140 |
| Maestre et al. 2003 | Small-scale environmental heterogeneity and spatiotemporal dynamics of seedling establishment in a semiarid degraded ecosystem | Plant Ecology | 139 |
| Thomas et al. 2001 | Aggregation and temporal stability of carabid beetle distributions in field and hedgerow habitats | Entomology | 126 |
| Thomas et al. 1998 | Isolating the components of activity-density for the carabid beetle Pterostichus melanarius in farmland |  |  |