The Find Outliers tool will determine if there are any statistically significant outliers in the spatial pattern of your data.
- Where do we find anomalous spending patterns in Los Angeles?
- Where are the sharpest boundaries between affluence and poverty in the study area?
- In your area, are there retail stores that are struggling with low sales despite being surrounded by high-performing stores?
- Where are there unexpectedly high rates of diabetes across the study area?
- Are there counties in the United States with unusually low life expectancy compared to their neighboring counties?
The input features may be points or areas.
The Find outliers of parameter is used to evaluate the spatial arrangement of your features. If your features are areas, a field must be chosen. Outliers will be determined using the numbers in the chosen field. Point features can be analyzed using a field or the Point Counts option. If Point Counts is used, the tool will determine if the points themselves are unusually dispersed or clustered, rather than high and low field values.
If points are being analyzed with Point Counts, two additional options will be available. The Count points within parameter aggregates the points within a Fishnet Grid, a Hexagon Grid, or an area layer from your Contents, such as counties or ZIP Codes. The Define where points are possible parameter specifies one or more areas of interest. Its three options are None, meaning all points are used; an area layer from your Contents; and areas created using the Draw tool.
Your data can be normalized using the Divide by parameter. Dividing by Esri Population uses GeoEnrichment and consumes credits. Alternatively, you can normalize using a field from the input layer, such as the number of households or the area.
The statistic employed by this tool uses permutations to estimate how likely it would be to find the actual spatial distribution of the values you are analyzing, by comparing that distribution to many random rearrangements of the same values. The Optimize for parameter sets the number of permutations and trades Speed (fewer permutations, faster processing) against Precision (more permutations, longer processing). Fewer permutations are fine when you are first exploring a problem, but it is best practice to use the Precision option for final results.
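To make the permutation idea concrete, here is a minimal Python sketch of a conditional-permutation pseudo p-value for a single feature's local Moran's I. It assumes binary neighbor lists with row-standardized weights and the textbook variance term; the function name, the two-sided comparison of absolute values, and the default of 499 permutations are illustrative choices, not the tool's documented internals. The feature keeps its own value while its neighbors' values are redrawn at random from all the other features.

```python
import random
from statistics import fmean


def pseudo_p_value(values, neighbors, i, permutations=499, seed=0):
    """Conditional-permutation pseudo p-value for feature i's local Moran's I.

    values: one numeric value per feature.
    neighbors: neighbors[i] lists the indices of feature i's neighbors
    (binary weights, row-standardized inside the calculation).
    """
    n = len(values)
    mean = fmean(values)
    dev = [v - mean for v in values]
    m2 = sum(d * d for d in dev) / n  # textbook variance term (assumption)
    k = len(neighbors[i])
    if m2 == 0 or k == 0:
        raise ValueError("need variation in the values and at least one neighbor")

    def local_i(neighbor_devs):
        # feature i's deviation times the average deviation of its neighbors
        return dev[i] * (sum(neighbor_devs) / k) / m2

    observed = local_i([dev[j] for j in neighbors[i]])
    # feature i keeps its own value; neighbor values are redrawn from the rest
    others = [dev[j] for j in range(n) if j != i]
    rng = random.Random(seed)
    as_extreme = 0
    for _ in range(permutations):
        if abs(local_i(rng.sample(others, k))) >= abs(observed):
            as_extreme += 1
    # the +1 terms count the observed arrangement as one of the possibilities
    return (as_extreme + 1) / (permutations + 1)
```

The smallest pseudo p-value this can return is 1 / (permutations + 1), and more permutations make the estimate less noisy. That is the precision-versus-speed trade-off in miniature.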
The Options drop-down menu can be used to set a specific Cell Size or Distance Band for your analysis.
The output layer will have additional fields containing information such as the Cluster/Outlier Type, the number of neighbors included in each feature's analysis, and the Local Moran's I index, z-score, and p-value for each feature. The output layer also contains information on the statistical analysis in the Description section of its Item Details.
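The neighbor count reported for each feature depends on how neighbors are defined. As an illustration only, the sketch below assumes a fixed distance band measured between feature centroids in a projected coordinate system. The function name is hypothetical, and the tool's own neighborhood logic may differ.

```python
import math


def distance_band_neighbors(centroids, distance_band):
    """Return, for each feature, the indices of features within distance_band.

    centroids: (x, y) tuples in a projected coordinate system.
    A feature is never counted as its own neighbor.
    """
    neighbors = []
    for i, p in enumerate(centroids):
        nbrs = [
            j for j, q in enumerate(centroids)
            if j != i and math.dist(p, q) <= distance_band
        ]
        neighbors.append(nbrs)
    return neighbors


# Centroids of four 1 x 1 grid cells: three in a row, one stacked on the first.
cells = [(0.5, 0.5), (1.5, 0.5), (2.5, 0.5), (0.5, 1.5)]
neighbor_counts = [len(n) for n in distance_band_neighbors(cells, 1.0)]
```

With a distance band smaller than the cell size, even adjacent cell centroids fall outside each other's band and every feature ends up with no neighbors. This is one reason the Distance Band should exceed the Cell Size.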
How Find Outliers works
Since our eyes and brains naturally try to find patterns even when none exist, it can be difficult to know if the patterns in your data are the result of real spatial processes at work or just the result of random chance. This is why researchers and analysts use statistical methods like Find Outliers (Anselin Local Moran's I) to quantify spatial patterns. When you do find statistically significant outliers or clustering in your data, you have valuable information. Knowing where and when outliers and clusters occur can provide important clues about the processes promoting the patterns you're seeing. Knowing that residential burglaries, for example, are consistently higher in particular neighborhoods is vital information if you need to design effective prevention strategies, allocate scarce police resources, initiate neighborhood watch programs, authorize in-depth criminal investigations, or identify potential suspects.
The Find Outliers tool calculates a local Moran's Index (LMiIndex) for each feature in the dataset. A positive value indicates that a feature has neighboring features with similarly high or low attribute values; this feature is part of a cluster. A negative value indicates that a feature has neighboring features with dissimilar values; this feature is an outlier. In either instance, the p-value for the feature must be small enough for the cluster or outlier to be considered statistically significant. For more information on determining statistical significance, see "What is a z-score? What is a p-value?" Note that the local Moran's I index (I) is a relative measure and can only be interpreted within the context of its computed z-score or p-value. The Cluster/Outlier Type (COType) field distinguishes between a statistically significant cluster of high values (HH), a cluster of low values (LL), an outlier in which a high value is surrounded primarily by low values (HL), and an outlier in which a low value is surrounded primarily by high values (LH).
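The Python sketch below shows how a local Moran's I value and a Cluster/Outlier Type could be derived for each feature. It assumes binary neighbor lists, row-standardized weights, and the textbook variance term m2 = sum of squared deviations / n, so index values may not match the tool's output exactly. The signs, which drive the HH/LL/HL/LH labels, follow the rule described above. The function names, the alpha of 0.05, and the empty string for a non-significant feature are assumptions.

```python
from statistics import fmean


def local_morans_i(values, neighbors):
    """Local Moran's I for every feature (textbook Anselin form).

    values: one numeric value per feature.
    neighbors: neighbors[i] lists the indices of feature i's neighbors;
    binary weights are row-standardized, so the spatial lag is an average.
    """
    n = len(values)
    mean = fmean(values)
    dev = [v - mean for v in values]
    m2 = sum(d * d for d in dev) / n  # assumed variance term; Esri's may differ
    if m2 == 0:
        raise ValueError("all values are identical; local Moran's I is undefined")
    result = []
    for i, nbrs in enumerate(neighbors):
        lag = sum(dev[j] for j in nbrs) / len(nbrs) if nbrs else 0.0
        result.append(dev[i] * lag / m2)
    return result


def co_type(value, mean, local_i, p_value, alpha=0.05):
    """Classify one feature as HH, LL, HL, LH, or '' (not significant)."""
    if p_value >= alpha or local_i == 0:
        return ""
    high = value > mean
    if local_i > 0:  # similar neighbors: part of a cluster
        return "HH" if high else "LL"
    return "HL" if high else "LH"  # dissimilar neighbors: an outlier
```

Here the p-value is supplied by hand; in practice it would come from a permutation test like the one the tool runs.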
Analyze area features
Quite a lot of data is available for area features such as census tracts, counties, voter districts, hospital regions, parcels, park and recreation boundaries, watersheds, land cover classifications, and climate zones. When your analysis layer contains area features, you will need to specify a numeric field that will be used to find outliers of high and low values. This field might represent the following:
- Counts (such as the number of households)
- Rates (such as the proportion of the population holding a college degree)
- Averages (such as the mean or median household income)
- Indices (such as a score indicating whether household spending on sporting goods is above or below the national average)
With the field you provide, the Find Outliers tool will create a map (the result layer) showing you areas with statistically significant outliers of high values (red) and low values (blue) as well as clusters of high values (pink) and low values (light blue).
Analyze point features
A variety of data is available as point features. Examples of features most often represented as points include crime incidents, schools, hospitals, emergency call events, traffic accidents, water wells, trees, and boats. Sometimes you will be interested in analyzing data values (a field) associated with each point feature. In other cases, you will only be interested in evaluating the clustering or dispersion of the points themselves. The decision on whether or not to provide a field will depend on the question you are asking.
Find outliers of high and low values associated with point features
You will want to provide an analysis field to answer questions such as "Where are there anomalous high and low values?" The field you select might represent any of the following:
- Counts (such as the number of traffic accidents at street intersections)
- Rates (such as city unemployment, where each city is represented as a point feature)
- Averages (such as the mean math test score among schools)
- Indices (such as a consumer satisfaction score for car dealerships across the county)
Find outliers of high and low point counts
For some point data, typically when each point represents an event, incident, or indication of presence/absence, there won't be an obvious analysis field to use. In these cases, you just want to know where clustering is unusually (statistically significantly) intense or sparse. For this analysis, area features (a fishnet grid or hexagon grid that the tool creates for you, or an area layer that you provide) are placed over the points, and the number of points that fall within each area is counted. The tool then finds outliers of high and low point counts associated with each area feature.
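A minimal Python sketch of the counting step for a square fishnet grid follows. It assumes projected coordinates, a cell size you supply, and a grid anchored at a chosen origin; the tool chooses its own cell size and extent, and the function name is hypothetical. Only cells containing at least one point appear in the result.

```python
import math
from collections import Counter


def fishnet_counts(points, cell_size, origin=(0.0, 0.0)):
    """Count points per square fishnet cell.

    points: (x, y) tuples in a projected coordinate system.
    Returns a Counter keyed by (column, row); empty cells are simply absent.
    """
    ox, oy = origin
    return Counter(
        (math.floor((x - ox) / cell_size), math.floor((y - oy) / cell_size))
        for x, y in points
    )


incidents = [(0.2, 0.3), (0.7, 0.9), (1.5, 0.1), (3.2, 2.8)]
counts = fishnet_counts(incidents, 1.0)
```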
Define where points are possible
Specify an area layer, or draw areas, to define a study area covering all the locations where the incident point features could possibly occur. For this option, the Find Outliers tool will overlay your defined study area with a fishnet (default) or hexagon grid and count the points falling within each grid cell. When you do not use this option to indicate where incident points are possible, the Find Outliers tool will only analyze grid cells that contain at least one point. When you do use it, however, the analysis will be done for all grid cells that fall within the bounding areas you define.
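The difference a study area makes is that empty cells inside it are kept with a count of zero. The Python sketch below illustrates this under a simplifying assumption: the study area is an axis-aligned rectangle rather than an arbitrary polygon. The function name is hypothetical.

```python
import math


def counts_within_bounds(points, cell_size, bounds):
    """Point counts for every fishnet cell covering a rectangular study area.

    bounds: (xmin, ymin, xmax, ymax). A rectangle stands in for the study area
    here; real study areas can be any polygon. Cells with no points keep 0.
    """
    xmin, ymin, xmax, ymax = bounds
    cols = math.ceil((xmax - xmin) / cell_size)
    rows = math.ceil((ymax - ymin) / cell_size)
    # every cell overlapping the study area starts at zero
    counts = {(c, r): 0 for c in range(cols) for r in range(rows)}
    for x, y in points:
        if not (xmin <= x < xmax and ymin <= y < ymax):
            continue  # points outside the study area are ignored
        # clamp guards against floating-point rounding at the far edge
        col = min(math.floor((x - xmin) / cell_size), cols - 1)
        row = min(math.floor((y - ymin) / cell_size), rows - 1)
        counts[(col, row)] += 1
    return counts


incidents = [(0.5, 0.5), (0.6, 0.4), (2.5, 1.5), (9.0, 9.0)]
cells = counts_within_bounds(incidents, 1.0, (0.0, 0.0, 3.0, 2.0))
```

Here four of the six cells have a count of zero. Those zeros tell the statistic where incidents could have occurred but did not.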
Count points within your own aggregation areas
In some cases, area features such as census tracts, police beats, or parcels will make more sense for your analysis than the default fishnet or hexagon grid.
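Counting points within your own areas is a point-in-polygon problem. The Python sketch below uses the even-odd (ray casting) rule and assumes simple, non-overlapping polygons given as vertex lists; the function names and the area names in the example are hypothetical, and this is not necessarily how the tool performs the overlay.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd (ray casting) test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for k in range(n):
        x1, y1 = polygon[k]
        x2, y2 = polygon[(k + 1) % n]
        # does a horizontal ray from (x, y) toward +x cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_points_in_areas(points, areas):
    """Return {area_name: count}; areas maps a name to a polygon vertex list."""
    counts = {name: 0 for name in areas}
    for x, y in points:
        for name, polygon in areas.items():
            if point_in_polygon(x, y, polygon):
                counts[name] += 1
                break  # areas are assumed not to overlap
    return counts


beats = {
    "west": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "east": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
beat_counts = count_points_in_areas([(0.2, 0.5), (0.8, 0.3), (1.5, 0.5), (5, 5)], beats)
```

Points that fall outside every area, like (5, 5) here, are not counted at all.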
Choose to divide by
There are two common approaches to identify outliers:
- By count—When you analyze a particular dataset, you often want to find outliers of the number of features in each aggregation area across your study area. For instance, you might want to find outliers where the highest numbers of crimes have happened in generally low crime areas or where the lowest numbers of crimes have occurred in high crime areas in order to maximize the effect of your allocated resources.
- By intensity—On the other hand, analyzing and understanding patterns that take into account underlying distributions that influence a particular phenomenon can also be meaningful. This concept is often referred to as normalization, or the process of dividing one numeric attribute value by another to minimize differences in values based on the size of areas or the number of features in each area. For instance, with crime, you might want to understand where there are outliers or clusters of high and low numbers of crimes that take into account the underlying population. In that case, you would count the number of crimes in each area (whether that area is a fishnet grid or a different area dataset) and divide that total number of crimes by the total population in that area. This would give you a crime rate, or the number of crimes per capita. Finding outlier areas of crime per capita answers a different question that can also help guide decision-making.
Both ways of analyzing the data in your study area are valid; it just depends on what question you are asking.
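Counting versus dividing can be illustrated with a short Python sketch. The area IDs, counts, and populations are made up for the example, and the per-1,000 scaling and the choice to return None for a zero or missing divisor are assumptions, not the tool's behavior.

```python
def rates(counts, divisors, per=1000):
    """Normalize counts by a divisor such as population.

    counts, divisors: dicts keyed by area ID. Returns counts per `per` units
    of the divisor (e.g. crimes per 1,000 residents); areas with a zero or
    missing divisor get None instead of a rate.
    """
    result = {}
    for area, count in counts.items():
        divisor = divisors.get(area)
        result[area] = count * per / divisor if divisor else None
    return result


crimes = {"A": 50, "B": 20, "C": 5}
population = {"A": 10000, "B": 2000, "C": 0}
crime_rates = rates(crimes, population)
```

Area A has the most crimes, but area B has the higher rate per 1,000 residents, so the two approaches can highlight different places.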
Choosing an appropriate attribute to divide by is very important. Make sure that the Divide by attribute does, in fact, influence the distribution of the particular phenomenon you are analyzing.
When you choose to Divide by Esri Population, the population data from the Esri Demographics Global Coverage is used. Be sure to look at the resolution of the data available for the area that you are interested in to make sure that it is compatible with the size of the areas that are being enriched (either aggregation areas you provide or fishnet squares being created).
The output from the Find Outliers tool is a map. For the points or the areas in this result layer map, those in dark red and dark blue indicate statistically significant outliers in your study area. Those in light blue and pink indicate statistically significant clustering. Points or areas displayed using beige, on the other hand, are not outliers or part of any statistically significant cluster; the spatial pattern associated with these features could very likely be the result of random chance. Sometimes the results of your analysis will indicate that there aren't any statistically significant outliers or clusters at all. This is important information to have. When a spatial pattern is random, you have no clues about underlying causes. In these cases, all of the features in the results layer will be beige. However, when you do find statistically significant outliers or clustering, those locations are important clues about what might be creating the phenomenon. For example, finding statistically significant spatial outliers of high cancer rates associated with certain environmental toxins can lead to policies and actions designed to protect people. Similarly, finding low outliers of childhood obesity associated with schools promoting after-school sports programs can provide strong justification for encouraging these types of programs more broadly.
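If you export the result layer's attributes, a quick tally of the COType field tells you whether anything is significant at all. The Python sketch below assumes that non-significant features carry an empty string in COType; the actual stored value may differ. The labels and function name are illustrative.

```python
from collections import Counter

# Assumed codes: the four documented COType values plus "" for features
# that are not statistically significant (the beige features).
LABELS = {
    "HH": "cluster of high values",
    "LL": "cluster of low values",
    "HL": "high outlier",
    "LH": "low outlier",
    "": "not significant",
}


def summarize(co_types):
    """Count features per category and how many are significant."""
    counts = Counter(co_types)
    summary = {label: counts.get(code, 0) for code, label in LABELS.items()}
    significant = sum(counts.get(code, 0) for code in ("HH", "LL", "HL", "LH"))
    return summary, significant
```

A significant count of zero corresponds to a map in which every feature is beige.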
The statistical method used by the Find Outliers tool is based on probability theory and, consequently, needs a minimum number of features to operate effectively. This statistical method also requires a variety of counts or analysis field values. If you are analyzing crime incidents by census tract, for example, and amazingly end up with exactly the same number of crimes in each tract, the tool cannot solve. The following table provides an explanation of the messages you may encounter when you use the Find Outliers tool:
Message: The analysis options you selected require a minimum of 60 points to compute hot and cold spots.
Description: There aren't enough point features in your point analysis layer to compute reliable results.
Solution:
- The obvious solution is to add more points to your analysis layer.
- Alternatively, you can try defining bounding analysis areas, and thereby add information about where points could have occurred but didn't. With this method, you will need a minimum of 30 points.
- You can also try providing aggregation areas that overlay your points. You will need a minimum of 30 polygon areas and 30 points within those areas for this analysis.
- If you have at least 30 points, you may want to specify an analysis field. This changes the question from "Where are there many or few points?" to "Where do high and low analysis field values cluster spatially?"
Message: The analysis options you selected require a minimum of 30 points with valid data in the analysis field in order to compute hot and cold spots.
Description: There aren't enough points, or enough points associated with non-NULL analysis field values, in your analysis layer to compute reliable results.
Solution: Unfortunately, if you have fewer than 30 points, this analysis method is not appropriate for your data. If you have at least 30 points and you are seeing this message, the analysis field you specified may have NULL values; points with NULL analysis field values are skipped. Another possibility is that an active Filter is reducing the number of points available for analysis.
Message: The analysis options you selected require a minimum of 30 polygons with valid data in the analysis field in order to compute hot and cold spots.
Description: There aren't enough polygon areas, or enough area features associated with non-NULL analysis field values, in your analysis layer to compute reliable results.
Solution: Unfortunately, if you have fewer than 30 polygon areas, this analysis method is not appropriate for your data. If you have at least 30 areas and you are seeing this message, the analysis field you specified may have NULL values; polygon areas with NULL analysis field values are skipped. Another possibility is that an active Filter is reducing the number of polygon areas available for analysis.
Message: The analysis option you selected requires a minimum of 30 points to be inside the bounding polygon areas.
Description: Only points that fall within the bounding analysis areas you draw or provide will be analyzed. To provide reliable results, at least 30 points should be inside the bounding analysis areas.
Solution:
- Unfortunately, if you do not have at least 30 points, this method is not appropriate for your data. With a minimum of 30 features, the solution will often be to provide different, perhaps larger, bounding analysis areas.
- Another option would be to provide an area layer with a minimum of 30 aggregation polygons that overlay at least 30 of your points. When you provide aggregation areas, analysis is performed on the point counts within each area.
Message: The analysis option you selected requires a minimum of 30 points to be inside the aggregation polygons.
Description: Only the points that fall inside the aggregation polygons will be included in the analysis. To provide reliable results, at least 30 points should be inside the polygon areas you provide.
Solution: Unfortunately, if you do not have at least 30 points, this method is not appropriate for your data; otherwise, you should draw or provide bounding analysis areas that overlay at least 30 of your points. The bounding areas should reflect all the locations where points could possibly occur.
Message: The analysis option you selected requires a minimum of 30 aggregation areas.
Description: The option you selected will overlay the aggregation areas on top of your points and count the number of points falling within each area. A minimum of 30 counts (30 areas) is needed to provide reliable results.
Solution: Reliable results can be computed if you provide a minimum of 30 points that fall within a minimum of 30 aggregation areas. If you don't have 30 aggregation areas, you can try drawing or providing bounding analysis areas that overlay at least 30 of your points. These bounding areas should reflect all the locations where points could possibly occur.
Message: Hot and cold spots cannot be computed when the number of points in every polygon area is identical. Try different polygon areas or different analysis options.
Description: When the Find Outliers tool counted the number of points within each aggregation area, it found that the counts were all identical. To compute results, this tool requires at least some variation in the count values obtained.
Solution:
- You can provide alternative aggregation areas that will not result in all areas having the exact same number of points.
- Rather than aggregation areas, you can also try drawing or providing bounding analysis areas.
- Alternatively, you can specify an analysis field. However, this changes the question from "Where are there many or few points?" to "Where do high and low analysis field values cluster spatially?"
Message: There is not enough variation in point locations to compute hot and cold spots. Coincident points, for example, reduce spatial variation. You can try providing a bounding area, aggregation areas (a minimum of 30), or an Analysis Field.
Description: Based on the number of points and how spread out they are, the tool creates a fishnet grid to overlay your points. After counting the number of points that fall within each fishnet square and removing squares with zero counts, there were fewer than 30 squares left. This tool requires a minimum of 30 counts (30 squares) to provide reliable results.
Solution:
- If your points occupy very few unique locations (if there are many coincident points), a good solution is to either provide aggregation areas that overlay your points, or draw and provide bounding analysis areas indicating where points are and are not possible.
- Another option is to specify an analysis field. However, this changes the question from "Where are there many or few points?" to "Where do high and low analysis field values cluster spatially?"
Message: There is not enough variation among the points within the bounding polygon areas. You can try providing larger boundaries.
Description: Based on point locations and number of points, the tool creates a fishnet grid to overlay your points. After counting the number of points that fall within each fishnet square and removing squares that are outside your bounding analysis areas, fewer than 30 fishnet squares were left. This tool requires a minimum of 30 counts (30 squares) to provide reliable results.
Solution:
- If your points are located at a variety of locations within the bounding analysis areas, you may just need to draw or provide larger boundaries. If your points occupy very few unique locations (if there are many coincident points), a good solution is to provide aggregation areas that overlay your points.
- Another option is to specify an analysis field. However, this changes the question from "Where are there many or few points?" to "Where do high and low analysis field values cluster spatially?"
Message: All of the values for your analysis field are likely the same. Hot and cold spots cannot be computed when there is no variation in the field being analyzed.
Description: Most likely you specified an analysis field that has the same value for all of your points or area features in the analysis layer. The statistic used by this tool cannot solve unless there is a variety of values to work with.
Message: We were not able to compute hot and cold spots for the data provided. If appropriate, try specifying an Analysis Field.
Description: While quite unlikely, when the tool created a fishnet grid and counted the number of points within each square, the counts for all squares were identical.
Message: Cell Size should be less than Distance Band.
Description: You have provided a Distance Band value that is smaller than the size of each grid cell.
Solution: Check the units specified for both Distance Band and Cell Size, use the default value calculated by the tool, or use a Distance Band value that is larger than the size of a single grid cell.
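Several of these messages come down to a few checkable conditions: too few valid values, no variation, or a cell size that is not smaller than the distance band. The hypothetical pre-flight check below mirrors those conditions so you can test your own data before running the tool. It is not the tool's code, and the function name and problem strings are illustrative. The minimum defaults to the 30 used by most of the messages; pass a different value where a different threshold applies.

```python
def precheck(values, minimum=30, cell_size=None, distance_band=None):
    """Return a list of problems; an empty list means every check passed.

    values: analysis field values, or point counts per cell or area,
    with None standing for NULL. minimum: required number of valid values.
    """
    problems = []
    valid = [v for v in values if v is not None]  # NULLs are skipped
    if len(valid) < minimum:
        problems.append(f"only {len(valid)} valid values; at least {minimum} needed")
    if len(set(valid)) == 1:
        problems.append("all values are identical; there is no variation to analyze")
    # the tool expects Cell Size to be less than Distance Band
    if cell_size is not None and distance_band is not None and cell_size >= distance_band:
        problems.append("cell size must be less than the distance band")
    return problems
```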
Additional information about the algorithms used by the Find Outliers tool can be found in How Optimized Outlier Analysis works.
Use Find Outliers to determine if there are any statistically significant outliers in the spatial pattern of your data. Other tools that may be useful are described below.
Map viewer analysis tools
If you are interested in finding statistically significant clusters of high and low values in the spatial pattern of your data, use the Find Hot Spots tool.
If you are using point or line measurements to create a density map, use the Calculate Density tool.
ArcGIS Desktop analysis tools
Find Outliers executes the same statistic used in the Cluster and Outlier Analysis (Anselin Local Moran's I) and Optimized Outlier Analysis tools.