The Find Outliers tool will determine if there are any statistically significant outliers in the spatial pattern of your data.
- Are there anomalous spending patterns in Los Angeles?
- Where are the sharpest boundaries between affluence and poverty in the study area?
- In your area, are there retail stores that are struggling with low sales despite being surrounded by high-performing stores?
- Where are there unexpectedly high rates of diabetes across the study area?
- Are there counties in the United States with unusually low life expectancy compared to their neighboring counties?
The input features can be points or areas.
The Find outliers of parameter is used to evaluate the spatial arrangement of features. If your features are areas, a field must be chosen. Outliers will be determined using the numbers in the chosen field. Point features can be analyzed using a field or the Point Counts option. If Point Counts is used, the tool will determine if the points themselves are unusually dispersed or clustered, rather than high and low field values.
If points are being analyzed with Point Counts, two additional options will be available. The Count points within parameter allows the points to be aggregated within a Fishnet Grid, Hexagon Grid, or an area layer from the Contents pane, such as counties or ZIP Codes. The Define where points are possible parameter is used to create an area or multiple areas of interest. The three options for this parameter are None (all points are used), an area defined by an area layer from the Contents pane, and areas created using the Draw tool.
Your data can be normalized using the Divide by parameter. The Esri Population data uses GeoEnrichment and requires the use of credits. Another option is to normalize using a field from the input layer (available when the Find outliers of parameter is set to a field, rather than Point Counts). Values that can be used for normalization include number of households or area.
Esri Population data is not available for the Divide by parameter when your organization has a custom GeoEnrichment service configured.
The statistic used by this tool uses permutations: it compares your values to a set of randomly generated values to determine how likely it would be to find the actual spatial distribution of the values that you are analyzing. Choosing the number of permutations in the Optimize for parameter is a trade-off between precision (the Precision option, which uses more permutations and increases processing time) and processing time (the Speed option, which uses fewer permutations). A lower number of permutations can be used when first exploring a problem, but it is a best practice to increase the permutations to the Precision option for final results.
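To make the role of permutations concrete, the following is a minimal Python sketch of a permutation-based pseudo p-value. It is not the tool's implementation: the function name `pseudo_p_value`, the simple neighbor statistic, the sample values, and the permutation counts (99 and 999) are all assumptions chosen for illustration.

```python
import random

def pseudo_p_value(values, i, neighbors, n_permutations, seed=0):
    """Illustrative pseudo p-value for feature i (hypothetical helper).

    The statistic is the deviation of feature i from the mean multiplied by
    the summed deviations of its neighbors. Each permutation replaces the
    neighbors with values drawn at random from the other features.
    """
    rng = random.Random(seed)
    mean = sum(values) / len(values)
    dev = [v - mean for v in values]

    observed = dev[i] * sum(dev[j] for j in neighbors)

    others = [dev[j] for j in range(len(values)) if j != i]
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        sample = rng.sample(others, len(neighbors))
        if abs(dev[i] * sum(sample)) >= abs(observed):
            at_least_as_extreme += 1

    # Adding 1 counts the observed arrangement itself, so the smallest
    # p-value that can be reported is 1 / (n_permutations + 1).
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Invented data: a high value (feature 0) next to two low values
# (features 1 and 2) among otherwise middling values.
values = [95, 5, 4, 50, 52, 48, 51, 49, 50, 53, 47, 50]
p_fast = pseudo_p_value(values, 0, [1, 2], n_permutations=99)
p_precise = pseudo_p_value(values, 0, [1, 2], n_permutations=999)
print(p_fast, p_precise)
```

With 99 permutations the reported p-value can never be smaller than 0.01, while 999 permutations allow values as small as 0.001. This is one way to see why more permutations give finer results at the cost of more processing.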
The Options drop-down menu can be used to set a specific Cell Size value or Distance Band value for your analysis.
The output layer includes additional fields containing information such as the Cluster/Outlier Type, the number of neighbors included in the analysis of each feature, the Local Moran's I Index, and the Value and Score for each feature. The output layer also contains information about the statistical analysis in the Description section of its Item Details page.
How Find Outliers works
Since our eyes and brains naturally try to find patterns even when none exist, it can be difficult to know if the patterns in your data are the result of spatial processes at work or just the result of random chance. This is why researchers and analysts use statistical methods such as Find Outliers (Anselin Local Moran's I) to quantify spatial patterns.
When you do find statistically significant outliers or clustering in your data, you have valuable information. Knowing where and when outliers and clusters occur can provide important clues about the processes promoting the patterns you're seeing. Knowing that residential burglaries, for example, are consistently higher in particular neighborhoods is vital information if you need to design effective prevention strategies, allocate scarce police resources, initiate neighborhood watch programs, authorize in-depth criminal investigations, or identify potential suspects.
The Find Outliers tool calculates a local Moran's I index (LMiIndex) for each feature in the dataset. A positive value indicates that a feature has neighboring features with similarly high or low attribute values; this feature is part of a cluster. A negative value indicates that a feature has neighboring features with dissimilar values; this feature is an outlier. In either instance, the p-value for the feature must be small enough for the cluster or outlier to be considered statistically significant. For more information about determining statistical significance, see the topic What is a z-score? What is a p-value? The local Moran's I index (I) is a relative measure and can only be interpreted in the context of its computed z-score or p-value. The Cluster/Outlier Type (COType) field distinguishes between a statistically significant cluster of high values (HH), a cluster of low values (LL), an outlier in which a high value is surrounded primarily by low values (HL), and an outlier in which a low value is surrounded primarily by high values (LH).
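The sign behavior described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the tool's code: it uses the commonly published form of Anselin's local Moran's I with binary (0 or 1) neighbor weights, the function name `local_morans_i` and the sample values are invented, and the HH, LL, HL, and LH labels are assigned from signs alone, without the significance test the tool applies.

```python
def local_morans_i(values, neighbors):
    """Local Moran's I per feature with binary weights (illustrative sketch).

    values    -- list of numeric attribute values
    neighbors -- neighbors[i] lists the indices treated as neighbors of i
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_sq = sum(d * d for d in dev)

    results = []
    for i in range(n):
        # Variance of all features except i. This is zero, and the division
        # below fails, when every other value equals the mean.
        s2 = (total_sq - dev[i] ** 2) / (n - 1)
        lag = sum(dev[j] for j in neighbors[i])  # weight of 1 per neighbor
        index = dev[i] / s2 * lag

        # Label from the signs of the feature and of its neighbors. The tool
        # reports a type only when the result is statistically significant.
        own = "H" if dev[i] > 0 else "L"
        around = "H" if lag > 0 else "L"
        results.append((index, own + around))
    return results


# Nine features in a row; each feature's neighbors are the adjacent ones.
values = [10, 11, 12, 1, 11, 10, 2, 1, 2]
neighbors = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4, 6], [5, 7], [6, 8], [7]]
results = local_morans_i(values, neighbors)
for i, (index, label) in enumerate(results):
    print(i, round(index, 3), label)
```

In this invented example, feature 3 (value 1 between 12 and 11) gets a negative index and the label LH, while feature 1 (high among high values) and feature 7 (low among low values) get positive indexes.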
Analyze area features
Data is available for area features such as census tracts, counties, voter districts, hospital regions, parcels, park and recreation boundaries, watersheds, land cover classifications, and climate zones. When your analysis layer contains area features, you must specify a numeric field that will be used to find outliers of high and low values. This field can represent the following:
- Counts (such as the number of households)
- Rates (such as the proportion of the population holding a college degree)
- Averages (such as the mean or median household income)
- Indices (such as a score indicating whether household spending on sporting goods is above or below the national average)
With the field you provide, the Find Outliers tool will create a map (the result layer) showing the areas with statistically significant outliers of high values (red) and low values (blue) as well as clusters of high values (pink) and low values (light blue).
Analyze point features
A variety of data is available as point features. Examples of features most often represented as points include crime incidents, schools, hospitals, emergency call events, traffic accidents, water wells, trees, and boats. Sometimes you will be interested in analyzing data values (a field) associated with each point feature. In other cases, you will only be interested in evaluating the clustering or dispersion of the points. The decision on whether to provide a field will depend on the question you are asking.
Find outliers of high and low values associated with point features
Provide an analysis field to answer questions such as Where are there anomalous high and low values? The field you select can represent the following:
- Counts (such as the number of traffic accidents at street intersections)
- Rates (such as city unemployment, where each city is represented as a point feature)
- Averages (such as the mean math test score among schools)
- Indices (such as a consumer satisfaction score for car dealerships across the county)
Find outliers of high and low point counts
For some point data, typically when each point represents an event, incident, or indication of presence or absence, there won't be an obvious analysis field to use. In these cases, you can find where clustering is unusually (statistically significantly) intense or sparse. For this analysis, area features (a fishnet grid or hexagon grid that the tool creates, or an area layer that you provide) are placed over the points and the number of points that fall within each area is counted. The tool then finds outliers of high and low point counts associated with each area feature.
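The counting step can be pictured with a short Python sketch. It assumes a square fishnet whose cell size and origin you supply; the tool chooses its own grid, so `count_points_in_fishnet`, the cell size, and the coordinates are hypothetical.

```python
from collections import Counter
import math

def count_points_in_fishnet(points, cell_size, origin=(0.0, 0.0)):
    """Count points per square grid cell (illustrative sketch).

    Returns a Counter keyed by (column, row). Cells that contain no points
    are simply absent from the result.
    """
    ox, oy = origin
    counts = Counter()
    for x, y in points:
        col = math.floor((x - ox) / cell_size)
        row = math.floor((y - oy) / cell_size)
        counts[(col, row)] += 1
    return counts


# Invented point coordinates in arbitrary map units.
points = [(0.5, 0.5), (0.7, 0.2), (1.5, 0.5), (3.2, 2.9), (3.8, 2.1), (3.1, 2.5)]
counts = count_points_in_fishnet(points, cell_size=1.0)
print(sorted(counts.items()))
```

The resulting per-cell counts play the role that an analysis field plays for area features.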
Define where points are possible
Specify an area layer, or draw areas, to define a study area covering all locations where the incident point features could possibly occur. For this option, the Find Outliers tool will overlay your defined study area with a fishnet (default) or hexagon grid and count the points falling within each grid cell. When you do not indicate where incident points are possible using this option, the Find Outliers tool will only analyze grid cells that contain at least one point. When you use this option to define where points are possible, however, the analysis will be done for all grid cells that fall within the bounding areas you define.
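The difference between the two cases can be sketched as follows. This is a simplified illustration, assuming a square grid and a rectangular bounding area given as (xmin, ymin, xmax, ymax); the function `cells_to_analyze` and the data are hypothetical, and real bounding areas can be any polygon.

```python
from collections import Counter
import math

def cells_to_analyze(points, cell_size, bounds=None):
    """Return {cell: count} for the cells that would enter the analysis.

    bounds -- None, or a rectangle (xmin, ymin, xmax, ymax) standing in for
              the area where points are possible (simplifying assumption).
    """
    counts = Counter(
        (math.floor(x / cell_size), math.floor(y / cell_size)) for x, y in points
    )
    if bounds is None:
        # Without a bounding area, only cells holding at least one point remain.
        return dict(counts)

    xmin, ymin, xmax, ymax = bounds
    cols = range(math.floor(xmin / cell_size), math.ceil(xmax / cell_size))
    rows = range(math.floor(ymin / cell_size), math.ceil(ymax / cell_size))
    # With a bounding area, every cell inside it is kept, including zeros.
    return {(c, r): counts.get((c, r), 0) for c in cols for r in rows}


points = [(0.5, 0.5), (0.7, 0.2), (2.5, 1.5)]
without_bounds = cells_to_analyze(points, 1.0)
with_bounds = cells_to_analyze(points, 1.0, bounds=(0, 0, 3, 2))
print(len(without_bounds), len(with_bounds))
```

Here the zero-count cells carry information: they record places where points could have occurred but did not.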
Count points within aggregation areas
In some cases, area features such as census tracts, police beats, or parcels make more sense for your analysis than the default fishnet or hexagon grid.
Choose to divide by
There are two common approaches to identify outliers:
- By count—When you analyze a particular dataset, you often want to find outliers of the number of features in each aggregation area across your study area. For instance, you can find outliers where the highest numbers of crimes have occurred in generally low crime areas or where the lowest numbers of crimes have occurred in high crime areas to maximize the effect of your allocated resources.
- By intensity—On the other hand, analyzing and understanding patterns that take into account underlying distributions that influence a particular phenomenon can also be meaningful. This concept is often referred to as normalization, or the process of dividing one numeric attribute value by another to minimize differences in values based on the size of areas or the number of features in each area. For instance, with crime, you may want to understand where there are outliers or clusters of high and low numbers of crimes that take into account the underlying population. In that case, you can count the number of crimes in each area (whether that area is a fishnet grid or a different area dataset) and divide that total number of crimes by the total population in that area. This gives you a crime rate, or the number of crimes per capita. Finding outlier areas of crime per capita answers a different question that can also help guide decision making.
Both ways of analyzing the data in your study area are valid; it just depends on what question you are asking.
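As a small illustration of how the two approaches can lead to different answers, the following Python sketch divides counts by population. The area names, crime counts, and populations are invented, and `rate_per_capita` is a hypothetical helper rather than part of the tool.

```python
def rate_per_capita(counts, population):
    """Divide each area's count by its population (illustrative sketch)."""
    return {
        area: counts[area] / population[area]
        for area in counts
        if population.get(area, 0) > 0  # skip areas with no population
    }


# Invented data for three areas.
crimes = {"A": 120, "B": 40, "C": 15}
population = {"A": 60000, "B": 8000, "C": 1000}

rates = rate_per_capita(crimes, population)
highest_count = max(crimes, key=crimes.get)
highest_rate = max(rates, key=rates.get)
print(highest_count, highest_rate)
```

Area A has the most crimes, but area C has the highest crime rate once population is taken into account, so the two analyses would highlight different places.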
Choosing an appropriate attribute to divide by is important. Confirm that the attribute you choose for the Divide by parameter does, in fact, influence the distribution of the particular phenomenon you are analyzing.
When you choose the Divide by parameter for Esri Population, the population data from the Esri Demographics Global Coverage is used. Confirm that the resolution of the data available for the area that you are interested in is compatible with the size of the areas that are being enriched (either aggregation areas you provide or fishnet squares being created).
The output from the Find Outliers tool is a map. In this result layer, points or areas in dark red and dark blue are statistically significant outliers in your study area, and those in light blue and pink are part of statistically significant clusters. Points or areas displayed in beige are neither outliers nor part of any statistically significant cluster; the spatial pattern associated with these features may be the result of random chance.
Sometimes the results of your analysis will indicate that there are no statistically significant outliers or clusters at all, and all of the features in the result layer will be beige. This is important information: when a spatial pattern is random, you have no clues about underlying causes.
However, when you do find statistically significant outliers or clustering, those locations are important clues about what may be creating the phenomenon. For example, finding statistically significant spatial outliers of high cancer rates associated with certain environmental toxins can lead to policies and actions designed to protect people. Similarly, finding low outliers of childhood obesity associated with schools promoting after-school sports programs can provide strong justification for encouraging these types of programs more broadly.
The statistical method used by the Find Outliers tool is based on probability theory and, consequently, needs a minimum number of features to operate effectively. This statistical method also requires a variety of counts or analysis field values. If you are analyzing crime incidents by census tract, for example, and end up with exactly the same number of crimes in each tract, the tool cannot solve. The following entries explain the messages you may encounter when you use the Find Outliers tool, giving the problem behind each message and, where available, a solution:

Message: The analysis options you selected require a minimum of 60 points to compute hot and cold spots.
Problem: There aren't enough point features in your point analysis layer to compute reliable results.
Solution: Add more points to your analysis layer. Alternatively, you can define bounding analysis areas to add information about where points could have occurred but didn't. With this method, you need a minimum of 30 points. You can also provide aggregation areas that overlay your points. You need a minimum of 30 polygon areas and 30 points within those areas for this analysis. If you have at least 30 points, you can specify an analysis field. This changes the question from where are there many or few points to where do high and low analysis field values cluster spatially.

Message: The analysis options you selected require a minimum of 30 points with valid data in the analysis field in order to compute hot and cold spots.
Problem: There aren't enough points, or enough points associated with non-null analysis field values, in your analysis layer to compute reliable results.
Solution: If you have fewer than 30 points, this analysis method is not appropriate for your data. If you have more than 30 points and you are seeing this message, the analysis field you specified may have null values. Points with null analysis field values are skipped. Another possibility is that you have an active filter reducing the number of points available for analysis.

Message: The analysis options you selected require a minimum of 30 polygons with valid data in the analysis field in order to compute hot and cold spots.
Problem: There aren't enough polygon areas, or enough area features associated with non-null analysis field values, in your analysis layer to compute reliable results.
Solution: If you have fewer than 30 polygon areas, this analysis method is not appropriate for your data. If you have more than 30 areas and you are seeing this message, the analysis field you specified may have null values. Polygon areas with null analysis field values are skipped. Another possibility is that you have an active filter reducing the number of polygon areas available for analysis.

Message: The analysis option you selected requires a minimum of 30 points to be inside the bounding polygon areas.
Problem: Only points that fall within the bounding analysis areas you draw or provide are analyzed. To provide reliable results, at least 30 points should be inside the bounding analysis areas.
Solution: If you do not have at least 30 points, this method is not appropriate for your data. If you have at least 30 points, the solution is often to provide different, perhaps larger, bounding analysis areas. Another option is to provide an area layer with a minimum of 30 aggregation polygons that overlay at least 30 of your points. When you provide aggregation areas, analysis is performed on the point counts within each area.

Message: The analysis option you selected requires a minimum of 30 points to be inside the aggregation polygons.
Problem: Only the points that fall inside the aggregation polygons are included in the analysis. To provide reliable results, at least 30 points should be inside the polygon areas you provide.
Solution: If you do not have at least 30 points, this method is not appropriate for your data; otherwise, you should draw or provide bounding analysis areas that overlay at least 30 of your points. The bounding areas should reflect all the locations where points could possibly occur.

Message: The analysis option you selected requires a minimum of 30 aggregation areas.
Problem: The option you selected overlays the aggregation areas on top of your points and counts the number of points falling within each area. A minimum of 30 counts (30 areas) is needed to provide reliable results.
Solution: Reliable results can be computed if you provide a minimum of 30 points that fall within a minimum of 30 aggregation areas. If you don't have 30 aggregation areas, you can draw or provide bounding analysis areas that overlay at least 30 of your points. These bounding areas should reflect all the locations where points could possibly occur.

Message: Hot and cold spots cannot be computed when the number of points in every polygon area is identical. Try different polygon areas or different analysis options.
Problem: When the Find Outliers tool counted the number of points within each aggregation area, it found that the counts were all identical. To compute results, this tool requires at least some variation in the count values obtained.
Solution: You can provide alternative aggregation areas that do not result in all areas having the exact same number of points. Rather than aggregation areas, you can also draw or provide bounding analysis areas. Alternatively, you can specify an analysis field. However, this changes the question from where are there many or few points to where do high and low analysis field values cluster spatially.

Message: There is not enough variation in point locations to compute hot and cold spots. Coincident points, for example, reduce spatial variation. You can try providing a bounding area, aggregation areas (a minimum of 30), or an Analysis Field.
Problem: Based on the number of points and how spread out they are, the tool creates a fishnet grid to overlay your points. After counting the number of points that fall within each fishnet square and removing squares with zero counts, there were fewer than 30 squares left. This tool requires a minimum of 30 counts (30 squares) to provide reliable results.
Solution: If your points occupy few unique locations (if there are many coincident points), a solution is to either provide aggregation areas that overlay your points or draw or provide bounding analysis areas indicating where points are and are not possible. Another option is to specify an analysis field. However, this changes the question from where are there many or few points to where do high and low analysis field values cluster spatially.

Message: There is not enough variation among the points within the bounding polygon areas. You can try providing larger boundaries.
Problem: Based on point locations and number of points, the tool creates a fishnet grid to overlay your points. After counting the number of points that fall within each fishnet square and removing squares that are outside the bounding analysis areas, fewer than 30 fishnet squares were left. This tool requires a minimum of 30 counts (30 squares) to provide reliable results.
Solution: If your points are located at a variety of locations within the bounding analysis areas, you may just need to make or provide larger boundaries. If your points occupy few unique locations (if there are many coincident points), a solution is to provide aggregation areas that overlay your points. Another option is to specify an analysis field. However, this changes the question from where are there many or few points to where do high and low analysis field values cluster spatially.

Message: All of the values for your analysis field are likely the same. Hot and cold spots cannot be computed when there is no variation in the field being analyzed.
Problem: Most likely you specified an analysis field that has the same value for all of your points or area features in the analysis layer. The statistic used by this tool cannot solve unless there is a variety of values to work with.

Message: We were not able to compute hot and cold spots for the data provided. If appropriate, try specifying an Analysis Field.
Problem: While unlikely, when the tool created a fishnet grid and counted the number of points within each square, the counts for all squares were identical.

Message: Cell Size should be less than Distance Band.
Problem: You provided a Distance Band value that is smaller than the size of each grid cell.
Solution: Review the units specified for both Distance Band and Cell Size, use the default value calculated by the tool, or use a value that is larger than the size of a single grid cell.
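Two conditions recur throughout the messages above: a minimum number of valid features and some variation in the values. If you prepare data outside the tool, a rough pre-check along these lines may help you anticipate them. This Python sketch is an assumption-laden illustration: `precheck` is a hypothetical helper, it mirrors only the minimum of 30 and the variation requirement mentioned in the messages, and it does not reproduce the tool's own validation.

```python
def precheck(values, minimum=30):
    """Return a list of problems that would prevent a reliable analysis.

    values  -- analysis field values or point counts; None marks missing data
    minimum -- minimum number of valid features (30 in the messages above)
    """
    valid = [v for v in values if v is not None]  # null values are skipped
    problems = []
    if len(valid) < minimum:
        problems.append(
            f"only {len(valid)} valid values; at least {minimum} are needed"
        )
    if valid and len(set(valid)) == 1:
        problems.append("all values are identical; some variation is needed")
    return problems


print(precheck([5] * 40))                       # no variation
print(precheck(list(range(10)) + [None] * 30))  # too few valid values
print(precheck(list(range(30))))                # no problems found
```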
Additional information about the algorithms used by the Find Outliers tool can be found in How Optimized Outlier Analysis works.
Use Find Outliers to determine if there are any statistically significant outliers in the spatial pattern of your data. Other tools that may be useful are described below.
Map Viewer analysis tools
To find statistically significant clusters of high and low values in the spatial pattern of your data, use the Find Hot Spots tool.
To use point or line measurements to create a density map, use the Calculate Density tool.
ArcGIS Pro analysis tools
Find Outliers executes the same statistic used in the Cluster and Outlier Analysis (Anselin Local Moran's I) and Optimized Outlier Analysis tools.