The Find Outliers tool identifies statistically significant hot spots, cold spots, and spatial outliers using the Anselin Local Moran's I statistic.
Example
A police precinct wants to identify the areas within its boundaries that have consistently higher numbers of burglaries. The precinct uses the Find Outliers tool to identify the streets that are hot spots and outliers with high values. Police officers use the results to design prevention strategies, allocate their scarce resources, and initiate neighborhood watch programs.
Usage notes
The Find Outliers tool includes configurations for input features, outlier settings, and the result layer.
Input features
The Input features group includes the Input layer parameter, which is the point or polygon layer on which cluster and outlier analysis will be performed.
For feature inputs, a count of features is displayed below the layer name. The count includes all features in the layer, except features that have been removed using a filter. Environment settings, such as Processing extent, are not reflected in the feature count.
Note:
Web Mercator is not an appropriate projection for spatial analysis. If the spatial reference system of the input layer is WGS 1984 Web Mercator (Auxiliary Sphere), the data will be converted to a geographic coordinate system to use chordal distances in the analysis.
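As a rough illustration of what a chordal distance is, the following Python sketch measures the straight-line distance through the earth between two latitude/longitude points. It assumes a perfect sphere with a mean radius of 6,371 km, which is a simplification made for this example. The function names are hypothetical and this is not the tool's implementation.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumed mean spherical radius, in meters


def to_cartesian(lat_deg, lon_deg, radius=EARTH_RADIUS_M):
    """Convert latitude/longitude in degrees to 3D coordinates on a sphere."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (
        radius * math.cos(lat) * math.cos(lon),
        radius * math.cos(lat) * math.sin(lon),
        radius * math.sin(lat),
    )


def chordal_distance(p, q, radius=EARTH_RADIUS_M):
    """Straight-line (through-the-sphere) distance between two (lat, lon) points."""
    return math.dist(to_cartesian(*p, radius), to_cartesian(*q, radius))
```

For nearby points the chord is almost identical to the distance along the surface. The two only diverge noticeably over very long distances.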
Outlier settings
The Outlier settings group includes the following parameters:
- Variable type determines whether analysis is performed on the feature counts or values. The options are as follows:
- Field—Analysis will be applied to the values of the field specified by Analysis field.
- Point counts—Point features will be aggregated into polygons or cells and counted. Analysis will be applied to the aggregated point counts. This option is available when the input layer is point features.
- Aggregation shape type specifies the shape of the cells within which the point features will be aggregated. This parameter is available when Point counts is specified for Variable type. The following shape options are available:
- Fishnet cells—Point features will be aggregated within fishnet (square) cells.
- Hexagon cells—Point features will be aggregated within hexagon cells.
- Polygon layer—Point features will be aggregated within the polygon features specified by Aggregation polygon layer.
- Aggregation polygon layer is the layer that contains the polygon features within which the points will be aggregated. This parameter is available when Polygon layer is specified for Aggregation shape type.
- Define where points are possible is the layer that will define the extent of the analysis. You can choose a layer using the Layer button or use the Draw input features button to create a sketch layer to use as the input. Points that fall outside of the bounds of the layer will not be included in the analysis. This parameter is available when either Fishnet cells or Hexagon cells is specified for Aggregation shape type.
- Analysis field specifies the field that will be analyzed to determine outliers. This parameter is available when Field is specified for Variable type.
- Divide by specifies the values by which the analysis field values or the aggregated point counts will be divided. The options are as follows:
- Field—The field in the input layer that will be used to divide the analysis field values.
- Enrichment data—The features or aggregation shapes will be enriched with Esri population data, and the analysis field values or the aggregated point counts will then be divided by the Esri population value. The source of the Esri population data is Esri Demographics Global Coverage. This option uses ArcGIS GeoEnrichment Service and consumes additional credits.
- Optimization option specifies whether the number of permutations will be selected to optimize the performance of the tool (Speed), the precision of the pseudo p-value (Precision), or both (Balance). Permutations are used to evaluate the observed Local Moran's I value and to determine the likelihood of finding the observed spatial distribution around a target feature. Each permutation randomly rearranges the features in the target feature's neighborhood and calculates a Local Moran's I value. Repeating this process produces a distribution of Local Moran's I values for the target feature. The pseudo p-value is then calculated by comparing the observed Local Moran's I value to that distribution. The following optimization options are available:
- Speed—Runs 199 permutations to optimize the speed at which the tool runs. The smallest possible pseudo p-value is 0.005.
- Balance—Runs 499 permutations to optimize both speed and precision. The smallest possible pseudo p-value is 0.002.
- Precision—Runs 999 permutations to optimize the precision of the pseudo p-value. The smallest possible pseudo p-value is 0.001.
- Random number seed is an integer value that seeds the random number generator. The random number generator will be used to permute the features in each target feature's neighborhood before calculating a Local Moran's I value. Using the same seed makes the permutations repeatable.
- Cell size is a numeric value that defines the length of a side of each cell.
- Cell size unit specifies the unit of the cell size. Supported units are feet, miles, meters, and kilometers.
- Distance band is a numeric value that defines the radius of a target feature's neighborhood. All of the features that fall within the distance band of the target feature will be included in its neighborhood. The entire neighborhood will be used to determine whether the target feature is part of a cluster with high or low values and whether the feature is an outlier.
- Distance band unit specifies the unit of the distance band. Supported units are feet, miles, meters, and kilometers.
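The relationship between the number of permutations, the random number seed, the distance band, and the pseudo p-value can be sketched in Python. This is a minimal illustration under stated assumptions, not the tool's implementation. It uses a simplified Local Moran's I (the target's standardized value multiplied by the mean standardized value of its neighbors), planar distances, and a folded tail count for the pseudo p-value. The function and parameter names are hypothetical.

```python
import math
import random
import statistics


def smallest_pseudo_p(permutations):
    """Smallest pseudo p-value attainable with a given number of permutations."""
    return 1 / (permutations + 1)


def local_moran_pseudo_p(points, values, target, distance_band,
                         permutations=199, seed=0):
    """Return (observed I, pseudo p-value) for the feature at index `target`."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        raise ValueError("values cannot be identical")
    z = [(v - mean) / std for v in values]

    # Neighbors are all other features within the distance band (planar distance).
    others = [i for i in range(len(points)) if i != target]
    neighbors = [i for i in others
                 if math.dist(points[i], points[target]) <= distance_band]
    if not neighbors:
        raise ValueError("target feature has no neighbors within the distance band")

    def local_i(neighbor_ids):
        # Simplified Local Moran's I: target z-score times the mean neighbor z-score.
        lag = sum(z[j] for j in neighbor_ids) / len(neighbor_ids)
        return z[target] * lag

    observed = local_i(neighbors)

    rng = random.Random(seed)  # the seed makes the permutations reproducible
    at_least_observed = 0
    for _ in range(permutations):
        # Conditional permutation: keep the target fixed, redraw its neighbors.
        shuffled = rng.sample(others, len(neighbors))
        if local_i(shuffled) >= observed:
            at_least_observed += 1

    # Fold the count so the smaller tail is used, then apply (count + 1) / (M + 1).
    extreme = min(at_least_observed, permutations - at_least_observed)
    return observed, (extreme + 1) / (permutations + 1)
```

With this formula, `smallest_pseudo_p` gives 1/200, 1/500, and 1/1000 for 199, 499, and 999 permutations, which matches the minimum pseudo p-values listed for the Speed, Balance, and Precision options.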
Result layer
The Result layer group includes the following parameters:
- Output name specifies the name of the layer that is created and displayed. The name must be unique. If a layer with the same name already exists in your organization, the tool will fail and you will be prompted to use a different name.
- Save in folder specifies the name of a folder in My content where the result will be saved.
Limitations
The following limitations apply to the tool:
- If Variable type is specified as Point counts, the following limitations apply:
- The input layer must contain at least 60 point features.
- At a minimum, 30 aggregation cells or polygons must contain at least one point feature.
- The point counts within the aggregation cells or polygons cannot be identical. There must be variation in the point counts between aggregation cells or polygons.
- If Variable type is specified as Field, the following limitations apply:
- At a minimum, 30 features must contain non-null values in the specified analysis field.
- The values in the specified analysis field cannot be identical. There must be variation in the values.
- At a minimum, 30 points must fall within the bounding area specified by Define where points are possible.
- The cell size value cannot exceed the distance band.
- The availability of Esri population data will depend on the location of the input features.
- Esri population data is not available for the Divide by parameter when your organization has a custom GeoEnrichment Service configured.
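The point count limitations can be checked before running the tool. The following Python sketch aggregates points into square fishnet cells and reports which limitations are violated. It is an illustration only. The function names are hypothetical, coordinates are assumed to be planar, the cell size and distance band are assumed to share the same unit, and the variation check looks only at cells that contain at least one point.

```python
import math
from collections import Counter


def fishnet_counts(points, cell_size):
    """Count points per square cell; cells are keyed by (column, row) index."""
    return Counter(
        (math.floor(x / cell_size), math.floor(y / cell_size)) for x, y in points
    )


def check_point_count_limits(points, cell_size, distance_band):
    """Return a list of limitation messages; an empty list means all checks pass."""
    problems = []
    counts = fishnet_counts(points, cell_size)
    if len(points) < 60:
        problems.append("fewer than 60 point features")
    if len(counts) < 30:
        problems.append("fewer than 30 cells contain at least one point")
    if len(set(counts.values())) < 2:
        # Assumption: only non-empty cells are compared for variation.
        problems.append("point counts are identical in every non-empty cell")
    if cell_size > distance_band:
        problems.append("cell size exceeds the distance band")
    return problems
```

For example, calling `check_point_count_limits(points, cell_size=1, distance_band=2)` on a list of `(x, y)` tuples returns an empty list when none of these limitations is violated.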
Environments
Analysis environment settings are additional parameters that affect a tool's results. You can access the tool's analysis environment settings from the Environment settings parameter group.
This tool honors the following analysis environments:
- Output coordinate system
- Processing extent
Note:
The default processing extent is Full extent. This default is different from Map Viewer Classic, in which Use current map extent is enabled by default.
Credits
Credits will be consumed if your ArcGIS Enterprise portal is configured to use ArcGIS GeoEnrichment Service and Esri Population is chosen for Divide by.
For more information, see Understand credits for spatial analysis.
Outputs
The tool outputs a layer with the results of the cluster and outlier analysis. The layer includes fields for the count, cluster-outlier type, Local Moran's I value, p-value, z-score, number of neighbors, spatial lag, and z-transform of each feature. The cluster-outlier type field distinguishes between a statistically significant cluster of high values (HH), a cluster of low values (LL), a high value outlier surrounded by low values (HL), a low value outlier surrounded by high values (LH), and a nonsignificant result (NS). The Local Moran's I value indicates whether the feature and its neighborhood have similar (positive) or dissimilar (negative) values. Outliers will have a negative Local Moran's I value.
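The cluster-outlier types can be read as a combination of the sign of a feature's standardized value, the sign of its neighborhood's spatial lag, and the significance of the result. The following Python sketch shows that logic. The function and parameter names are hypothetical, and the 0.05 significance threshold is an assumption made for illustration, not a documented setting of the tool.

```python
def cluster_outlier_type(z_value, spatial_lag, p_value, alpha=0.05):
    """Classify one feature as HH, LL, HL, LH, or NS.

    z_value     -- the feature's standardized (z-transformed) value
    spatial_lag -- the average standardized value of the feature's neighbors
    p_value     -- the pseudo p-value of the feature's Local Moran's I
    alpha       -- assumed significance threshold (hypothetical default)
    """
    # Nonsignificant results, and results with no clear sign, are labeled NS.
    if p_value >= alpha or z_value == 0 or spatial_lag == 0:
        return "NS"
    if z_value > 0:
        # High value: a cluster if neighbors are also high, otherwise an outlier.
        return "HH" if spatial_lag > 0 else "HL"
    # Low value: a cluster if neighbors are also low, otherwise an outlier.
    return "LL" if spatial_lag < 0 else "LH"
```

In this sketch, HH and LL pair values of the same sign, so their product is positive. HL and LH pair values of opposite sign, which is why outliers have a negative Local Moran's I value.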
To view additional details about the analysis, click Analysis on the Settings toolbar. Click History, and find and click the successful tool run. The analysis details will open on the Results tab. You can also view the additional details on the layer's item page. Click the options button next to the output layer and click View details.
Licensing requirements
This tool requires the following user type and configurations:
- Creator, Professional, or Professional Plus user type
- Publisher or Administrator role, or an equivalent custom role
The following privileges and services are required to use Esri population data:
- GeoEnrichment privilege
- ArcGIS GeoEnrichment Service or a custom GeoEnrichment service
Resources
Use the following resources to learn more:
- Optimized Outlier Analysis in ArcGIS Pro
- Cluster and Outlier Analysis (Anselin Local Moran's I) in ArcGIS Pro
- Find Outliers in ArcGIS REST API
- find_outliers in ArcGIS API for Python