The Find Hot Spots tool will determine if there is any statistically significant clustering in the spatial pattern of your data.
Examples
A city's police department is conducting an analysis to determine if there is a relationship between violent crimes and unemployment rates. An expanded summer job program will be implemented for high schools in areas where there is high violent crime and high unemployment. Find Hot Spots will be used to find areas with statistically significant crime and unemployment hot spots.
A political strategist wants to know which regions showed the strongest or weakest support for a particular political party in the last election. This information could be helpful in guiding campaign strategies for future elections. The strategist subtracts the proportion of Democrat votes from the proportion of Republican votes and uses Find Hot Spots to find the hot and cold spots in the differences. The hot spots (red) will denote strong Republican support, while the cold spots (blue) will denote strong Democrat support.
A conservation officer is studying disease in trees to prioritize which areas of the forest should receive treatment and to learn more about areas that are showing some resistance. The Find Hot Spots tool can be used to find clusters of diseased (hot spots) and healthy (cold spots) trees.
Usage notes
The input features may be points or areas.
The Find clusters of high and low parameter is used to evaluate the spatial arrangement of your features. If your features are areas, a field must be chosen. Clustering will be determined using the numbers in the chosen field. Point features can be analyzed using a field or the Point Counts option. If Point Counts is used, the tool will determine if the points themselves are clustered, rather than finding clusters of high and low field values.
If points are being analyzed with Point Counts, two new options will be available. The Count points within parameter allows the points to be aggregated within a Fishnet Grid, Hexagon Grid, or an area layer from your Contents, such as counties or ZIP Codes. The Define where points are possible parameter is used to create an area or multiple areas of interest. The three options for this parameter are None, meaning all points are used, an area defined by an area layer from your Contents, and areas created using the Draw tool.
Your data can be normalized using the Divide by parameter. The Esri Population data uses GeoEnrichment and requires the use of credits. Another option is to normalize using a field from the input layer. Some possible values that could be used for normalization include number of households or area.
The Options can be used to set a specific Cell Size or Distance Band for your analysis.
The output layer will have additional fields containing information such as the statistical significance of each feature, the p-value, and the z-score. The output layer also contains information on the statistical analysis in the Description section of its Item Details.
How Find Hot Spots works
Even random spatial patterns exhibit some degree of clustering. In addition, our eyes and brains naturally try to find patterns even when none exist. Consequently, it can be difficult to know if the patterns in your data are the result of real spatial processes at work or just the result of random chance. This is why researchers and analysts use statistical methods like Find Hot Spots (Getis-Ord Gi*) to quantify spatial patterns.
The tool calculates the Getis-Ord Gi* statistic (pronounced G-i-star) for each feature in a dataset. The resultant z-scores and p-values tell you where features with either high or low values cluster spatially. The Find Hot Spots tool calculates optimal defaults based on the characteristics of the input data and automatically applies a False Discovery Rate (FDR) correction. Each feature is analyzed within the context of neighboring features. A feature with a high value is interesting but may not be a statistically significant hot spot. To be a statistically significant hot spot, a feature will have a high value and be surrounded by other features with high values as well. The local sum for a feature and its neighbors is compared proportionally to the sum of all features; when the local sum is very different from the expected local sum, and when that difference is too large to be the result of random chance, a statistically significant z-score results.
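As a rough illustration of the calculation described above, the following Python sketch computes a Gi* z-score and a two-sided p-value for each feature. It assumes a fixed distance band with binary weights (each feature counts as its own neighbor) and uses the commonly published form of the Getis-Ord Gi* statistic. The function name `gi_star` and its parameters are hypothetical, and the sketch omits the optimal defaults and the FDR correction that the tool applies, so its results will not necessarily match the tool's output.

```python
import math


def gi_star(points, values, distance_band):
    """Return (z_score, p_value) pairs, one per feature.

    Assumption: binary weights within a fixed distance band, with each
    feature counted as its own neighbor. No FDR correction is applied.
    """
    n = len(values)
    mean = sum(values) / n
    variance = sum(v * v for v in values) / n - mean ** 2
    if variance <= 0:
        raise ValueError("values must vary to compute Gi*")
    std = math.sqrt(variance)

    results = []
    for xi, yi in points:
        # Weight is 1 for features within the distance band, else 0.
        weights = [
            1.0 if math.hypot(xi - xj, yi - yj) <= distance_band else 0.0
            for xj, yj in points
        ]
        sum_w = sum(weights)
        sum_w2 = sum(w * w for w in weights)
        local_sum = sum(w * v for w, v in zip(weights, values))

        # Difference between the local sum and its expected value.
        numerator = local_sum - mean * sum_w
        denominator = std * math.sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
        z = numerator / denominator if denominator > 0 else 0.0

        # Two-sided p-value from the standard normal distribution.
        p = math.erfc(abs(z) / math.sqrt(2))
        results.append((z, p))
    return results


# A 5 x 5 grid of features with a block of high values in one corner.
grid_points = [(i, j) for i in range(5) for j in range(5)]
grid_values = [10.0 if i < 2 and j < 2 else 1.0 for i, j in grid_points]
scores = gi_star(grid_points, grid_values, distance_band=1.0)
```

In this example, the corner feature surrounded by other high values receives a large positive z-score, while the feature in the opposite corner, surrounded by low values, receives a negative one.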
When you do find statistically significant clustering in your data, you have valuable information. Knowing where and when clustering occurs can provide important clues about the processes promoting the patterns you're seeing. Knowing that residential burglaries, for example, are consistently higher in particular neighborhoods is vital information if you need to design effective prevention strategies, allocate scarce police resources, initiate neighborhood watch programs, authorize in-depth criminal investigations, or identify potential suspects.
Analyze area features
Quite a lot of data is available for area features such as census tracts, counties, voter districts, hospital regions, parcels, park and recreation boundaries, watersheds, land cover classifications, and climate zones. When your analysis layer contains area features, you will need to specify a numeric field that will be used to find clusters of high and low values. This field might represent the following:
- Counts (such as the number of households)
- Rates (such as the proportion of the population holding a college degree)
- Averages (such as the mean or median household income)
- Indices (such as a score indicating whether household spending on sporting goods is above or below the national average)
With the field you provide, the Find Hot Spots tool will create a map (the result layer) showing you areas with statistically significant clusters of high values (hot spots: red) and low values (cold spots: blue).
Analyze point features
A variety of data is available as point features. Examples of features most often represented as points include crime incidents, schools, hospitals, emergency call events, traffic accidents, water wells, trees, and boats. Sometimes you will be interested in analyzing data values (a field) associated with each point feature. In other cases, you will only be interested in evaluating the clustering of the points themselves. The decision on whether or not to provide a field will depend on the question you are asking.
Find clusters of high and low values associated with point features
You will want to provide an analysis field to answer questions such as Where do high and low values cluster? The field you select might represent some of the following:
- Counts (such as the number of traffic accidents at street intersections)
- Rates (such as city unemployment, where each city is represented as a point feature)
- Averages (such as the mean math test score among schools)
- Indices (such as a consumer satisfaction score for car dealerships across the county)
Find clusters of high and low point counts
For some point data, typically when each point represents an event, incident, or indication of presence/absence, there won't be an obvious analysis field to use. In these cases, you just want to know where clustering is unusually (statistically significantly) intense or sparse. For this analysis, area features (a fishnet grid that the tool creates for you, or an area layer that you provide) are placed over the points and the number of points that fall within each area is counted. The tool then finds clusters of high and low point counts associated with each area feature.
Define where points are possible
Specify an area layer, or draw areas, to define a study area covering all locations where the incident point features could possibly occur. For this option, the Find Hot Spots tool will overlay your defined study area with a fishnet grid and count the points falling within each fishnet square. When you do not indicate where incident points are possible by using this option, the Find Hot Spots tool will only analyze fishnet squares that contain at least one point. When you use this option to define where points are possible, however, the analysis will be done for all fishnet squares that fall within the bounding areas you define.
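The difference between the two behaviors can be sketched in Python. This is a simplified illustration, not the tool's implementation: the function `fishnet_counts` is hypothetical, the cell size is supplied by the caller rather than calculated from the data, and the study area is assumed to be a single rectangle given as `(min_x, min_y, max_x, max_y)`.

```python
import math
from collections import Counter


def fishnet_counts(points, cell_size, extent=None):
    """Count points per square fishnet cell, keyed by (column, row).

    Without an extent, only cells containing at least one point are
    returned. With an extent (an assumed rectangular study area), every
    cell inside it is returned, including cells with a count of zero.
    """
    if extent is None:
        min_x = min(x for x, _ in points)
        min_y = min(y for _, y in points)
    else:
        min_x, min_y, max_x, max_y = extent

    counts = Counter()
    for x, y in points:
        # Points outside the study area are ignored.
        if extent is not None and not (
            min_x <= x < max_x and min_y <= y < max_y
        ):
            continue
        col = int((x - min_x) // cell_size)
        row = int((y - min_y) // cell_size)
        counts[(col, row)] += 1

    if extent is None:
        return dict(counts)

    n_cols = math.ceil((max_x - min_x) / cell_size)
    n_rows = math.ceil((max_y - min_y) / cell_size)
    return {
        (c, r): counts.get((c, r), 0)
        for c in range(n_cols)
        for r in range(n_rows)
    }


incidents = [(0.5, 0.5), (0.6, 0.4), (2.5, 2.5)]
occupied_only = fishnet_counts(incidents, cell_size=1.0)
full_study_area = fishnet_counts(incidents, cell_size=1.0, extent=(0, 0, 3, 3))
```

Here `occupied_only` contains two cells, while `full_study_area` contains all nine cells of the study area, seven of them with a count of zero. The zero-count cells carry the information about where points could have occurred but did not.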
Count points within your own aggregation areas
In some cases, area features such as census tracts, police beats, or parcels will make more sense for your analysis than the default fishnet grid.
Choose to divide by
There are two common approaches to identify hot and cold spots:
- By count—When you analyze a particular dataset, you often want to find hot and cold spots of the number of features in each aggregation area across your study area. For instance, you might want to find hot spots where the highest numbers of crimes have happened and cold spots where the lowest numbers of crimes have occurred in order to allocate resources.
- By intensity—On the other hand, analyzing and understanding patterns that take into account underlying distributions that influence a particular phenomenon can also be meaningful. This concept is often referred to as normalization, or the process of dividing one numeric attribute value by another to minimize differences in values based on the size of areas or the number of features in each area. For instance, with crime, you might want to understand where there are clusters of high and low numbers of crimes that take into account the underlying population. In that case, you would count the number of crimes in each area (whether that area is a fishnet grid or a different area dataset) and divide that total number of crimes by the total population in that area. This would give you a crime rate, or the number of crimes per capita. Finding hot and cold spots of crime per capita answers a different question that can also help guide decision-making.
Both ways of analyzing the data in your study area are valid; it just depends on what question you are asking.
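The two approaches can give quite different pictures of the same data. The following Python sketch divides a count in each area by a second attribute to produce an intensity. The function `normalize` and the sample figures are hypothetical, and skipping areas with a missing or zero divisor is an assumption made for this sketch, not documented tool behavior.

```python
def normalize(counts, divisors):
    """Divide each area's count by its divisor (for example, population).

    Assumption: areas with a missing or zero divisor are skipped.
    """
    rates = {}
    for area, count in counts.items():
        divisor = divisors.get(area)
        if divisor:
            rates[area] = count / divisor
    return rates


# Hypothetical crime counts and populations for three areas.
crime_counts = {"A": 50, "B": 50, "C": 5}
population = {"A": 1000, "B": 10000, "C": 0}
crime_rates = normalize(crime_counts, population)
```

Areas A and B have the same crime count, but the per capita rate in A is ten times the rate in B, so an analysis by intensity would treat them very differently from an analysis by count.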
Choosing an appropriate attribute to divide by is very important. You need to make sure that the Divide By attribute is an attribute that does, in fact, influence the distribution of the particular phenomenon you are analyzing.
When you choose to Divide by Esri Population, the population data from the Esri Demographics Global Coverage is used. Be sure to look at the resolution of the data available for the area that you are interested in to make sure that it is compatible with the size of the areas that are being enriched (either aggregation areas you provide or fishnet squares being created).
Interpret results
The output from the Find Hot Spots tool is a map. For the points or the areas in this result layer map, the darker the red or blue colors appear, the more confident you can be that clustering is not the result of random chance. Points or areas displayed using beige, on the other hand, are not part of any statistically significant cluster; the spatial pattern associated with these features could very likely be the result of random chance. Sometimes the results of your analysis will indicate that there aren't any statistically significant clusters at all. This is important information to have. When a spatial pattern is random, you have no clues about underlying causes. In these cases, all of the features in the results layer will be beige. However, when you do find statistically significant clustering, the locations where clustering occurs are important clues about what might be creating the clustering. For example, finding statistically significant spatial clustering of cancer associated with certain environmental toxins can lead to policies and actions designed to protect people. Similarly, finding cold spots of childhood obesity associated with schools promoting after-school sports programs can provide strong justification for encouraging these types of programs more broadly.
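To make the link between z-scores and confidence concrete, the following Python sketch labels a single z-score using the conventional two-sided critical values of the standard normal distribution (1.65, 1.96, and 2.58 for roughly 90, 95, and 99 percent confidence). The function `classify` is hypothetical. Because the tool applies an FDR correction, the cutoffs it actually uses may be stricter than the ones assumed here.

```python
def classify(z_score):
    """Label a z-score using conventional two-sided normal thresholds.

    Assumption: uncorrected critical values; no FDR adjustment.
    """
    magnitude = abs(z_score)
    if magnitude >= 2.58:
        confidence = 99
    elif magnitude >= 1.96:
        confidence = 95
    elif magnitude >= 1.65:
        confidence = 90
    else:
        return "not significant"
    kind = "hot spot" if z_score > 0 else "cold spot"
    return f"{kind}, {confidence}% confidence"


labels = [classify(z) for z in (4.1, -2.0, 0.5)]
```

Features labeled not significant correspond to the beige features in the result layer; higher confidence corresponds to darker reds and blues.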
Troubleshoot
The statistical method used by the Find Hot Spots tool is based on probability theory and, consequently, needs a minimum number of features to operate effectively. This statistical method also requires a variety of counts or analysis field values. If you are analyzing crime incidents by census tract, for example, and amazingly end up with exactly the same number of crimes in each tract, the tool cannot solve. The following table provides an explanation of the messages you may encounter when you use the Find Hot Spots tool:
Message | Problem | Solution |
---|---|---|
The analysis options you selected require a minimum of 60 points to compute hot and cold spots. | There aren't enough point features in your point analysis layer to compute reliable results. | The obvious solution is to add more points to your analysis layer. Alternatively, you can try defining bounding analysis areas, and thereby add information about where points could have occurred but didn't. With this method, you will need a minimum of 30 points. You can also try providing aggregation areas that overlay your points. You will need a minimum of 30 polygon areas and 30 points within those areas for this analysis. If you have at least 30 points, you may want to specify an analysis field. This changes the question from where are there many or few points to where do high and low analysis field values cluster spatially. |
The analysis options you selected require a minimum of 30 points with valid data in the analysis field in order to compute hot and cold spots. | There aren't enough points, or enough points associated with non-NULL analysis field values, in your analysis layer to compute reliable results. | Unfortunately, if you have fewer than 30 points, this analysis method is not appropriate for your data. If you have more than 30 points and you are seeing this message, the analysis field you specified may have NULL values. Points with NULL analysis field values will be skipped. Another possibility is that you have an active Filter reducing the number of points available for analysis. |
The analysis options you selected require a minimum of 30 polygons with valid data in the analysis field in order to compute hot and cold spots. | There aren't enough polygon areas, or enough area features associated with non-NULL analysis field values, in your analysis layer to compute reliable results. | Unfortunately, if you have fewer than 30 polygon areas, this analysis method is not appropriate for your data. If you have more than 30 areas and you are seeing this message, the analysis field you specified may have NULL values. Polygon areas with NULL analysis field values will be skipped. Another possibility is that you have an active Filter reducing the number of polygon areas available for analysis. |
The analysis option you selected requires a minimum of 30 points to be inside the bounding polygon areas. | Only points that fall within the bounding analysis areas you draw or provide will be analyzed. To provide reliable results, at least 30 points should be inside the bounding analysis areas. | Unfortunately, if you do not have at least 30 points, this method is not appropriate for your data. With a minimum of 30 features, the solution here will often be to provide different, perhaps larger, bounding analysis areas. Another option would be to provide an area layer with a minimum of 30 aggregation polygons that overlay at least 30 of your points. When you provide aggregation areas, analysis is performed on the point counts within each area. |
The analysis option you selected requires a minimum of 30 points to be inside the aggregation polygons. | Only the points that fall inside the aggregation polygons will be included in the analysis. To provide reliable results, at least 30 points should be inside the polygon areas you provide. | Unfortunately, if you do not have at least 30 points, this method is not appropriate for your data; otherwise, you should draw or provide bounding analysis areas that overlay at least 30 of your points. The bounding areas should reflect all the locations where points could possibly occur. |
The analysis option you selected requires a minimum of 30 aggregation areas. | The option you selected will overlay the aggregation areas on top of your points and count the number of points falling within each area. A minimum of 30 counts (30 areas) is needed to provide reliable results. | Reliable results can be computed if you provide a minimum of 30 points that fall within a minimum of 30 aggregation areas. If you don't have 30 aggregation areas, you can try drawing or providing bounding analysis areas that overlay at least 30 of your points. These bounding areas should reflect all the locations where points could possibly occur. |
Hot and cold spots cannot be computed when the number of points in every polygon area is identical. Try different polygon areas or different analysis options. | When the Find Hot Spots tool counted the number of points within each aggregation area, it found that the counts were all identical. To compute results, this tool requires at least some variation in the count values obtained. | You can provide alternative aggregation areas that will not result in all areas having the exact same number of points. Rather than aggregation areas, you can also try drawing or providing bounding analysis areas. Alternatively, you can specify an analysis field. However, this changes the question from where are there many or few points to where do high and low analysis field values cluster spatially. |
There is not enough variation in point locations to compute hot and cold spots. Coincident points, for example, reduce spatial variation. You can try providing a bounding area, aggregation areas (a minimum of 30), or an Analysis Field. | Based on the number of points and how spread out they are, the tool creates a fishnet grid to overlay your points. After counting the number of points that fall within each fishnet square and removing squares with zero counts, there were fewer than 30 squares left. This tool requires a minimum of 30 counts (30 squares) to provide reliable results. | If your points occupy very few unique locations (if there are many coincident points), a good solution is to either provide aggregation areas that overlay your points, or draw and provide bounding analysis areas indicating where points are and are not possible. Another option is to specify an analysis field. However, this changes the question from where are there many or few points to where do high and low analysis field values cluster spatially. |
There is not enough variation among the points within the bounding polygon areas. You can try providing larger boundaries. | Based on point locations and number of points, the tool creates a fishnet grid to overlay your points. After counting the number of points that fall within each fishnet square and removing squares that are outside your bounding analysis areas, fewer than 30 fishnet squares were left. This tool requires a minimum of 30 counts (30 squares) to provide reliable results. | If your points are located at a variety of locations inside the bounding analysis areas, you may just need to make or provide larger boundaries. If your points occupy very few unique locations (if there are many coincident points), a good solution is to provide aggregation areas that overlay your points. Another option is to specify an analysis field. However, this changes the question from where are there many or few points to where do high and low analysis field values cluster spatially. |
All of the values for your analysis field are likely the same. Hot and cold spots cannot be computed when there is no variation in the field being analyzed. | Most likely you specified an analysis field that has the same value for all of your points or area features in the analysis layer. The statistic used by this tool cannot solve unless there is a variety of values to work with. | You can specify a different analysis field or, for point features, analyze point densities rather than point values. |
We were not able to compute hot and cold spots for the data provided. If appropriate, try specifying an Analysis Field. | While quite unlikely, when the tool created a fishnet grid and counted the number of points within each square, the counts for all squares were identical. | The solution would be to provide your own aggregation areas, draw or provide bounding analysis areas, or specify an analysis field. |
Cell Size should be less than Distance Band. | You have provided a Distance Band value that is smaller than the size of each grid cell. | Check the units specified for both Distance Band and Cell Size, use the default value calculated by the tool, or use a value that is larger than the size of a single grid cell. |
Additional information about the algorithms employed by the Find Hot Spots tool can be found in How Optimized Hot Spot Analysis works.
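Several of the conditions in the table above can be checked before running an analysis. The following Python sketch is a hypothetical pre-check, not part of the tool: the function `precheck`, its parameters, and its message wording are illustrative. It covers three of the conditions described above: too few features with valid values, no variation in the values, and a Cell Size that is not smaller than the Distance Band (both assumed to be in the same units).

```python
def precheck(values, minimum=30, cell_size=None, distance_band=None):
    """Return a list of problems found; an empty list means none.

    values: analysis field values or counts, with None for NULL.
    minimum: required number of valid values (30 in most cases above).
    """
    problems = []

    # NULL values are skipped, as described in the table.
    valid = [v for v in values if v is not None]
    if len(valid) < minimum:
        problems.append(
            f"only {len(valid)} valid values; at least {minimum} required"
        )

    # The statistic cannot be computed when every value is identical.
    if valid and len(set(valid)) == 1:
        problems.append("no variation in the values being analyzed")

    # Cell Size should be less than Distance Band.
    if (
        cell_size is not None
        and distance_band is not None
        and cell_size >= distance_band
    ):
        problems.append("Cell Size should be less than Distance Band")

    return problems


ok = precheck(list(range(40)))
identical = precheck([7] * 40)
too_few = precheck(list(range(10)) + [None] * 30)
bad_band = precheck(list(range(40)), cell_size=500, distance_band=200)
```

Pass `minimum=60` to mirror the first message in the table, which applies when points are counted in a fishnet grid without bounding or aggregation areas.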
Similar tools
Use Find Hot Spots to determine if there is any statistically significant clustering in the spatial pattern of your data. Other tools that may be useful are the following:
Map Viewer analysis tools
If you are interested in finding outliers in the spatial pattern of your data, use the Find Outliers tool.
If you are interested in creating a density map of your point or line features, use the Calculate Density tool.
ArcGIS Desktop analysis tools
Find Hot Spots executes the same statistic used in the Hot Spot Analysis (Getis-Ord Gi*) and Optimized Hot Spot Analysis tools.
Find Hot Spots is also available in ArcGIS Pro. To run the tool from ArcGIS Pro, your project's active portal must be running Portal for ArcGIS 10.5 or later. You must also sign in to the portal using an account that has privileges to perform standard feature analysis in the portal.