The Find Point Clusters tool identifies clusters of point features from surrounding noise based on their spatial distribution.
Example uses of this tool include the following:
- An organization that studies a particular pest-borne disease wants to identify where in their study area to begin treatment and extermination of these pests. An analyst has a point dataset that represents the infested and noninfested households in the study area. The analyst uses the Find Point Clusters tool to find the largest cluster of infested households.
- A disaster response organization needs to determine where to deploy their resources for rescue and evacuation following a natural disaster. An analyst uses the Find Point Clusters tool to identify clusters of geolocated tweets that mention the event. The organization uses the size and location of the clusters to map the impacted area and inform their relief efforts.
The Find Point Clusters tool includes configurations for input features, cluster settings, and the result layer.
The Input features group includes the Input layer parameter, which is the layer with point features that will be grouped into clusters based on their spatial distribution.
Web Mercator is not an appropriate projection for spatial analysis. If the spatial reference system of the input layer is WGS 1984 Web Mercator (Auxiliary Sphere), the data will be converted to a geographic coordinate system so that chordal distances can be used in the analysis.
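As a rough illustration of what a chordal distance is, the sketch below measures the straight-line distance through the Earth between two latitude/longitude points. It assumes a spherical Earth with a mean radius of 6,371,008.8 meters; the function names and the radius are illustrative assumptions, not the tool's internal implementation.

```python
import math

# Assumption: a spherical Earth with this mean radius (meters).
EARTH_RADIUS_M = 6_371_008.8


def to_unit_vector(lat_deg, lon_deg):
    """Convert latitude/longitude in degrees to a 3D unit vector."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    return (
        math.cos(lat) * math.cos(lon),
        math.cos(lat) * math.sin(lon),
        math.sin(lat),
    )


def chordal_distance(p, q, radius=EARTH_RADIUS_M):
    """Straight-line (chord) distance in meters between two (lat, lon) points."""
    return radius * math.dist(to_unit_vector(*p), to_unit_vector(*q))


# A quarter of the way around the equator: the chord is shorter than the arc.
chord = chordal_distance((0.0, 0.0), (0.0, 90.0))
arc = EARTH_RADIUS_M * math.pi / 2
```

For nearby points the chord and the arc along the surface are nearly identical; the difference only becomes noticeable over very long distances.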
The Cluster settings group includes the following parameters:
- Clustering method specifies the method that will be used to identify clusters.
  - Defined distance (DBSCAN): Identifies clusters by searching within a specified search distance. This method is appropriate when all the meaningful clusters have similar densities.
  - Self-adjusting (HDBSCAN): Uses a range of distances to separate clusters of varying densities from sparser noise. This method is the most data-driven of the clustering methods, so it does not need a search distance.
  - Multi-scale (OPTICS): Identifies clusters using the distance between neighbors and a reachability plot. The method first determines the minimum reachability distance for all the points. The minimum reachability distance is the distance from a point to its nearest neighbor that has not yet been visited by the search. Once the minimum reachability distance for all the points is determined, the tool constructs a reachability plot. The reachability plot plots each point's reachability order against its reachability distance, revealing the clustering structure of the points. This method then uses the Cluster sensitivity value to identify clusters. Similar to the HDBSCAN method, the OPTICS method can identify clusters with varying densities.
- Minimum points per cluster is the minimum number of points that will be used to consider a grouping of points a cluster. In general, the smaller the value, the more clusters that will be detected. This value must be less than or equal to the number of points in the layer. The minimum value supported is 2.
- Search distance specifies the maximum distance around each point that will be considered. Its meaning depends on the Clustering method value.
  - If the Clustering method value is Defined distance (DBSCAN), the Search distance value is the maximum distance around each point feature in the cluster to search for points that can be included in the cluster. If the minimum number of points can be found within the search distance of a particular point, that point is considered a core point. If the minimum number of points cannot be found within the search distance of a particular point but that point falls within the search distance of a core point, the point is considered a border point. Clusters will be composed of both core points and border points.
  - If the Clustering method value is Multi-scale (OPTICS), the Search distance value is the maximum distance around each point to search for points to assign a reachability distance. Reachability distance is the distance from a point to its nearest neighbor that has not yet been visited by the search. Points within the core distance of a point are assigned the core distance as their reachability distance. The core distance of a point is the distance that must be traveled from that point to reach the defined minimum number of features.
- Search distance unit is the unit of measurement for the Search distance value.
- Cluster sensitivity is how the shape (both slope and height) of peaks within the reachability plot will be used to separate clusters. The reachability plot plots the reachability order of the points and their reachability distance. A very high Cluster sensitivity value (close to 100) will treat even the smallest peaks in the reachability plot as a separation between clusters. A very low Cluster sensitivity value (close to 0) will treat only the steepest, highest peaks in the reachability plot as a separation between clusters. If left blank, the tool will find a sensitivity value using the Kullback-Leibler divergence.
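The core point and border point rules described for the Defined distance (DBSCAN) method can be sketched as follows. This is a minimal illustration using planar Euclidean distance, not the tool's implementation. The function name `classify_points` is hypothetical, and the sketch assumes that the minimum point count includes the point itself.

```python
import math


def classify_points(points, search_distance, min_points):
    """Label each (x, y) point as 'core', 'border', or 'noise'."""
    n = len(points)
    # Indices of all points within the search distance of each point.
    # Assumption: a point counts as its own neighbor.
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= search_distance]
        for i in range(n)
    ]
    is_core = [len(nearby) >= min_points for nearby in neighbors]

    labels = []
    for i in range(n):
        if is_core[i]:
            labels.append("core")
        elif any(is_core[j] for j in neighbors[i]):
            # Not dense enough itself, but within reach of a core point.
            labels.append("border")
        else:
            labels.append("noise")
    return labels


sample = [(0, 0), (1, 0), (0, 1), (2, 0), (10, 10)]
labels = classify_points(sample, search_distance=1.0, min_points=3)
```

In this sample, the first two points are core points, the next two are border points, and the isolated point at (10, 10) is noise.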
The Result layer group includes the following parameters:
- Output name determines the name of the layer that is created and added to the map. The name must be unique. If a layer with the same name already exists in your organization, the tool will fail and you will be prompted to use a different name.
- Save in folder specifies the name of a folder in My Content where the result will be saved.
Analysis environment settings are additional parameters that affect a tool's results. You can access the tool's analysis environment settings from the Environment settings parameter group.
This tool honors the following analysis environments:
The tool outputs a point layer. If the Clustering method parameter value is Self-adjusting (HDBSCAN) or Multi-scale (OPTICS), the tool will also output a chart.
The output layer of all the Clustering method options will include the following fields:
- Cluster ID identifies the cluster each point belongs to. Noise points will have a value of -1.
- Source ID is a unique identifier.
- Color ID represents the color assigned to a point and its cluster. If the output layer includes more than nine clusters, multiple clusters will be assigned to each color. However, neighboring clusters will be assigned different colors to keep them visually distinct.
If the Clustering method parameter value is Self-adjusting (HDBSCAN), the output point layer will contain the following additional fields:
- Probability is a value between 0 and 1 that denotes the probability that a point belongs to its assigned cluster. Noise points will have a value of 0.
- Outlier is a value between 0 and 1 that denotes whether a point may be an outlier within its own cluster. The noise points will be considered a single cluster. A high value indicates that the point is more likely to be an outlier.
- Exemplar is a value between 0 and 1 that denotes whether a point is most representative of its cluster.
- Stability is a value that reflects the persistence of each cluster across a range of scales. A larger value indicates that the cluster persists over a wide range of distance scales.
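One way to work with these fields after exporting the output layer's attributes is sketched below. It counts the points in each cluster, skips noise points (Cluster ID of -1), and flags members with a low Probability value. The dictionary keys `cluster_id` and `probability` and the 0.5 threshold are hypothetical; the actual field names in an exported table may differ.

```python
from collections import Counter

NOISE_ID = -1  # Cluster ID value assigned to noise points


def summarize_clusters(rows, min_probability=0.5):
    """Count points per cluster and how many fall below min_probability."""
    sizes = Counter()
    weak = Counter()
    for row in rows:
        cluster_id = row["cluster_id"]  # hypothetical key name
        if cluster_id == NOISE_ID:
            continue  # noise points do not belong to any cluster
        sizes[cluster_id] += 1
        if row["probability"] < min_probability:  # hypothetical key name
            weak[cluster_id] += 1
    return {
        cid: {"points": sizes[cid], "weak_members": weak[cid]} for cid in sizes
    }


rows = [
    {"cluster_id": 1, "probability": 0.95},
    {"cluster_id": 1, "probability": 0.30},
    {"cluster_id": 2, "probability": 0.80},
    {"cluster_id": -1, "probability": 0.0},
]
summary = summarize_clusters(rows)
```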
If the Clustering method parameter value is Multi-scale (OPTICS), the output layer will include the following additional fields:
- Reachability order is how the input features were ordered for the analysis.
- Reachability distance is the distance between each point and its closest unvisited neighbor.
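To show how a reachability order and reachability distance can arise, the sketch below follows the general OPTICS idea on a handful of planar points. It is a simplified illustration and not the tool's implementation. It assumes an unlimited search distance, Euclidean distance, and a core distance that counts the point itself; the function name `optics_order` is hypothetical.

```python
import math


def optics_order(points, min_points):
    """Return (order, reach): visiting order and reachability distance per point."""
    n = len(points)

    def dist(i, j):
        return math.dist(points[i], points[j])

    # Core distance: distance to the min_points-th nearest point.
    # Assumption: the point itself counts as the first neighbor.
    core = [sorted(dist(i, j) for j in range(n))[min_points - 1] for i in range(n)]

    reach = [math.inf] * n  # the first point visited keeps an undefined (inf) value
    visited = [False] * n
    order = []

    for start in range(n):
        if visited[start]:
            continue
        current = start
        while current is not None:
            visited[current] = True
            order.append(current)
            # Points closer than the core distance are assigned the core distance.
            for j in range(n):
                if not visited[j]:
                    reach[j] = min(reach[j], max(core[current], dist(current, j)))
            candidates = [j for j in range(n) if not visited[j] and reach[j] < math.inf]
            # Visit the unvisited point with the smallest reachability distance next.
            current = min(candidates, key=lambda j: reach[j]) if candidates else None
    return order, reach


sample = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
order, reach = optics_order(sample, min_points=2)
```

Plotting `reach` in the sequence given by `order` produces a reachability plot. In this sample, the large value at the fourth point visited is the peak that separates the two groups of points.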
If the Clustering method parameter value is Self-adjusting (HDBSCAN) or Multi-scale (OPTICS), the tool will output a chart. Multi-scale (OPTICS) outputs a reachability plot, which can be used to evaluate the density of each cluster. Self-adjusting (HDBSCAN) outputs a distribution of membership probability chart, which displays the distribution of the probability that a feature belongs to its assigned cluster. To view the chart, click Charts on the Contents toolbar.
You can view additional details about the analysis on the output layer's item page. To access the layer's item page, click Analysis on the Settings toolbar. Click History, and find and click the successful tool run. The analysis details will open on the Results tab. Click the options button next to the output layer, and click Open item details.
This tool requires the following licensing and configurations:
- Creator or GIS Professional user type
- Publisher or Administrator role, or an equivalent custom role
Use the following resources to learn more: