Note:
This functionality is currently only supported in Map Viewer Classic (formerly known as Map Viewer). It will be available in a future release of the new Map Viewer.
The Summarize Within tool calculates statistics in areas where an input layer is within or overlaps a boundary layer. The area you are summarizing within can be an area layer or a hexagonal or square bin.
Workflow diagram
Analysis using GeoAnalytics Tools
Analysis using GeoAnalytics Tools is run using distributed processing across multiple ArcGIS GeoAnalytics Server machines and cores. GeoAnalytics Tools and standard feature analysis tools in ArcGIS Enterprise have different parameters and capabilities. To learn more about these differences, see Feature analysis tool differences.
Examples
To complete routine maintenance projects efficiently, a city can use Summarize Within to count the street lights and sum the miles of bike lanes within each maintenance assessment district. It can then estimate the materials and staff needed to complete the work in each district.
A cable provider is starting a pilot program that provides low-cost Internet access to low-income community college students. Summarize Within can be used to determine the number of low-income families in each college district so the cable provider can choose an appropriate district for its pilot program.
A development company wants to build a new mixed-use development in an urban center in the county. Using Summarize Within, it can calculate, for each city, the total area of potential development sites with good access to shops, restaurants, and light rail. This simplifies the site selection process.
Usage notes
The inputs for Summarize Within must be two layers. The first layer is the area used as a boundary to summarize your second layer; it's called the Summary Area and can be composed of an area layer you specify or square or hexagonal bins. The second layer specified is a point, line, or area layer to be summarized and is called the Summarized Layer.
Learn more about supported data types for GeoAnalytics Tools
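When the Summary Area is a set of square bins instead of an area layer, each feature is counted in the bin that contains it. The minimal sketch below illustrates that assignment for points only. It assumes planar (projected) coordinates and a hypothetical 1,000-meter bin size. It is not how GeoAnalytics Server builds bins internally, and hexagonal bins would need a different indexing scheme.

```python
import math
from collections import Counter


def square_bin_index(x, y, bin_size):
    """Return the (column, row) of the square bin that contains a planar point."""
    # floor() keeps points with negative coordinates in the correct bin
    return (math.floor(x / bin_size), math.floor(y / bin_size))


def count_points_per_bin(points, bin_size):
    """Count points per square bin; empty bins never appear in the result."""
    return Counter(square_bin_index(x, y, bin_size) for x, y in points)


# Hypothetical projected coordinates (meters) and a hypothetical 1,000 m bin size
points = [(120.0, 80.0), (950.0, 10.0), (1500.0, 200.0), (-20.0, 30.0)]
counts = count_points_per_bin(points, 1000.0)
print(dict(counts))  # {(0, 0): 2, (1, 0): 1, (-1, 0): 1}
```

Bins that contain no points are absent from the result, which mirrors the limitation that only summary areas intersecting at least one feature are returned.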
Tip:
Depending on the configuration of your organization, you will have access to either Esri ArcGIS Living Atlas Analysis Layers, such as counties and hex bins, or Custom Analysis Layers. Click the drop-down arrow for the Choose area layer to summarize other features within its boundaries parameter to select an analysis layer to use as a boundary.
A Count of Points, Total Length, or Total Area check box appears, depending on the geometry type of the layer being summarized. The check box is checked by default and can be unchecked only if statistics are being calculated. The default unit of measurement depends on the units set in your profile.
Total | Input features | Default | Options |
---|---|---|---|
Count of Points | Points | None | None |
Total Length | Lines | Miles (U.S. Standard setting) or Kilometers (Metric setting) | |
Total Area | Areas | Square Miles (U.S. Standard setting) or Square Kilometers (Metric setting) | |
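To illustrate the default units, the sketch below converts a length or area measured in meters to the default unit for each profile setting. The setting names `US_STANDARD` and `METRIC` are hypothetical labels, and the factors assume the international mile (1,609.344 meters), not the U.S. survey mile.

```python
# Exact international definitions, in meters
METERS_PER_MILE = 1609.344
METERS_PER_KILOMETER = 1000.0


def convert_length(meters, unit_setting):
    """Convert a length in meters to miles (US_STANDARD) or kilometers (METRIC)."""
    if unit_setting == "US_STANDARD":
        return meters / METERS_PER_MILE
    if unit_setting == "METRIC":
        return meters / METERS_PER_KILOMETER
    raise ValueError(f"Unknown unit setting: {unit_setting}")


def convert_area(square_meters, unit_setting):
    """Convert an area in square meters to square miles or square kilometers."""
    if unit_setting == "US_STANDARD":
        return square_meters / METERS_PER_MILE ** 2
    if unit_setting == "METRIC":
        return square_meters / METERS_PER_KILOMETER ** 2
    raise ValueError(f"Unknown unit setting: {unit_setting}")


print(convert_length(16093.44, "US_STANDARD"))
print(convert_area(2_000_000.0, "METRIC"))
```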
You can optionally calculate standard statistics. For line and area layers, all weighted statistics are calculated as well. Both standard and weighted statistics are calculated for the features in the Summarized Layer that intersect the Summary Area layer. For weighted statistics, each field value is weighted by the proportion of the Summarized Layer feature that falls within the Summary Area.
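The difference between standard and weighted statistics can be sketched with a sum. This minimal example assumes the length of each line inside the Summary Area is already known; the values and lengths are hypothetical. The weight is the fraction of each line inside the area, and the standard sum uses the full value of every intersecting line.

```python
def proportion_weights(inside_lengths, total_lengths):
    """Weight = share of each feature's length (or area) inside the Summary Area."""
    return [inside / total for inside, total in zip(inside_lengths, total_lengths)]


def standard_sum(values):
    # Every intersecting feature contributes its full value
    return sum(values)


def weighted_sum(values, weights):
    # Each value is scaled by the proportion of its feature inside the area
    return sum(value * weight for value, weight in zip(values, weights))


values = [200.0, 400.0]  # hypothetical field values of two intersecting lines
inside = [1.0, 3.0]      # hypothetical length of each line inside the area
total = [4.0, 4.0]       # hypothetical total length of each line
weights = proportion_weights(inside, total)
print(weights)                         # [0.25, 0.75]
print(standard_sum(values))            # 600.0
print(weighted_sum(values, weights))   # 350.0
```

Scaling values by a proportion only makes sense for counts and amounts; scaling a rate or index this way does not produce a meaningful result.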
For standard statistics, there are eight options: count, sum, mean, minimum, maximum, range, standard deviation, and variance. There are two options for string statistics: count and any. There are eight weighted statistics that are calculated on numeric fields in the layer to be summarized: count, sum, mean, minimum, maximum, range, standard deviation, and variance. Weighted statistics are not calculated for string data.
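The standard statistics listed above can be sketched with Python's `statistics` module. This is an illustration, not the tool's implementation. It assumes nulls are represented as `None` and skipped, uses the sample (n - 1) form of variance and standard deviation, and returns the first nonnull string for the `any` statistic, whereas the tool may return any of the values.

```python
import statistics


def numeric_stats(values):
    """Standard statistics for a numeric field; None (null) values are skipped."""
    xs = [v for v in values if v is not None]
    return {
        "count": len(xs),
        "sum": sum(xs),
        "mean": statistics.mean(xs),
        "minimum": min(xs),
        "maximum": max(xs),
        "range": max(xs) - min(xs),
        # Sample (n - 1) forms; these require at least two nonnull values
        "variance": statistics.variance(xs),
        "standard_deviation": statistics.stdev(xs),
    }


def string_stats(values):
    """String statistics: count of nonnull values and one nonnull value ("any")."""
    xs = [v for v in values if v is not None]
    return {"count": len(xs), "any": xs[0] if xs else None}


print(numeric_stats([2, None, 4, 4, 4, 5, 5, 7, 9]))
print(string_stats(["Primary", None, "Secondary"]))
```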
Each time a Field value and Statistic value are specified, a row is added to the tool pane so more than one statistic can be calculated. You can view the summarized results in the result layer's table or pop-ups. The count of features intersecting the Summary Area is always calculated.
Optionally, a Group By field can be selected so statistics are calculated separately for each unique attribute value. When a Group By field is selected, a summary table listing each feature and statistic by Group By field value will also be created.
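The sketch below shows the idea behind a Group By field: statistics are accumulated separately for each unique value. It assumes the records have already been matched to a single summary feature and are plain dictionaries. The field names `Type` and `Students` and the values are hypothetical.

```python
from collections import defaultdict


def summarize_by_group(records, group_field, value_field):
    """Count and sum a numeric field separately for each unique Group By value."""
    groups = defaultdict(lambda: {"count": 0, "sum": 0})
    for record in records:
        value = record.get(value_field)
        if value is None:
            continue  # null values are neither counted nor summed
        stats = groups[record[group_field]]
        stats["count"] += 1
        stats["sum"] += value
    return dict(groups)


# Hypothetical school points that fall inside one district
schools = [
    {"Type": "Primary", "Students": 300},
    {"Type": "Primary", "Students": 250},
    {"Type": "Secondary", "Students": 900},
]
print(summarize_by_group(schools, "Type", "Students"))
```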
The Add minority, majority and Add percentages check boxes are checked when a Group By field is selected. The minority and majority are the least and most dominant values from the Group By field, respectively, where dominance is determined using the count of points, total length, or total area of each value.
When the Add minority, majority check box is checked, two fields will be added to the result layer. The fields will list the values from the Group By field that are the minority and majority for each result feature.
When the Add percentages check box is checked, two fields will be added to the result layer listing the percentage of the count of points, total length, or total area that belong to the minority and majority values for each feature. A percentage field will also be added to the result table listing the percentage of the count of points, total length, or total area that belong to all values from the Group By field for each feature.
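The minority, majority, and percentage fields can be derived from the amount (count of points, total length, or total area) contributed by each Group By value inside one result feature. In this sketch the input values are hypothetical, and ties are broken by taking the first value encountered, which may not match the tool's behavior.

```python
def minority_majority(amounts):
    """amounts maps each Group By value to its count, total length, or total area
    inside one result feature. Returns the minority, majority, and percentages."""
    total = sum(amounts.values())
    minority = min(amounts, key=amounts.get)  # least dominant value
    majority = max(amounts, key=amounts.get)  # most dominant value
    percentages = {key: 100.0 * amount / total for key, amount in amounts.items()}
    return {
        "minority": minority,
        "majority": majority,
        "minority_percent": percentages[minority],
        "majority_percent": percentages[majority],
        "percentages": percentages,  # one entry per Group By value (result table)
    }


# Hypothetical total length (miles) of each road class inside one district
print(minority_majority({"Residential": 6.0, "Commercial": 3.0, "Industrial": 1.0}))
```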
It is important to consider the statistic you are calculating and what the data represents when choosing between standard and weighted statistics. For example, you may want to use weighted statistics for counts and amounts, and standard statistics for rates and indexes.
If Use current map extent is checked, only those features in the input layer and the layer to be summarized that are visible within the current map extent will be analyzed. If unchecked, all features in both the input layer and the layer to be summarized will be analyzed, even if they are outside the current map extent.
GeoAnalytics Tools analysis using binning (hexagon or square) with a specified geographic coordinate system will automatically use a projected coordinate system based on the extent of the data. To learn more about setting your coordinate system for analysis, see Use the analysis environments for GeoAnalytics Tools in Map Viewer.
Statistical calculations are computed using geodesic distances.
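To show why geodesic measurement matters for data in longitude and latitude, the sketch below measures a polyline along great circles. It is a simplification: it assumes a spherical Earth with a mean radius of 6,371.0088 km (the haversine formula), so its results can differ slightly from true geodesic distances on an ellipsoid, which the tool may use.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; assumes a spherical Earth


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometers between two lon/lat points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def polyline_length_km(vertices):
    """Sum the great-circle distances between consecutive (lon, lat) vertices."""
    return sum(haversine_km(*a, *b) for a, b in zip(vertices, vertices[1:]))


# One degree of longitude along the equator, roughly 111.2 km
print(polyline_length_km([(0.0, 0.0), (1.0, 0.0)]))
```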
Limitation
Only summary areas that intersect at least one feature in the layer that is summarized will be included in the results.
How Summarize Within works
Equations
For summarized line and area features, weighted statistics incorporate Summary Area weights. None of the statistics for point features are weighted. The following table shows the equations used to calculate variance, the weighted mean, and the weighted standard deviation.
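Statistic | Equation | Variables | Features |
---|---|---|---|
Variance | $s^2 = \dfrac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}$ | $x_i$ is the value of observation $i$, $\bar{x}$ is the mean, and $N$ is the number of observations. | Points |
Weighted Mean | $\bar{x}_w = \dfrac{\sum_{i=1}^{N} w_i x_i}{\sum_{i=1}^{N} w_i}$ | $w_i$ is the weight of observation $i$. Weights are calculated as the percentage of the feature within the summary area. | Lines and Areas |
Weighted Standard Deviation | $s_w = \sqrt{\dfrac{\sum_{i=1}^{N} w_i (x_i - \bar{x}_w)^2}{\frac{N' - 1}{N'} \sum_{i=1}^{N} w_i}}$ | $N'$ is the number of nonzero weights. Weights are calculated as the percentage of the feature within the summary area. | Lines and Areas |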
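The sketch below computes the sample variance, weighted mean, and weighted standard deviation. It assumes that variance divides by N - 1, that the weighted mean is the sum of w_i x_i divided by the sum of the weights, and that the weighted standard deviation divides the weighted sum of squared deviations by (N' - 1)/N' times the sum of the weights, where N' is the number of nonzero weights. With all weights equal to 1, the weighted standard deviation reduces to the ordinary sample standard deviation.

```python
import math


def sample_variance(xs):
    """Sample variance: squared deviations from the mean divided by N - 1."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def weighted_mean(xs, ws):
    """Sum of w_i * x_i divided by the sum of the weights."""
    return sum(w * x for x, w in zip(xs, ws)) / sum(ws)


def weighted_std(xs, ws):
    """Weighted standard deviation using N', the number of nonzero weights."""
    n_prime = sum(1 for w in ws if w != 0)
    x_bar_w = weighted_mean(xs, ws)
    numerator = sum(w * (x - x_bar_w) ** 2 for x, w in zip(xs, ws))
    denominator = (n_prime - 1) / n_prime * sum(ws)
    return math.sqrt(numerator / denominator)


print(weighted_mean([10.0, 20.0], [0.25, 0.75]))  # 17.5
```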
Points
Point layers are summarized using only the point features that fall within the Summary Area. Weighted statistics cannot be applied when summarizing points.
The figure and table below explain the statistical calculations of a point Summarized Layer within hypothetical areas. The Population field was used to calculate the statistics (Count, Sum, Minimum, Maximum, Range, Mean, Standard Deviation, and Variance) for the layer.
Numeric statistic | Results for District A |
---|---|
Count | |
Sum | |
Minimum | |
Maximum | |
Range | |
Mean | |
Variance | |
Standard Deviation | |

String statistic | Results for District A |
---|---|
Count | |
Any | Secondary School |
Note:
The count statistic (for string and numeric fields) counts the number of nonnull values. For example, the count of [0, 1, 10, 5, null, 6] is 5, and the count of [Primary, Primary, Secondary, null] is 3.
A real-life scenario in which this analysis could be used is determining the total number of students in each school district. Each point represents a school. The Type field gives the type of school (elementary, middle school, or secondary), and a student population field gives the number of students enrolled at each school. The calculations and results for District A are given in the table above. From the results, you can see that District A has 2,568 students. The tool also returns results for District B.
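The count rule in the note above depends on testing for null explicitly, because a value of 0 is still counted. A minimal sketch, assuming nulls are represented as `None`:

```python
def count_nonnull(values):
    """Count statistic: the number of values that are not null (None)."""
    # "is not None" keeps 0 and empty strings; a truthiness test would drop them
    return sum(1 for value in values if value is not None)


print(count_nonnull([0, 1, 10, 5, None, 6]))                 # 5
print(count_nonnull(["Primary", "Primary", "Secondary", None]))  # 3
```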
Lines
For weighted statistics, line layers are summarized using only the proportions of line features that are within the Summary Area. Standard (non-weighted) statistics summarize any line intersecting the Summary Area. When summarizing lines using weighted statistics, use counts and amounts (rather than rates or indices) so proportional calculations make logical sense in your analysis.
The figure and table below explain the statistical calculations of a line Summarized Layer within a hypothetical Summary Area. The Volume field was used to calculate the statistics (Count, Sum, Minimum, Maximum, Range, Mean, Standard Deviation, and Variance) for the layer. The standard statistics are calculated using lines that intersect the boundary and the weighted statistics are calculated using the proportion of the lines that are within the Summary Area.
Numeric statistics | Standard statistics | Weighted statistics |
---|---|---|
Calculating weights | Not applicable | Weight of the brown line (value = 600); weight of the blue line (value = 1000) |
Count | | |
Sum | | |
Minimum | | |
Maximum | | |
Range | | |
Mean | | |
Variance | | |
Standard Deviation | | |
A real-life scenario in which this analysis could be used is in determining the total volume of water in rivers within the boundaries of a state park. Each line represents a river that is partially located inside the park. From the results, you can see that there are 5 miles of rivers within the park and the total volume is 900 units.
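The weight of each line is the fraction of its length inside the Summary Area. The sketch below computes that fraction with Liang-Barsky segment clipping. It makes simplifying assumptions that the tool does not: planar coordinates and a rectangular Summary Area. The park rectangle, line coordinates, and volumes are hypothetical.

```python
import math


def clip_segment_length(p0, p1, xmin, ymin, xmax, ymax):
    """Length of segment p0-p1 inside an axis-aligned rectangle (Liang-Barsky)."""
    (x0, y0), (x1, y1) = p0, p1
    dx, dy = x1 - x0, y1 - y0
    t_enter, t_leave = 0.0, 1.0
    for p, q in ((-dx, x0 - xmin), (dx, xmax - x0), (-dy, y0 - ymin), (dy, ymax - y0)):
        if p == 0:
            if q < 0:
                return 0.0  # parallel to this edge and outside it
            continue
        t = q / p
        if p < 0:
            t_enter = max(t_enter, t)  # segment enters the rectangle
        else:
            t_leave = min(t_leave, t)  # segment leaves the rectangle
        if t_enter > t_leave:
            return 0.0
    return (t_leave - t_enter) * math.hypot(dx, dy)


def line_weight(vertices, rect):
    """Fraction of a planar polyline's length that lies inside rect."""
    segments = list(zip(vertices, vertices[1:]))
    total = sum(math.dist(a, b) for a, b in segments)
    inside = sum(clip_segment_length(a, b, *rect) for a, b in segments)
    return inside / total


park = (0.0, 0.0, 10.0, 10.0)  # hypothetical (xmin, ymin, xmax, ymax)
rivers = [([(-5.0, 5.0), (5.0, 5.0)], 200.0),  # half inside the park
          ([(2.0, 2.0), (8.0, 2.0)], 100.0)]   # fully inside the park
weighted_volume = sum(volume * line_weight(v, park) for v, volume in rivers)
print(weighted_volume)  # 200.0
```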
Areas
Area layers are summarized using the area features that intersect the Summary Area. Standard statistics use the full value of every intersecting area, while weighted statistics use only the proportion of each area feature that falls within the Summary Area. When using weighted statistics, use fields containing counts or amounts (absolute numbers rather than rates or indices) so the proportional calculations make logical sense in your analysis. The result layer is displayed using graduated colors.
The figure and table below explain the statistical calculations of an area layer within a hypothetical Summary Area. The population field was used to calculate the statistics (Count, Sum, Minimum, Maximum, Range, Mean, Standard Deviation, and Variance) for the layer. The standard statistics are calculated using areas that intersect the Summary Area, and the weighted statistics are calculated using a proportional weight based on the portion of each Summarized Layer area that falls within the Summary Area.
Numeric statistics | Standard statistics: Results for Neighborhood 1 | Weighted statistics: Results for Neighborhood 1 |
---|---|---|
Calculating weights | Not applicable | Weight of the yellow area (value = 3200); weight of the green area (value = 4700); weight of the pink area (value = 1000); weight of the blue area (value = 4500); weight of the orange area (value = 3600) |
Count | | |
Sum | | |
Minimum | | |
Maximum | | |
Range | | |
Mean | | |
Variance | | |
Standard Deviation | | |
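For areas, the weight is the fraction of each Summarized Layer polygon that falls inside the Summary Area. The sketch below computes it by clipping the polygon with the Sutherland-Hodgman algorithm and measuring areas with the shoelace formula. It assumes planar coordinates and a rectangular Summary Area, which the tool does not require; all coordinates and population values are hypothetical.

```python
def shoelace_area(poly):
    """Planar area of a simple polygon given as a list of (x, y) vertices."""
    n = len(poly)
    s = sum(poly[i][0] * poly[(i + 1) % n][1] - poly[(i + 1) % n][0] * poly[i][1]
            for i in range(n))
    return abs(s) / 2.0


def clip_polygon_to_rect(poly, xmin, ymin, xmax, ymax):
    """Sutherland-Hodgman clipping of a polygon against an axis-aligned rectangle."""
    def clip(points, inside, intersect):
        out = []
        for i, cur in enumerate(points):
            prev = points[i - 1]  # wraps around to the last vertex when i == 0
            if inside(cur):
                if not inside(prev):
                    out.append(intersect(prev, cur))
                out.append(cur)
            elif inside(prev):
                out.append(intersect(prev, cur))
        return out

    def x_cross(x):  # where segment a-b crosses the vertical line at x
        return lambda a, b: (x, a[1] + (b[1] - a[1]) * (x - a[0]) / (b[0] - a[0]))

    def y_cross(y):  # where segment a-b crosses the horizontal line at y
        return lambda a, b: (a[0] + (b[0] - a[0]) * (y - a[1]) / (b[1] - a[1]), y)

    pts = list(poly)
    for inside, intersect in ((lambda p: p[0] >= xmin, x_cross(xmin)),
                              (lambda p: p[0] <= xmax, x_cross(xmax)),
                              (lambda p: p[1] >= ymin, y_cross(ymin)),
                              (lambda p: p[1] <= ymax, y_cross(ymax))):
        if not pts:
            break
        pts = clip(pts, inside, intersect)
    return pts


def area_weight(poly, rect):
    """Fraction of a polygon's area that lies inside rect."""
    clipped = clip_polygon_to_rect(poly, *rect)
    if len(clipped) < 3:
        return 0.0
    return shoelace_area(clipped) / shoelace_area(poly)


neighborhood = (0.0, 0.0, 10.0, 10.0)  # hypothetical Summary Area
tracts = [([(-5, 0), (5, 0), (5, 10), (-5, 10)], 1000.0),   # half inside
          ([(20, 20), (30, 20), (30, 30), (20, 30)], 400.0)]  # outside
weighted_population = sum(pop * area_weight(p, neighborhood) for p, pop in tracts)
print(weighted_population)  # 500.0
```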
ArcGIS API for Python example
The Summarize Within tool is available through ArcGIS API for Python.
This example calculates the total length and the weighted average slope of bike lanes within each city district.
# Import the required ArcGIS API for Python modules
import sys
import arcgis
from arcgis.gis import GIS
from arcgis.geoanalytics import summarize_data
# Connect to your ArcGIS Enterprise portal and confirm that GeoAnalytics is supported
portal = GIS("https://myportal.domain.com/portal", "gis_publisher", "my_password", verify_cert=False)
if not arcgis.geoanalytics.is_supported(portal):
    print("Quitting, GeoAnalytics is not supported")
    sys.exit(1)
# Find the big data file share dataset you'll use for analysis
search_result = portal.content.search("", "Big Data File Share")
# Look through the search results for a big data file share with the matching name
bdfs_search = next(x for x in search_result if x.title == "bigDataFileShares_CityData")
# Look through the big data file share for BikeLanes
bike_lanes = next(x for x in bdfs_search.layers if x.properties.name == "BikeLanes")
# Look through the big data file share for districts
districts = next(x for x in bdfs_search.layers if x.properties.name == "districts")
weighted_summary_fields = [{"statisticType" : "Average","onStatisticField" : "Slope"}]
# Run the Summarize Within tool
summarize_within_result = summarize_data.summarize_within(summary_polygons = districts,
summarized_layer = bike_lanes,
weighted_summary_fields = weighted_summary_fields,
output_name = "summary_of_bike_lanes")
# Visualize the tool results if you are running Python in a Jupyter Notebook
processed_map = portal.map('Your City, State', 10)
processed_map.add_layer(summarize_within_result)
processed_map
Similar tools
Use Summarize Within to calculate statistics for features that overlap a boundary layer. Other tools may be useful in solving similar but slightly different problems.
Map Viewer Classic analysis tools
If you are trying to summarize points and want to apply time stepping, use the Aggregate Points tool.
ArcGIS Desktop analysis tools
This GeoAnalytics Tools tool is also available in ArcGIS Pro.
Summarize Within performs the functions of the Spatial Join and Summary Statistics tools.
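The combination of a spatial join followed by summary statistics can be sketched in plain Python for points and polygons. This is a conceptual illustration only: it assumes planar coordinates, uses an even-odd ray casting test, and the function name `summarize_points_within` and all data are hypothetical. As with the tool, areas that contain no points are left out of the result.

```python
from collections import defaultdict


def point_in_polygon(x, y, poly):
    """Even-odd ray casting test for a planar point against a simple polygon."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def summarize_points_within(areas, points, field):
    """Spatial join (point -> containing area), then count and sum per area."""
    joined = defaultdict(list)
    for pt in points:
        for area_id, poly in areas.items():
            if point_in_polygon(pt["x"], pt["y"], poly):
                joined[area_id].append(pt[field])
    return {
        area_id: {"count": sum(1 for v in vals if v is not None),
                  "sum": sum(v for v in vals if v is not None)}
        for area_id, vals in joined.items()
    }


districts = {
    "District A": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "District B": [(10, 0), (20, 0), (20, 10), (10, 10)],
    "District C": [(20, 0), (30, 0), (30, 10), (20, 10)],
}
schools = [{"x": 2, "y": 3, "Population": 500},
           {"x": 7, "y": 8, "Population": 700},
           {"x": 12, "y": 5, "Population": 300}]
print(summarize_points_within(districts, schools, "Population"))
```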