
# Summary statistics

Summary statistics are calculated by the Aggregate Points, Summarize Within, Summarize Nearby, Join Features, and Dissolve Boundaries tools.

## Equations

For line and polygon features, the mean and standard deviation are calculated as a weighted mean and a weighted standard deviation. Statistics for point features are not weighted. The weight is the length or area of the portion of the feature that falls within the boundary.

The following table shows the equations used to calculate standard deviation, weighted mean, and weighted standard deviation:

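| Statistic | Equation | Variables | Features |
| --- | --- | --- | --- |
| Standard deviation | $s = \sqrt{\dfrac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}}$ | $N$ = number of observations; $x_i$ = observations; $\bar{x}$ = mean | Points |
| Weighted mean | $\bar{x}_w = \dfrac{\sum_{i=1}^{N} w_i x_i}{\sum_{i=1}^{N} w_i}$ | $N$ = number of observations; $x_i$ = observations; $w_i$ = weights | Lines and polygons |
| Weighted standard deviation | $s_w = \sqrt{\dfrac{\sum_{i=1}^{N} w_i (x_i - \bar{x}_w)^2}{\dfrac{N' - 1}{N'} \sum_{i=1}^{N} w_i}}$ | $N$ = number of observations; $x_i$ = observations; $w_i$ = weights; $\bar{x}_w$ = weighted mean; $N'$ = number of non-zero weights | Lines and polygons |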
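The three equations can be sketched in Python as shown below. This is a minimal illustration, not the tools' implementation. The function names are hypothetical, and the inputs are assumed to be lists of non-null numbers (nulls already removed).

```python
import math


def std_dev(values):
    """Sample standard deviation (divides by N - 1); used for point features."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))


def weighted_mean(values, weights):
    """Weighted mean: sum(w_i * x_i) / sum(w_i); used for lines and polygons."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


def weighted_std_dev(values, weights):
    """Weighted standard deviation with the (N' - 1) / N' correction,
    where N' is the number of non-zero weights."""
    mean_w = weighted_mean(values, weights)
    n_prime = sum(1 for w in weights if w != 0)
    numerator = sum(w * (x - mean_w) ** 2 for x, w in zip(values, weights))
    denominator = (n_prime - 1) / n_prime * sum(weights)
    return math.sqrt(numerator / denominator)


# Values taken from the worked examples in this topic
print(round(std_dev([280, 408, 356, 361, 450, 713]), 2))  # 150.79
print(weighted_mean([4000, 5000], [2, 3]))                 # 4600.0
print(round(weighted_std_dev([4000, 5000], [2, 3]), 1))    # 692.8
```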

##### Note:

Null values are excluded from all statistical calculations. For example, the mean of 10, 5, and a null value is:

``(10+5)/2=7.5``
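Excluding nulls means that they drop out of both the numerator and the count. The following Python sketch assumes that a null field value is represented as `None`. The function name is hypothetical. This topic does not say what happens when every value is null, so returning `None` in that case is also an assumption.

```python
def mean_ignoring_nulls(values):
    """Mean of the non-null values; None stands in for a null field value."""
    present = [v for v in values if v is not None]
    if not present:
        return None  # assumption: all-null input gives a null result
    return sum(present) / len(present)


print(mean_ignoring_nulls([10, 5, None]))  # 7.5
```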

## Points

Point layers are summarized using only the point features within the boundary areas.

One real-life scenario for summarizing points is determining the total number of students in each school district. Each point represents a school. The Type field gives the type of school (primary school, middle school, or secondary school), and the Population field gives the number of students enrolled at each school.

The figure below shows a hypothetical point and boundary layer, and the table summarizes the attributes for the point layer.

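| ObjectID | District | Type | Population |
| --- | --- | --- | --- |
| 1 | A | Primary school | 280 |
| 2 | A | Primary school | 408 |
| 3 | A | Primary school | 356 |
| 4 | A | Middle school | 361 |
| 5 | A | Middle school | 450 |
| 6 | A | Secondary school | 713 |
| 7 | B | Primary school | 370 |
| 8 | B | Primary school | 422 |
| 9 | B | Primary school | 495 |
| 10 | B | Middle school | 607 |
| 11 | B | Middle school | 574 |
| 12 | B | Secondary school | 932 |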

The calculations and results for District A are given in the table below. The results show that District A has 2,568 students. When you run a tool, results are also calculated for District B.

| Statistic | Result for District A |
| --- | --- |
| Sum | `280+408+356+361+450+713 = 2568` |
| Minimum | Minimum of `[280, 408, 356, 361, 450, 713]` = 280 |
| Maximum | Maximum of `[280, 408, 356, 361, 450, 713]` = 713 |
| Mean | `2568/6 = 428` |
| Standard deviation | `√(((280-428)²+(408-428)²+(356-428)²+(361-428)²+(450-428)²+(713-428)²)/(6-1)) = 150.79` |
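The District A calculation, together with the matching one for District B, can be sketched in Python with the standard library's `statistics` module. The sketch assumes that the school table is available as a list of tuples (the tools read it from a layer). The `summarize_by_district` helper is hypothetical. `statistics.stdev` computes the sample standard deviation (dividing by N - 1), which matches the standard deviation equation used for points.

```python
import statistics
from collections import defaultdict

# (ObjectID, District, Type, Population) rows from the school table above
schools = [
    (1, "A", "Primary school", 280),
    (2, "A", "Primary school", 408),
    (3, "A", "Primary school", 356),
    (4, "A", "Middle school", 361),
    (5, "A", "Middle school", 450),
    (6, "A", "Secondary school", 713),
    (7, "B", "Primary school", 370),
    (8, "B", "Primary school", 422),
    (9, "B", "Primary school", 495),
    (10, "B", "Middle school", 607),
    (11, "B", "Middle school", 574),
    (12, "B", "Secondary school", 932),
]


def summarize_by_district(rows):
    """Group point attributes by district and compute unweighted statistics."""
    groups = defaultdict(list)
    for _oid, district, _school_type, population in rows:
        groups[district].append(population)
    summary = {}
    for district, pops in sorted(groups.items()):
        summary[district] = {
            "sum": sum(pops),
            "min": min(pops),
            "max": max(pops),
            "mean": statistics.mean(pops),
            "std": statistics.stdev(pops),  # sample standard deviation (N - 1)
        }
    return summary


result = summarize_by_district(schools)
print(result["A"]["sum"])  # 2568
```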

## Lines

Line layers are summarized using only the proportions of the line features that are within the boundary areas.

##### Tip:

When summarizing lines, use fields with counts or amounts so proportional calculations make logical sense in your analysis. For example, use population rather than population density.

A real-life scenario in which you can use this analysis is determining the total volume of water in rivers within a specified boundary. Each line represents a river that is partially located inside the boundary.

The figure below shows a hypothetical line and boundary layer, and the table summarizes the attributes for the line layer.

| River | Length (miles) | Volume (gallons) |
| --- | --- | --- |
| Yellow | 3 | 6,000 |
| Blue | 8 | 10,000 |

The calculations for volume are given in the table below. From the results, you can see that the total volume is 9,000 gallons.

##### Note:

The calculations use the proportions of the lines within the boundary area. For example, the yellow line has a total volume of 6,000 gallons, and two of its three miles are within the boundary. Therefore, the calculations are performed using 4,000 gallons as the volume for the yellow line:

``6000*(2/3)=4000``

| Statistic | Result |
| --- | --- |
| Sum | `4000+5000 = 9000` |
| Minimum | Minimum of `[4000, 5000]` = 4000 |
| Maximum | Maximum of `[4000, 5000]` = 5000 |
| Mean | `((2*4000)+(3*5000))/(2+3) = (8000+15000)/5 = 4600` |
| Standard deviation | `√((2(4000-4600)²+3(5000-4600)²)/(((2-1)/2)(2+3))) = 692.8` |
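The proportional adjustment in the note above multiplies the attribute by the fraction of the feature that lies inside the boundary. This is also why the tip recommends counts or amounts: scaling a density by a length or area fraction would not be meaningful. The following Python sketch is a minimal illustration. The function name `apportion` is hypothetical, and the sketch assumes that the attribute is spread evenly along the line or over the polygon.

```python
def apportion(value, part_inside, total):
    """Scale a count or amount by the fraction of the feature inside the boundary.

    part_inside and total must use the same unit (for example, miles of line
    length or square miles of polygon area). Assumes the attribute is spread
    evenly, so the inside share of the attribute equals the inside share of
    the length or area.
    """
    if total <= 0:
        raise ValueError("total length or area must be positive")
    return value * part_inside / total


# Yellow river from the note: 6,000 gallons, 2 of its 3 miles inside the boundary
print(apportion(6000, 2, 3))  # 4000.0
```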

## Polygons

Polygon layers are summarized using only the proportions of the polygon features that are within the boundary areas.

##### Tip:

When summarizing polygons, use fields with counts or amounts so proportional calculations make logical sense in your analysis. For example, use population rather than population density.

A real-life scenario in which you can use this analysis is determining the population in a city neighborhood. The blue outline represents the boundary of the neighborhood and the smaller polygons represent census blocks.

The figure below shows a hypothetical polygon and boundary layer, and the table summarizes the attributes for the polygon layer.

| Census block | Area (miles²) | Population |
| --- | --- | --- |
| Yellow | 6 | 3,200 |
| Green | 6 | 4,700 |
| Pink | 2.5 | 1,000 |
| Blue | 8 | 4,500 |
| Orange | 4 | 3,600 |

The calculations for population are given in the table below. From the results, you can see that there are 10,841 people in the neighborhood and an average (mean) of approximately 2,666 people per census block.

##### Note:

The calculations use the proportions of the polygons within the boundary area. For example, the yellow polygon has a total population of 3,200, and four of its six square miles are within the boundary. Therefore, the calculations are performed using 2,133 as the population for the yellow polygon:

``3200*(4/6)=2133``

| Statistic | Result |
| --- | --- |
| Sum | `2133+3133+400+3375+1800 = 10841` |
| Minimum | Minimum of `[2133, 3133, 400, 3375, 1800]` = 400 |
| Maximum | Maximum of `[2133, 3133, 400, 3375, 1800]` = 3375 |
| Mean | `((4*2133)+(4*3133)+(1*400)+(6*3375)+(2*1800))/(4+4+1+6+2) = 2665.53` |
| Standard deviation | `√((4(2133-2665.53)²+4(3133-2665.53)²+1(400-2665.53)²+6(3375-2665.53)²+2(1800-2665.53)²)/(((5-1)/5)(4+4+1+6+2))) = 925.91` |
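The polygon example can be sketched end to end in Python. The sketch apportions each census block's population by the share of its area inside the boundary and then computes the statistics, weighting by that inside area. This involves several assumptions. The inside areas (4, 4, 1, 6, and 2 square miles) are read from the weights in the mean calculation above, because the figure is not reproduced here. Apportioned populations are rounded to whole people only so that the printed results line up with the table; the tools may carry full precision instead.

```python
import math

# (census block, total area in mi², population, area inside boundary in mi²)
# Assumption: inside areas are the weights used in the mean calculation above.
blocks = [
    ("Yellow", 6, 3200, 4),
    ("Green", 6, 4700, 4),
    ("Pink", 2.5, 1000, 1),
    ("Blue", 8, 4500, 6),
    ("Orange", 4, 3600, 2),
]

# Apportion population by the inside share of area; rounding to whole people
# is an assumption made to match the figures shown in the table.
values = [round(pop * inside / area) for _name, area, pop, inside in blocks]
weights = [inside for _name, _area, _pop, inside in blocks]

# Weighted mean and weighted standard deviation (N' = non-zero weights)
total_weight = sum(weights)
mean_w = sum(w * x for x, w in zip(values, weights)) / total_weight
n_prime = sum(1 for w in weights if w != 0)
var_w = sum(w * (x - mean_w) ** 2 for x, w in zip(values, weights)) / (
    (n_prime - 1) / n_prime * total_weight
)

print(sum(values), min(values), max(values))         # 10841 400 3375
print(round(mean_w, 2), round(math.sqrt(var_w), 2))  # 2665.53 925.91
```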

## Related topics

Use the following topics to learn more about summary statistics within a specific tool: