
Describe Dataset

The Describe Dataset tool provides an overview of your big data. By default, the tool outputs a table layer containing summaries of your field values and an overview of the geometry and time settings of the input layer. Optionally, the tool can also output a feature layer representing a sample of your input features, a single polygon feature layer representing the extent of your input features, both, or neither.

Workflow diagram

Describe Dataset workflow diagram

Analysis using GeoAnalytics Tools

Analysis using GeoAnalytics Tools is run using distributed processing across multiple ArcGIS GeoAnalytics Server machines and cores. GeoAnalytics Tools and standard feature analysis tools in ArcGIS Enterprise have different parameters and capabilities. To learn more about these differences, see Feature analysis tool differences.

Examples

  • Verify that you correctly registered time and geometry with your big data file share.
  • Understand attribute values with summarized field statistics.
  • Visualize your big data with a sample layer. Instead of drawing a million features, draw a sample.
  • Run workflows using a sample of the data before scaling for longer and larger processing.
  • Determine where a dataset is by calculating the geographical extent.

Usage notes

Browse to the tabular, point, line, or area feature layer or big data file share dataset you want to describe using the Choose dataset to describe option.

Output a subset of your data by clicking the Sample layer button and specifying the number of features in the value picker that appears. The output subset will always have the same schema, geometry, and time settings as the input features. Use the subset to understand how your big data appears when added to a map or visualized in an attribute table. Additionally, you can run analysis on the sample dataset to determine the best inputs for larger analysis on your entire dataset.

Output a boundary feature that describes the extent of your input dataset by selecting Extent layer. The output will always be a single rectangle feature representing the geographic extent of the input features. Use the extent layer to understand where your data is located, or use it as input elsewhere in your workflow. For example, use it as the area layer that features are clipped to in the Clip Layer GeoAnalytics tool.
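Conceptually, the extent rectangle is bounded by the minimum and maximum coordinates of the input features. The following is a minimal sketch of that idea in plain Python, not the tool's implementation. The function name bounding_rectangle and the sample coordinates are hypothetical.

```python
def bounding_rectangle(points):
    """Return the closed ring of the rectangle that bounds a list of (x, y) points."""
    if not points:
        raise ValueError("at least one point is required")
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    xmin, xmax = min(xs), max(xs)
    ymin, ymax = min(ys), max(ys)
    # The ring starts at the lower-left corner and closes on it.
    return [(xmin, ymin), (xmin, ymax), (xmax, ymax), (xmax, ymin), (xmin, ymin)]


# Hypothetical point coordinates (x, y)
sample_points = [(-80.2, 25.8), (-75.5, 35.2), (-90.1, 29.9)]
extent_ring = bounding_rectangle(sample_points)
print(extent_ring)
```

A single rectangle like this says nothing about how features are distributed inside it, which is why the extent layer is best used to locate a dataset rather than to characterize it.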

If Use current map extent is checked, only the features that are within the current map extent will be analyzed. If it's not checked, all features in the input layer will be analyzed, even if they are outside the current map extent. For example, if you choose to output a sample layer and Use current map extent is not checked, the entire dataset will be used for sample results. If you choose to output an extent layer with Use current map extent checked, the output boundary will represent the map extent.

By default, the tool outputs a table containing summary statistics for each field and a JSON string describing the properties of the input layer. To access the JSON string, click the Show Result button that appears when you hover over the summary statistics table layer in the table of contents.

The JSON string includes the following information:

  • datasetName—The name of the dataset being described.
  • datasetSource—The storage location of the input dataset. This value can be ArcGIS Data Store — Relational, ArcGIS Data Store — Spatiotemporal, or Big Data File Share - <your_bdfs_name>.
  • recordCount—The total number of records in the input dataset.
  • geometry—The geometry settings of the input layer.
    • geometryType—The type of geometry the input features represent. This value can be Point, Line, Polygon, or Table.
    • sref—The spatial reference the input features use. For example, this value could be {"wkid": 26972}, in which 26972 is the spatial reference ID.
    • countNonEmpty—The number of features with a valid geometry.
    • countEmpty—The number of features without a valid geometry.
    • spatialExtent—The geographical extent of the features represented by the minimum and maximum coordinate values.
  • time—The time settings of the input layer.
    • timeType—The type of time the input features represent. This value can be Instant, Interval, or None.
    • countNonEmpty—The number of features with a valid time.
    • countEmpty—The number of features without a valid time.
    • temporalExtent—The temporal extent of the features represented by the minimum and maximum time values.
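Once copied from the Show Result window, the JSON string can be inspected with any JSON parser. The sketch below uses Python's standard json module on a hypothetical string: the key names come from the list above, but the values and the exact nesting are assumptions for illustration. The spatialExtent and temporalExtent properties are left out because their internal structure is not described here.

```python
import json

# Hypothetical JSON string; values and nesting are assumed for illustration only.
describe_json = """
{
  "datasetName": "Hurricanes",
  "datasetSource": "Big Data File Share - NaturalDisasters",
  "recordCount": 1000,
  "geometry": {
    "geometryType": "Point",
    "sref": {"wkid": 4326},
    "countNonEmpty": 990,
    "countEmpty": 10
  },
  "time": {
    "timeType": "Instant",
    "countNonEmpty": 1000,
    "countEmpty": 0
  }
}
"""

description = json.loads(describe_json)
geometry = description["geometry"]
time_settings = description["time"]

# Share of records that have a valid geometry
valid_geometry_ratio = geometry["countNonEmpty"] / description["recordCount"]

print(description["datasetName"], geometry["geometryType"], geometry["sref"]["wkid"])
print(f"Valid geometry: {valid_geometry_ratio:.1%}")
print("Time type:", time_settings["timeType"])
```

A check like this is a quick way to confirm that geometry and time were registered as intended before running longer analyses.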

Learn more about time settings and big data file share datasets

Learn more about geometry settings and big data file share datasets

Limitations

The sample layer does not represent a truly random geographic selection and should not be used to understand the geographic extent or distribution of your data. For example, if you specify 230 features for Number of features to include, the result can contain 230 input features in any order or location.

How Describe Dataset works

Calculations

Summary statistics are calculated for each field in the input layer. Fields will have different statistics output depending on the field type. The following soil depth example outlines how statistics are calculated for each field type:

These example input features will be summarized and output as the calculated statistics below.

Numeric statistic     Calculated result
Count                 Count of [130, 8, 250, 0, null] = 4
Sum                   130 + 8 + 250 + 0 + null = 388
Minimum               Minimum of [130, 8, 250, 0, null] = 0
Maximum               Maximum of [130, 8, 250, 0, null] = 250
Mean                  388/4 = 97
Range                 250-0 = 250
Variance              = 13942.66667
Standard Deviation    = 118.0791
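The numeric results can be reproduced outside the tool. The sketch below uses Python's standard statistics module and assumes that null values are dropped before any statistic is computed. The published variance and standard deviation match the sample formulas (dividing by n - 1), which is an inference from the numbers rather than a documented statement. The variable names are hypothetical.

```python
import statistics

soil_depths = [130, 8, 250, 0, None]  # None stands in for null

# Assumption: null values are excluded before any statistic is computed
values = [v for v in soil_depths if v is not None]

numeric_stats = {
    "count": len(values),
    "sum": sum(values),
    "min": min(values),
    "max": max(values),
    "mean": statistics.mean(values),
    "range": max(values) - min(values),
    "variance": statistics.variance(values),  # sample variance (n - 1)
    "stdev": statistics.stdev(values),        # sample standard deviation
}

for name, value in numeric_stats.items():
    print(f"{name}: {round(value, 5)}")
```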

Date statistic    Calculated result
Count             Count of [1538738400000, 1507202400000, 1475666400000, 1412508000000, null] = 4
Minimum           Minimum of [1538738400000, 1507202400000, 1475666400000, 1412508000000, null] = 1412508000000
Maximum           Maximum of [1538738400000, 1507202400000, 1475666400000, 1412508000000, null] = 1538738400000
Range             1538738400000-1412508000000 = 126230400000
Note:

Results stored in the ArcGIS Data Store are always stored in milliseconds from epoch in Coordinated Universal Time (UTC). For example, 1538713350000 milliseconds is equivalent to Friday, October 5, 2018 04:22:30 AM in the GMT time zone.
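To read date statistics as calendar dates, convert the millisecond values yourself. This is a minimal sketch using only the Python standard library; the helper name epoch_ms_to_utc is hypothetical. It converts the timestamp from the note and expresses the date range from the table in days.

```python
from datetime import datetime, timedelta, timezone


def epoch_ms_to_utc(ms):
    """Convert milliseconds since the Unix epoch to a timezone-aware UTC datetime."""
    return datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(milliseconds=ms)


# Timestamp from the note above
note_time = epoch_ms_to_utc(1538713350000)
print(note_time.isoformat())  # 2018-10-05T04:22:30+00:00

# Date range from the table, converted from milliseconds to whole days
date_range = timedelta(milliseconds=1538738400000 - 1412508000000)
print(date_range.days)  # 1461
```

Adding a timedelta to the epoch avoids floating-point division and keeps the result explicitly in UTC rather than in the local time zone of the machine running the code.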

String statistic    Calculated result
Count               Count of ["high", "high", "high", "low", null] = 4
Any                 = "low"

Note:

The count statistic (for string and numeric fields) counts the number of nonempty values. The count of [0, 1, 10, 5, null, 6] is 5. The count of ["Primary", "Primary", "Secondary", null] is 3.
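The counting rule in this note can be sketched in a few lines of Python. The function name count_non_empty is hypothetical, and the sketch assumes that only null values are treated as empty. How the tool treats empty strings is not stated here, so they are counted as values in this illustration.

```python
def count_non_empty(values):
    """Count the values that are not null (None in Python)."""
    return sum(1 for v in values if v is not None)


print(count_non_empty([0, 1, 10, 5, None, 6]))                     # 5
print(count_non_empty(["Primary", "Primary", "Secondary", None]))  # 3
```

The comparison uses "is not None" rather than a truthiness test so that a legitimate value of 0 is still counted.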

ArcGIS API for Python example

The Describe Dataset tool is available through ArcGIS API for Python.

This example describes a hurricane tracking dataset in a big data file share and outputs a subset of 200 hurricane features and an extent layer.

# Import the required ArcGIS API for Python modules
import arcgis
from arcgis import geoanalytics as ga
from arcgis.gis import GIS

# Connect to your ArcGIS Enterprise portal and confirm that GeoAnalytics is supported
portal = GIS("https://myportal.domain.com/portal", "gis_publisher", "my_password", verify_cert=False)
if not portal.geoanalytics.is_supported():
    print("Quitting, GeoAnalytics is not supported")
    exit(1)   

# Find the big data file share dataset you'll use for analysis
search_result = portal.content.search("", "Big Data File Share")

# Look through the search results for a big data file share with the matching name
bdfs_search = next(x for x in search_result if x.title == "bigDataFileShares_NaturalDisasters")

# Look through the big data file share for Hurricanes
hurricanes = next(x for x in bdfs_search.layers if x.properties.name == "Hurricanes")

# Run the Describe Dataset tool
result = ga.summarize_data.describe_dataset(input_layer=hurricanes, sample_size=200,
                                            extent_output=True, output_name="Hurricanes_describe")

# Visualize the sample and extent layers if you are running Python in a Jupyter Notebook
processed_map = portal.map()
processed_map.add_layer(result)
processed_map

Similar tools

Use Describe Dataset when you want to explore your data using samples, statistics, and summarization. Other tools may be useful in solving similar but slightly different problems.

Map Viewer analysis tools

Aggregate your dataset into bins or areas and output summary statistics using the Aggregate Points ArcGIS GeoAnalytics Server tool.

Create a subset of your data within a certain area using the Clip Layer ArcGIS GeoAnalytics Server tool.

ArcGIS Desktop analysis tools

To run this tool from ArcGIS Pro, your active portal must be Enterprise 10.7 or later. You must sign in using an account that has privileges to perform GeoAnalytics Feature Analysis.