Manage big data file shares in a portal

Big data file shares are registered through your portal's content page. When you add a big data file share in your portal, it also creates a related data store item. When you add a cloud store big data file share, it creates a big data file share item, a data store item of type big data file share, and a data store item of type cloud store. A big data file share portal item includes the following tabs:

  • Overview—Provides general information on your big data file share and the related data store items. The related data store items can be shared and deleted with your big data file share.
  • Datasets—Lists the datasets and outlines the schema of the input data. Dataset information includes the fields and formats that represent geometry and time.
  • Outputs—Outlines optional output templates, which allow you to write results to a big data file share. The output templates are optional and are created after you register a big data file share. See Create, edit, and view output templates to learn how to create or edit an output template.
  • Settings—Describes content status, extent, and delete protection.

You can view and edit the datasets and schema and the output templates through the big data file share item.

Note:
To share a Big Data File Share item, you must share the root data store item. The root data store for a big data file share of type Cloud is the Data Store (Cloud) item of the same name. For all other types of big data file shares (File Share, HDFS, and HIVE), the root data store is the Data Store (Big Data File Share) item of the same name.

Edit big data file shares

Once you have created a big data file share through your portal, you can use the big data file share item to view the datasets, edit the datasets' formatting, or sync your big data file share to add additional datasets.

A big data file share is composed of one or more datasets. The number of datasets is dependent on the number of folders in your big data file share location. You can view the datasets that have been successfully registered in your big data file share.

If datasets that you expected to find are missing from your big data file share, do the following:

  • Verify that you correctly registered the top-level folder. For more information, see Prepare your data.
  • Confirm that your input data is in an allowable format, such as a collection of delimited files, shapefiles, Parquet, or Optimized Row Columnar (ORC).
  • Ensure that the schema of your input dataset of interest is consistent for a collection of files (all files in a single dataset must have the same fields).
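
The last check in the list above can be scripted before you register or sync. The following is a minimal sketch, not part of ArcGIS: it assumes that each subfolder of the top-level folder is one dataset, that the files are comma-delimited .csv files with a header row, and that comparing header rows is a good enough proxy for comparing schemas. The function name check_dataset_headers is hypothetical.

```python
import csv
from pathlib import Path


def check_dataset_headers(top_level, delimiter=","):
    """Map each dataset folder name to True if all of its .csv headers match."""
    results = {}
    folders = sorted(p for p in Path(top_level).iterdir() if p.is_dir())
    for folder in folders:
        headers = set()
        for csv_file in sorted(folder.glob("*.csv")):
            with open(csv_file, newline="", encoding="utf-8") as f:
                # Only the first row (the header) is compared.
                first_row = next(csv.reader(f, delimiter=delimiter), [])
            headers.add(tuple(first_row))
        # Exactly one distinct header means every file has the same fields.
        # A folder without .csv files is reported as False.
        results[folder.name] = len(headers) == 1
    return results
```

A folder reported as False is a candidate for the schema mismatch described above; the sketch does not inspect field types or other file formats.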

You can use the Datasets tab to verify the number of datasets in a big data file share or to review the schema of a registered dataset. You can modify a selected dataset's schema by updating its geometry, time definition, and field names using the steps below.

Edit big data file share input datasets

Editing the big data file share item allows you to modify how your data is registered and used for analysis. You can also use the edit option to view how your data is currently registered. For details about each option on this dialog box, see Big data file share editing parameters. To edit dataset parameters, do the following:

  1. Open the Big Data File Share item in your portal contents.
  2. Click the Datasets tab.
  3. Click the Edit button beside the dataset you want to edit.
  4. Modify the dataset using the Fields, Geometry, Time, and File options.
  5. When you have finished editing dataset properties, click Save.

Delete big data file share input datasets

Deleting a dataset allows you to customize which datasets are available in the big data file share. Deleting a dataset does not delete the data in the source location. If you later decide you want the deleted dataset to be available in the big data file share again, you can use the sync option. To delete datasets from a big data file share, do the following:

  1. Open the Big Data File Share item in your portal contents.
  2. Click the Datasets tab.
  3. Check the check box beside the dataset you want to delete.
  4. Click the Delete button at the top of the dataset table to remove the dataset from the big data file share.

Edit a big data file share manifest or hints file

When the Show advanced option is turned on in the Datasets tab of the big data file share, you can view, download, and upload the manifest or hints file. If you upload a manifest, it overwrites any changes you have made to your big data file share datasets and replaces the existing datasets and schema. To learn more about the big data file share manifest, see Big data file share manifest. To learn more about using a hints file, see Hints file. To edit a big data file share manifest or hints file, do the following:

  1. Open the Big Data File Share item in your portal contents.
  2. Click the Datasets tab.
  3. Click the Show advanced toggle button to turn it on.
    1. To download the manifest file, click Download in the manifest section.
    2. To download the hints file, click Download in the hints section.
  4. Use a text editor to modify and save changes locally to the downloaded .json manifest file or .dat hints file.
    Tip:
    The default file format for the hints file is .dat. Once you've downloaded the file, you can change its extension to .txt and edit the file.
  5. To upload an edited file, in the big data file share, go to the Datasets tab, and turn on Show advanced.
    1. To upload the manifest, click Upload in the manifest section, and browse to the updated .json file.
    2. To upload the hints file, click Upload in the hints section, and browse to the updated .txt file.
  6. Click Upload.

If you upload a hints file, sync the big data file share. When you sync, only datasets with hints or new datasets are updated, and changes made to any other datasets that are not in the hints file remain the same.
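
Because an uploaded manifest replaces the existing datasets and schema, it can help to confirm that the edited .json file still parses before you upload it. The following is a minimal sketch under that assumption only: it checks JSON syntax and does not validate the manifest's contents, whose structure is not described here. The function name load_edited_manifest is hypothetical.

```python
import json


def load_edited_manifest(path):
    """Parse an edited manifest file; raise ValueError if it is not valid JSON."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        # Report where the text editor change broke the file.
        raise ValueError(
            f"{path}: invalid JSON at line {err.lineno}, "
            f"column {err.colno}: {err.msg}"
        ) from err
```

If the function raises an error, fix the reported line in your text editor before uploading the file.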

Sync your big data file share

Sync your big data file share if you add new datasets to your data source or if you upload a hints file. The hints file provides specifications that are used when regenerating the big data file share.

Note:
When a big data file share is synced, only new datasets and existing datasets that have hints are updated. Any edits you have made to datasets that are in the hints file are overwritten with the rules defined in the hints file.

  1. Open the Big Data File Share item in your portal contents.
  2. Click the Datasets tab.
  3. Click the Sync button.

Create, edit, and view output templates

You can create, view, or edit output templates. You can also edit attributes and settings for the output templates, which outline how output results are written to the big data file share.

To create an output template, complete the following steps:

  1. Open the Big Data File Share item in your portal contents.
  2. Click the Outputs tab.
  3. Click the Add output template button.
  4. Create a name for the output template and select the file type the output template will write to.
    1. Set the geometry formats for this template by clicking the Geometry tab. You can set them for one, two, or all geometry types. The formatting options are the same as for input big data file shares.
    2. Set the time formats for this template by clicking the Time tab. You can leave the time blank, set it for either instant or interval, or set it for both. The time formatting options are the same as for input big data file shares.
  5. Click Save when you're done.

Use the same steps to view or edit a template.

Big data file share editing parameters

The big data file share editor comprises the following four sections:

  • Fields
  • Geometry
  • Time
  • File

It is recommended that you use a hints file before editing your data if manifest generation did not correctly determine field names, encoding, field delimiters, or quote characters of a delimited file.

Fields

The fields section lists all of the fields in a dataset. When you select a dataset, you can see the following for each field:

  • The name of the field
  • The field type

You can only modify the field name and type for delimited files. If you are modifying many field names, it is recommended that you use a hints file.

Learn more about supported field types

Geometry

The geometry section lists the type of geometry, how it is represented, and the spatial reference. The following table outlines the available options, with notes for changes you can make, depending on the input dataset type:

Geometry parameters

| Parameter | Description | Delimited files | Shapefiles | ORC files | Parquet files |
| --- | --- | --- | --- | --- | --- |
| Geometry | The geometry type. Options are Point, Polyline, Polygon, or None. If there is no geometry (None), the dataset is a table. | Editable | Cannot be modified | Editable | Editable |
| Spatial reference (WKID/WKT) | The spatial reference of the dataset. This option is only shown if geometry is not none. | Editable. By default, it will be set to 4326, WGS 1984. | Cannot be modified | Editable | Editable |
| Geometry format type | How the geometry is formatted for each feature. Options are XYZ (fields that represent x-, y-, and optionally z-values; XYZ is only applicable to points), WKT (well-known text), WKB (well-known binary), GeoJSON, EsriJSON, and EsriShape. This option is only shown if the geometry is not none. | Editable | Not available; option not shown | Editable | Editable |
| Geometry fields | This is used to specify which fields represent geometries. In some cases, the field must be a specific field type. WKB and EsriShape formats require a binary field, and GeoJSON and EsriJSON require a string field. XYZ fields must be numeric. This option is only shown if the geometry is not none. | Editable | Not available; option not shown | Editable | Editable |
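
To make the geometry format types more concrete, the following sketch encodes a single point in several of them using only the Python standard library. It is an illustration, not ArcGIS code: the function name point_encodings and the X and Y field names are hypothetical, and EsriShape is omitted. It also shows why WKB needs a binary field while GeoJSON and EsriJSON fit in a string field.

```python
import json
import struct


def point_encodings(x, y, wkid=4326):
    """Return one point encoded in several geometry formats."""
    return {
        # XYZ: numeric fields, one per coordinate (z is optional and omitted).
        "XYZ": {"X": x, "Y": y},
        # WKT: well-known text, stored in a string field.
        "WKT": f"POINT ({x} {y})",
        # GeoJSON and EsriJSON: JSON text, stored in a string field.
        "GeoJSON": json.dumps({"type": "Point", "coordinates": [x, y]}),
        "EsriJSON": json.dumps(
            {"x": x, "y": y, "spatialReference": {"wkid": wkid}}
        ),
        # WKB: byte-order flag (1 = little-endian), geometry type (1 = Point),
        # then x and y as 8-byte doubles; stored in a binary field.
        "WKB": struct.pack("<BIdd", 1, 1, x, y),
    }
```

For example, point_encodings(-117.19, 34.05)["WKT"] is the string POINT (-117.19 34.05), and the WKB value is a 21-byte bytes object.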

Time

The time section outlines how time is represented. The following table outlines the available options, with notes for changes you can make depending on the input dataset type. Time options are the same for all data types except where noted.

Time parameters

| Parameter | Description | Example |
| --- | --- | --- |
| Time type | The type of the input time. Options are Instant (a single moment in time), Interval (a span of time with a start and end time), and None. | Instant |
| Time fields, Start time fields, and End time fields | If you select Instant, you see Time fields. If you select Interval, you see Start time fields and End time fields. These options specify the fields and formatting used to define time in your input data. Time can use one or more fields to define time and can use one or more formats for a single field. By default, the first field with the name time is used as the time field, with an estimate of the time format. If there is a shapefile, the first field of type date is used. At least one row must be populated for these tables. See Time formats to learn more about formatting. The time formatting table is only available if Time Type is not None. | Example with a single field used to represent time with two different formats: Field: TimeField, Format: yy/MM/dd hh:mm:ss; Field: TimeField, Format: yyyy-MMM-dd hh:mm:ss. Example with two fields used to represent time: Field: DateField, Format: yy/MM/dd; Field: TimeField, Format: hh:mm:ss. |
| Time zone | The time zone of the input time. This option is only available if Time Type is not None. The default is UTC. | UTC |

Time formats

The following table outlines how to represent time formatting. All examples show how to represent the time 9:45:02.05 PM on January 2, 2016.

Time formats in big data file shares

| Format | Meaning | Example |
| --- | --- | --- |
| yy | The year, represented by two digits. | 16 |
| yyyy | The year, represented by four digits. | 2016 |
| MM | The month, represented numerically. | 01 or 1 |
| MMM | The month, represented using three letters. | Jan |
| MMMM | The month, represented using the complete spelling. | January |
| dd | The date. | 02 or 2 |
| HH | The hour when using a 24-hour day; values range from 0 to 23. | 21 |
| hh | The hour when using a 12-hour day; values range from 1 to 12. | 9 |
| mm | The minute; values range from 0 to 59. | 45 |
| ss | The second; values range from 0 to 59. | 02 |
| SSS | The millisecond; values range from 0 to 999. | 50 |
| a | The AM/PM marker. | PM |
| epoch_millis | The time in milliseconds from epoch. | 1509581781000 |
| epoch_seconds | The time in seconds from epoch. | 1509747601 |
| Z | The time zone offset expressed in hours. | -0100 or -01:00 |
| ZZZ | The time zone offset expressed using IDs. | America/Los_Angeles |
| '' | Use single quotes to add text that doesn't represent a value outlined in this table. | 'T' |

The following table shows examples of different formats for the same date, January 2, 2016, at 9:45:02.05 PM:

Time format examples

| Input date | Format |
| --- | --- |
| 01/02/2016 9:45:02PM | MM/dd/yyyy hh:mm:ssa |
| Jan02-16 21:45:02 | MMMdd-yy HH:mm:ss |
| January 02 2016 9:45:02.050PM | MMMM dd yyyy hh:mm:ss.SSSa |
| 01/02/2016T21:45:02-0000 | MM/dd/yyyy'T'HH:mm:ssZ |
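
One way to check a format string against sample values before editing a dataset is to translate it into a Python strptime format and try to parse the value. The following is a rough sketch, not how GeoAnalytics Server parses time: the mapping to strptime directives is an approximation of my own, month names and AM/PM markers depend on the Python locale, and ZZZ, epoch_millis, and epoch_seconds are not handled. The names to_strptime and parse_time are hypothetical.

```python
import re
from datetime import datetime

# Format tokens from the Time formats table mapped to strptime directives.
_DIRECTIVES = {
    "yyyy": "%Y", "MMMM": "%B", "MMM": "%b", "SSS": "%f",
    "yy": "%y", "MM": "%m", "dd": "%d", "HH": "%H", "hh": "%I",
    "mm": "%M", "ss": "%S", "a": "%p", "Z": "%z",
}
# Quoted text first, then ZZZ (unsupported here), then tokens longest first
# so that "MMMM" is not read as "MM" followed by "MM".
_TOKEN_RE = re.compile(
    r"'([^']*)'|ZZZ|" + "|".join(sorted(_DIRECTIVES, key=len, reverse=True))
)


def to_strptime(fmt):
    """Translate a big data file share time format into a strptime format."""
    def replace(match):
        if match.group(1) is not None:
            # Text inside single quotes is copied as a literal.
            return match.group(1).replace("%", "%%")
        if match.group(0) == "ZZZ":
            raise ValueError("time zone IDs (ZZZ) are not handled by this sketch")
        return _DIRECTIVES[match.group(0)]
    return _TOKEN_RE.sub(replace, fmt)


def parse_time(value, fmt):
    """Parse a sample value with a big data file share time format."""
    return datetime.strptime(value, to_strptime(fmt))
```

For example, parse_time("January 02 2016 9:45:02.050PM", "MMMM dd yyyy hh:mm:ss.SSSa") returns datetime(2016, 1, 2, 21, 45, 2, 50000) in an English locale. A ValueError from strptime suggests that the format does not match the sample value.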

File

The file section outlines the format the data is in. Data may be in one of the following formats:

  • Shapefile (.shp)
  • Delimited file (for example, .csv)
  • Parquet file
  • ORC file

The available parameters differ depending on the dataset. For shapefiles, ORC, and Parquet files, the only parameter is the file type, which cannot be modified. If the input dataset is a delimited file, multiple parameters can be modified. To modify values for a delimited file, use a hints file and regenerate the manifest. These parameters are outlined in the following table:

Dataset formats

| Parameter | Description |
| --- | --- |
| File extension | Lists the file type extension on the input dataset. Common formats are .csv and .txt. |
| Field delimiter | Determines the delimiter for each field. Common formats are , and ;. |
| Record terminator | Determines the terminator for each row of data. Common formats are \n and \t. |
| Quote character | Determines the character used for quotes. |
| Has header row | A Boolean value that determines if the input table included a header row. If a header row is included, the headers will be used for the field names. Field name information is used to predict geometry and time fields. |
| Encoding | The type of encoding used on the file. By default, this is UTF-8. |
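
To see how these parameters affect the way a delimited file is read, you can try them on a sample of your data with Python's csv module. This is a minimal sketch for checking values before you put them in a hints file; it is not how the manifest is generated. The function name read_delimited and the col_1, col_2 placeholder names for files without a header row are hypothetical.

```python
import csv


def read_delimited(stream, field_delimiter=",", quote_character='"',
                   has_header_row=True):
    """Return (field names, data rows) read from an open text stream."""
    reader = csv.reader(
        stream, delimiter=field_delimiter, quotechar=quote_character
    )
    rows = list(reader)
    if not rows:
        return [], []
    if has_header_row:
        # The first row supplies the field names.
        return rows[0], rows[1:]
    # No header row: generate placeholder names (hypothetical scheme).
    names = [f"col_{i + 1}" for i in range(len(rows[0]))]
    return names, rows
```

Open the file with the encoding you expect, for example open(path, newline="", encoding="utf-8"), and pass the file object as stream. If the field names or the number of fields per row look wrong, the delimiter, quote character, or encoding is likely different from what you specified.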

Big data file share output template editing parameters

The big data file share output template editor comprises the following three sections:

  • Name and file type
  • Geometry formatting
  • Time formatting

Note:
The input big data file shares have a fields section. The output templates do not have a fields section, since the resulting fields are determined by the GeoAnalytics Tools creating the result. ORC only supports field names that include the basic Latin alphabet and numeric characters. All other characters in a field name are replaced with an underscore.
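
The ORC field name rule can be previewed with a short sketch. It assumes a one-for-one replacement (each unsupported character becomes one underscore); whether the tools collapse repeated underscores is not stated here. The function name orc_safe_field_name is hypothetical.

```python
import re


def orc_safe_field_name(name):
    """Replace every character outside A-Z, a-z, and 0-9 with an underscore."""
    return re.sub(r"[^A-Za-z0-9]", "_", name)
```

For example, orc_safe_field_name("Wind Speed (km/h)") returns "Wind_Speed__km_h_".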

Output geometry formats

The geometry section lists how you want the output geometry to be formatted for each geometry type (point, line, polygon). There are two parts to determining the output geometry:

  • The spatial reference—Leave it empty (the default) to use the spatial reference of the tool results. Optionally, provide a WKID or WKT string, and all results are projected to that spatial reference. This value is shared across all output geometries.
  • The geometry formatting type and fields—This is described in more detail below.

For each template, you can define how you want to format the geometry of the dataset as well as the field names that represent geometry. Depending on the dataset type (delimited files, shapefiles, ORC, or Parquet), you can output results in various formats. Shapefiles do not have a specified format and always write a shapefile dataset. The following table outlines these formats:

Output geometry formats

| Geometry type | Output fields | Delimited files | Shapefiles | ORC files | Parquet files |
| --- | --- | --- | --- | --- | --- |
| XYZ (an X, Y, and optionally Z field; this option is only available for points) | By default, three new fields are created, named X, Y, and Z. You can optionally change these field names. | Yes | - | Yes | Yes |
| WKT | By default, one new field named Geometry is created. You can optionally change the output field names. | Yes | - | Yes | Yes |
| GeoJSON | By default, one new field named Geometry is created. You can optionally change the output field names. | Yes | - | Yes | Yes |
| EsriJSON | By default, one new field named Geometry is created. You can optionally change the output field names. | Yes | - | Yes | Yes |
| WKB | By default, one new field named Geometry is created. You can optionally change the output field names. | - | - | Yes | Yes |
| EsriShape | By default, one new field named Geometry is created. You can optionally change the output field names. | - | - | Yes | Yes |

Output time formats

The time section outlines how output time is represented. Formatting time requires the following information:

  • Formatting for both instants and intervals.
  • The field names to which time is written.
  • The format (String or Date) in which time is written. Delimited files can only write time as a string.
  • For intervals, which fields represent the start and end time.

Time formatting is the same as for input big data file shares. See Time formats in big data file shares.

Output dataset format

The dataset format section outlines the output format to which the data is written. Data may be in one of the following formats:

  • Shapefile (.shp)
  • Delimited file (for example, .csv)
  • Parquet file
  • ORC file

The available parameters differ depending on the dataset. For shapefiles, ORC, and Parquet files, the only parameter is the file type, which cannot be modified. If the dataset is a delimited file, multiple parameters can be modified in ArcGIS Server Manager. These are outlined in the following table:

Dataset formats

| Parameter | Description |
| --- | --- |
| File extension | Extensions are never applied to an output dataset. |
| Field delimiter | Determines the delimiter for each field. Common formats are , and ;. |
| Record terminator | The terminator for each row of data cannot be set. For Windows, the terminator is \r\n. For Linux, it's \n. |
| Quote character | Determines the character used for quotes. |
| Has header row | A Boolean value that determines if the output table includes a header row representing the field names. By default, this is true. |
| Encoding | This is always UTF-8. |