Edit big data file share manifests in Server Manager

Big data file shares are registered as a data store through ArcGIS Server Manager on your ArcGIS GeoAnalytics Server. A big data file share requires a manifest that outlines the schema of the input data, as well as the fields and formats that represent geometry and time in a dataset. The manifest is generated automatically when you register a big data file share. You may need to modify it if your data changes or if manifest generation could not determine all the information needed, for example, if the automatically generated manifest did not select the correct field for the geometry or time.

A big data file share may optionally have output templates, which outline the format of results written to the big data file share. The output templates are generated when you register a big data file share and choose to use it as an output location. You may need to modify one or more templates, for example to change the format of the time and geometry fields, or you may want to add or delete a template.

You can view and edit the datasets and manifest information, as well as the output templates through ArcGIS Server Manager on your ArcGIS GeoAnalytics Server.

Edit a big data file share

Once you have registered a big data file share, you can view and edit attributes and settings for that item's registered datasets by opening the big data file share manifest editor. You can also edit attributes and settings for the optional output templates, which outline how output results will be written to the big data file share.

For example, for input data, you may want to verify the number of datasets within a registered file share. If, in doing so, you do not see the expected number of datasets in the registered file share, you should check whether the registered location contains valid datasets.

For an output template, you may want to format a delimited file output to write a tab-delimited file and use WKT to store the geometry.

You may also want to review dataset schemas for a registered big data file share. You can modify a selected dataset's schema by updating its geometry, time definition, and field names in its associated manifest resource.

On the Advanced tab of the big data file share manifest editor, you can upload a hints file to provide information about a dataset, such as the presence or absence of a header row, encoding, field delimiter, or record terminator. When you regenerate the manifest after uploading a hints file, the information provided is used to generate the manifest.

Optionally, you can download the manifest, edit it, and upload the edited file.
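
If you edit a downloaded manifest by script rather than by hand, the change can be sketched as follows. This is a minimal illustration, not the documented manifest schema: the key names used here (datasets, name, time, fields, formats) and the dataset name earthquakes are assumptions, so check them against the .json file you actually downloaded before adapting it.

```python
import json

# Hypothetical manifest fragment. The key names are assumptions made for
# this illustration; compare them with your downloaded manifest.
MANIFEST_TEXT = """
{
  "datasets": [
    {
      "name": "earthquakes",
      "time": {
        "timeType": "instant",
        "fields": [{"name": "time", "formats": ["yy/MM/dd hh:mm:ss"]}]
      }
    }
  ]
}
"""


def set_time_formats(manifest, dataset_name, field_name, formats):
    """Replace the time formats of one field in one dataset, in place.

    Returns True if a matching dataset and field were found.
    """
    for dataset in manifest.get("datasets", []):
        if dataset.get("name") != dataset_name:
            continue
        for field in dataset.get("time", {}).get("fields", []):
            if field.get("name") == field_name:
                field["formats"] = list(formats)
                return True
    return False


manifest = json.loads(MANIFEST_TEXT)
changed = set_time_formats(
    manifest, "earthquakes", "time", ["MM/dd/yyyy HH:mm:ss"]
)
# Serialize again; this is the text you would save and upload.
edited_text = json.dumps(manifest, indent=2)
```

In practice you would read the text from the downloaded file and write edited_text back to a .json file before uploading it on the Advanced tab.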

Edit big data file share input datasets

In the big data file share manifest editor, you can view a selected big data file share and the datasets that have been successfully registered within it. When you select a dataset from the editor drop-down menu, its parameters are populated. For details about each option on this dialog box, see Big data file share editing parameters. To edit dataset parameters, do the following:

  1. On the Registered Data Stores dialog box, locate the big data file share you want to edit.
  2. Click the Edit pencil to see details and options for corresponding datasets.
  3. Click the Datasets tab to show the registered datasets and their corresponding parameters.
  4. Select a dataset from the drop-down menu to view the information represented in its manifest. Make updates to your dataset properties as needed.
  5. When you have finished editing dataset properties, click Save.

Edit a big data file share manifest or hints file

On the Advanced tab of the big data file share editor, you can edit the associated manifest or hints file by choosing its respective tab. If you upload a manifest, it will overwrite any changes you have made to your big data file share manifest in the editor, and replace the current manifest. To learn more about the big data file share manifest, see Understanding a big data file share manifest. To learn more about using a hints file, see Understanding the hints file. To edit a big data file share manifest or hints file, do the following:

  1. On the Registered Data Stores dialog box, locate the big data file share you want to modify.
  2. Click the Edit pencil to see options for modifying the manifest resource.
  3. Click the Advanced tab.
  4. From the Advanced tab, choose the Manifest or Hints tab, depending on which you are modifying.
    1. To download the manifest file, click Manifest > Download.
    2. To download the hints file, click Hints > Download.
  5. Use a text editor to modify the downloaded .json manifest file or .dat hints file, and save the changes locally.
    Tip:
    The default file format for the hints file is .dat. Once you've downloaded the file, you can change its extension to .txt and edit the file.
  6. To upload an edited file, click the Edit pencil for the big data file share you want to modify.
    1. To edit the manifest, click Advanced > Manifest > Upload and browse to the updated .json file.
    2. To edit the hints file, click Advanced > Hints > Upload and browse to the updated .txt file.
  7. Click Upload.

If you upload a hints file, be sure to regenerate the manifest. When you regenerate a manifest, only datasets with hints or new datasets will be updated, and changes made to any other datasets not in the hints file will remain the same.

Regenerate the manifest for a big data file share

After a big data file share is created and a manifest has been generated, a regenerate manifest button appears for each entry on the Registered Data Stores dialog box.

You can regenerate a manifest if you have added new data or if you have uploaded a hints file using the edit resource. The hints file provides specifications that are used when regenerating the manifest.

Note:
When a manifest is regenerated, only new datasets and existing datasets that have hints are updated. For those datasets, any edits you have made to the manifest are overwritten with the rules defined in the hints file.

Big data file share editing parameters

The big data file share editor comprises the following five sections:

  • Dataset selector
  • Fields
  • Geometry
  • Time
  • Dataset format

It is recommended to use a hints file before editing your data if manifest generation did not correctly determine field names, encoding, field delimiters, or quote characters.

Dataset selector

A manifest is composed of one or more datasets. The number of datasets is dependent on the number of folders in your big data file share location. When you open the manifest manager, you can view the datasets that have been successfully registered in your big data file share. When you select a dataset from the drop-down menu, the dataset parameters will be populated with the dataset information.

If datasets that you expected to find are missing from your manifest, do the following:

  • Verify that you correctly registered the top-level folder. For more information, see Register your data with ArcGIS Server Manager.
  • Check that your input data is in an allowable format, such as a collection of delimited files, shapefiles, parquet, or ORC.
  • Ensure that the schema is consistent across the collection of files that makes up each dataset (all files in a single dataset must have the same fields).
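
The last check can be scripted before you register the share. The sketch below is one possible approach, not an Esri tool: it assumes each dataset is a subfolder of the registered top-level folder, that the files are comma-delimited .csv files with a header row, and that they are UTF-8 encoded. The function name inconsistent_headers is made up for this example.

```python
import csv
from pathlib import Path


def inconsistent_headers(share_root, pattern="*.csv", delimiter=","):
    """Return {dataset folder name: sorted distinct header rows} for every
    subfolder of share_root whose files do not all share one header row."""
    problems = {}
    folders = sorted(p for p in Path(share_root).iterdir() if p.is_dir())
    for folder in folders:
        headers = set()
        for path in sorted(folder.glob(pattern)):
            with path.open(newline="", encoding="utf-8") as handle:
                # Only the first row is read; it is treated as the header.
                first_row = next(csv.reader(handle, delimiter=delimiter), None)
            if first_row is not None:
                headers.add(tuple(first_row))
        if len(headers) > 1:
            problems[folder.name] = sorted(headers)
    return problems
```

An empty result means every dataset folder had at most one distinct header row. It does not confirm that the field types match, only the field names.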

Fields

The fields section lists all of the fields in a dataset. When you select a dataset, you will be able to see the following for each field:

  • The name of the field
  • The field type

The field name and type can be modified for delimited files. If you are modifying more than one field name, it is recommended to use a hints file.

If the input dataset is a delimited file, there will be multiple parameters that can be modified in the manifest in ArcGIS Server Manager.

Geometry

The geometry section lists the type of geometry, and how it is represented. The following table outlines the available options, with notes for changes you can make, depending on the input dataset type:

Geometry parameters

Parameter | Description | Delimited files | Shapefiles | ORC files | Parquet files
Geometry | The geometry type. Options are Point, Polyline, Polygon, or None. If there is no geometry, the input is a table. | Editable | Cannot be modified | Editable | Editable
Spatial reference (WKID/WKT) | The spatial reference of the dataset. This option is only shown if the dataset is not a table. | Editable. By default, it is set to 4326 (WGS 1984). | Cannot be modified | Editable | Editable
Geometry formatting type | How the geometry is formatted for each feature. Options are XYZ (fields that represent X, Y, and optionally Z values; XYZ is only applicable to points), WKT (well-known text), GeoJson, EsriJson, and shape. This option is only available if the dataset is not a table and not a shapefile. | Editable | Not available | Editable | Editable

Time

The time section outlines how time is represented. The following table outlines the available options, with notes for changes you can make, depending on the input dataset type. Time options are the same for all data types, except where noted.

Time parameters

Parameter | Description | Example
Time type | The type of the input time. Options are Instant (a single moment in time), Interval (a span of time with a start and end time), and None. | Instant
Time zone | The time zone of the input time. This option is only available if Time Type is not None. | UTC
Name and formatting table for time | This table selects the time field or fields, and outlines how time is defined. Time can use one or more fields to define time, as well as use one or more formats for a single field. By default, the first field with the name "time" will be used as the time field, with an estimate of the time format. If there is a shapefile, the first field of type "date" will be used. If time is of type Interval, there must be a start and end time specified. The time formatting table is only available if Time Type is not None. | See the two examples that follow this table.

Example with a single field used to represent time with two different formats:

  • Name: TimeField Format: yy/MM/dd hh:mm:ss
  • Name: TimeField Format: yyyy-MMM-dd hh:mm:ss

Example with two fields used to represent time:

  • Name: DateField Format: yy/MM/dd
  • Name: TimeField Format: hh:mm:ss

Time formats

The following table outlines how to represent time when you edit a big data file share through ArcGIS Server Manager or directly in a manifest. The examples show how to represent the time January 2, 2016, at 9:45:02.05 PM.

Time formats in big data file shares

Symbol | Meaning | Example
yy | The year, represented by two digits. | 16
yyyy | The year, represented by four digits. | 2016
MM | The month, represented numerically. | 01 or 1
MMM | The month, represented using three letters. | Jan
MMMM | The month, represented using the complete spelling. | January
dd | The day. | 02 or 2
HH | The hour when using a 24-hour day; values range from 0-23. | 21
hh | The hour when using a 12-hour day; values range from 1-12. | 9
mm | The minute; values range from 0-59. | 45
ss | The second; values range from 0-59. | 02
SSS | The millisecond; values range from 0-999. | 50
a | The AM/PM marker. | PM
epoch_millis | The time in milliseconds from epoch. | 1509581781000
epoch_seconds | The time in seconds from epoch. | 1509747601
Z | The time zone offset expressed in hours. | -0100 or -01:00
ZZZ | The time zone offset expressed using IDs. | America/Los_Angeles
'' | Use single quotes to add text that doesn't represent a value outlined in this table. | 'T'
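
To check what an epoch value in your data represents before choosing epoch_millis or epoch_seconds, you can convert it locally. The sketch below assumes that "epoch" means the Unix epoch (January 1, 1970, 00:00:00 UTC), which is the usual convention but is not stated in this topic. The function names are made up for this example.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the epoch is the Unix epoch, 1970-01-01T00:00:00 UTC.
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def from_epoch_seconds(value):
    """Convert whole seconds since the epoch to an aware UTC datetime."""
    return UNIX_EPOCH + timedelta(seconds=int(value))


def from_epoch_millis(value):
    """Convert whole milliseconds since the epoch to an aware UTC datetime."""
    return UNIX_EPOCH + timedelta(milliseconds=int(value))
```

A quick plausibility test is the number of digits: present-day timestamps have 10 digits in seconds and 13 digits in milliseconds. Under the Unix epoch assumption, the two sample values in the table decode to dates in November 2017.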

The following table shows examples for different formats of the same date, January 2, 2016, at 9:45:02.05 PM:

Time format examples

Input date | Date format
01/02/2016 9:45:02PM | MM/dd/yyyy hh:mm:ssa
Jan02-16 21:45:02 | MMMdd-yy HH:mm:ss
January 02 2016 9:45:02.050PM | MMMM dd yyyy hh:mm:ss.SSSa
01/02/2016T21:45:02-0000 | MM/dd/yyyy'T'HH:mm:ssZ
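
Before you enter a format in the editor, you can test it against a few sample values from your data. The sketch below is a rough local approximation, not the parser that GeoAnalytics Server uses: it translates a subset of the symbols in the table above into Python strptime directives. It does not handle ZZZ, epoch_millis, or epoch_seconds, and the month-name and AM/PM directives depend on the Python locale. The function names are made up for this example.

```python
import re
from datetime import datetime

# Approximate mapping from the documented symbols to strptime directives.
SYMBOL_TO_DIRECTIVE = {
    "yyyy": "%Y", "yy": "%y",
    "MMMM": "%B", "MMM": "%b", "MM": "%m",
    "dd": "%d",
    "HH": "%H", "hh": "%I",
    "mm": "%M", "ss": "%S",
    "SSS": "%f",  # %f reads 1-6 fraction digits, so "050" is 50 ms
    "a": "%p",
    "Z": "%z",
}

# Quoted literal text first, then longer symbols before their prefixes.
TOKEN = re.compile(r"'([^']*)'|yyyy|yy|MMMM|MMM|MM|dd|HH|hh|mm|ss|SSS|a|Z")


def to_strptime(fmt):
    """Translate a big data file share time format into a strptime format."""
    def replace(match):
        literal = match.group(1)
        if literal is not None:
            # Text inside single quotes is copied through unchanged.
            return literal.replace("%", "%%")
        return SYMBOL_TO_DIRECTIVE[match.group(0)]
    return TOKEN.sub(replace, fmt)


def parse_time(value, fmt):
    """Parse a sample value using a big data file share time format."""
    return datetime.strptime(value, to_strptime(fmt))
```

For example, parse_time("Jan02-16 21:45:02", "MMMdd-yy HH:mm:ss") and parse_time("01/02/2016 9:45:02PM", "MM/dd/yyyy hh:mm:ssa") both return January 2, 2016, 21:45:02. A ValueError means the sample value does not match the format under this approximation.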

Dataset format

The dataset format section outlines the format the data is in. Data may be in one of the following formats:

  • Shapefile (.shp)
  • Delimited file (for example .csv)
  • Parquet file
  • ORC file

The available parameters differ, depending on the dataset. For shapefiles, ORC, and parquet files, the only parameter is the file type, which cannot be modified. If the input dataset is a delimited file, there are multiple parameters that can be modified. To modify values for a delimited file, use a hints file and regenerate the manifest. The parameters are outlined in the following table:

Dataset formats

Parameter | Description
File extension | Lists the file type extension of the input dataset. Common formats are .csv and .txt. Modify this information for a delimited file using a hints file.
Field delimiter | Determines the delimiter for each field. Common formats are , and ;. Modify this information for a delimited file using a hints file.
Record terminator | Determines the terminator for each row of data. Common formats are \n and \t. Modify this information for a delimited file using a hints file.
Quote character | Determines the character used for quotes. Modify this information for a delimited file using a hints file.
Has header row | A Boolean value that determines if the input table includes a header row. If a header row is included, the headers are used for the field names. Field name information is used in predicting geometry and time fields. Set header rows using the hints file.
Encoding | The type of encoding used on the file. By default, this will be UTF-8. This is set with a hints file.
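
If you are not sure which values to supply for a delimited file, you can inspect a sample of it first. The sketch below uses the csv.Sniffer class from the Python standard library to guess the field delimiter, quote character, and header row. It is a heuristic that can guess wrong on small or irregular samples, and the dictionary keys are descriptive labels chosen for this example, not hints file keywords.

```python
import csv


def describe_delimited(sample, candidates=",;\t|"):
    """Guess the delimited-file properties of a text sample.

    sample is the first few lines of the file as one string;
    candidates limits the delimiters the sniffer may choose from.
    """
    sniffer = csv.Sniffer()
    dialect = sniffer.sniff(sample, delimiters=candidates)
    return {
        "field_delimiter": dialect.delimiter,
        "quote_character": dialect.quotechar,
        "has_header_row": sniffer.has_header(sample),
    }
```

Pass a few dozen lines rather than a single row so the sniffer has enough evidence, and confirm the guesses by opening the file before writing them into a hints file.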

Big data file share output template editing parameters

The big data file share output template editor comprises the following four sections:

  • Output template selector
  • Geometry formatting
  • Time formatting
  • Dataset format

Output template selector

A big data file share is optionally composed of one or more templates. The number of templates is determined by the different formats to which you want to write results. When you open the output template manager, you can view the templates that have been successfully registered in your big data file share. When you select a template from the drop-down menu, the template parameters will be populated with the output formatting information. If you want to add a new template, select the Add template option, and select the type and name of the new template. If you want to delete a template, select it from the template selector, and select Delete template. You can modify an existing template by selecting it, and modifying any of the sections below as needed.

Note:

The input big data file shares have a fields section. The output templates do not have a fields section, since the resulting fields are determined by the GeoAnalytics Tools creating the result. ORC only supports field names that include the Basic Latin alphabet and numeric characters. All other characters in a field name will be replaced with an underscore.

Geometry

The geometry section lists how you would like the output geometry to be formatted for each geometry type (point, line, polygon). There are two parts to determining the output geometry:

  • The spatial reference—You can leave it empty, and it will use the tool results (default). Optionally provide a WKID or WKT string, and all results will be projected to that spatial reference. This value is shared across all output geometries.
  • The geometry formatting type and fields. This is described in more detail below.

For each template, you can define how you want to format the geometry of the dataset, as well as the field names that represent geometry. Depending on the dataset type (delimited files, shapefiles, ORC, or parquet), you can output results in different formats. The following table outlines those formats:

Output geometry formats

Geometry type | Output fields | Delimited files | Shapefiles | ORC files | Parquet files

XYZ—An X, Y, and optionally Z field. This option is only available for points.

By default, three new fields will be created named X, Y, and Z. You can optionally change these field names.

YesYesYes

WKT

By default, one new field named Geometry will be created. You can optionally change the output field names.

YesYesYes

GeoJSON

By default, one new field named Geometry will be created. You can optionally change the output field names.

YesYesYes

EsriJSON

By default, one new field named Geometry will be created. You can optionally change the output field names.

YesYesYes

SHP

By default, one new field named Geometry will be created. You can optionally change the output field names.

Yes

WKB

By default, one new field named Geometry will be created. You can optionally change the output field names.

YesYes

Shape Buffer

By default, one new field named Geometry will be created. You can optionally change the output field names.

YesYes
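
To make the text-based geometry formats concrete, the sketch below encodes one point in WKT, GeoJSON, and EsriJSON using the standard definitions of those formats. It shows the general shape of each encoding only. The exact strings that GeoAnalytics Tools write, for example whether a spatial reference is embedded, may differ, and the function name is made up for this example.

```python
import json


def point_encodings(x, y, wkid=4326):
    """Return one point encoded in three text-based geometry formats."""
    return {
        # Well-known text: coordinates separated by a space.
        "WKT": f"POINT ({x} {y})",
        # GeoJSON: a typed object with a coordinates array.
        "GeoJSON": json.dumps({"type": "Point", "coordinates": [x, y]}),
        # EsriJSON: named x and y members plus a spatial reference.
        "EsriJSON": json.dumps(
            {"x": x, "y": y, "spatialReference": {"wkid": wkid}}
        ),
    }
```

All three are single strings, which is why each can be written to one output field named Geometry.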

Time

The time section outlines how output time is represented. Formatting time requires the following information:

  • Formatting for both instants and intervals.
  • The field names to which time will be written.
  • The format (String or Date) that time will be written as. Note that delimited files can only be formatted with string.
  • For intervals, which fields represent the start and end time.

Time formatting is the same as for input big data files. See Time formats in a big data file share.

Dataset format

The dataset format section outlines the output format to which the data will be written. Data may be in one of the following formats:

  • Shapefile (.shp)
  • Delimited file (for example .csv)
  • Parquet file
  • ORC file

The available parameters differ, depending on the dataset. For shapefiles, ORC, and parquet files, the only parameter is the file type, which cannot be modified. If the output template is a delimited file, there are multiple parameters that can be modified in ArcGIS Server Manager. These are outlined in the following table:

Dataset formats

Parameter | Description
File extension | Extensions are never applied to an output dataset.
Field delimiter | Determines the delimiter for each field. Common formats are , and ;.
Record terminator | The terminator for each row of data cannot be set. On Windows, the terminator is \r\n; on Linux, it is \n.
Quote character | Determines the character used for quotes.
Has header row | A Boolean value that determines if the output table will include a header row representing the field names. By default, this is true.
Encoding | This will always be UTF-8.
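
As an illustration of how a downstream script might consume results written with a delimited output template, the sketch below reads tab-delimited text with a header row and a WKT point column. The sample rows, the id field, and the function name are made up for this example. The Geometry field name follows the default described in the output geometry formats table, and the WKT pattern handles only simple two-dimensional points.

```python
import csv
import io
import re

# Made-up sample of tab-delimited output with a header row.
SAMPLE = (
    "id\tGeometry\n"
    "1\tPOINT (-117.19 34.05)\n"
    "2\tPOINT (-118.24 34.06)\n"
)

# Matches simple 2D WKT points such as "POINT (-117.19 34.05)".
NUMBER = r"(-?\d+(?:\.\d+)?)"
WKT_POINT = re.compile(rf"POINT\s*\(\s*{NUMBER}\s+{NUMBER}\s*\)")


def read_points(text, geometry_field="Geometry"):
    """Return the rows as dictionaries, with the WKT column replaced by
    numeric x and y values."""
    rows = []
    for record in csv.DictReader(io.StringIO(text), delimiter="\t"):
        match = WKT_POINT.fullmatch(record.pop(geometry_field).strip())
        if match is None:
            raise ValueError("not a simple 2D WKT point")
        record["x"] = float(match.group(1))
        record["y"] = float(match.group(2))
        rows.append(record)
    return rows
```

When reading from a real output file instead of a string, open it with encoding="utf-8" and newline="" so that both \r\n and \n record terminators are handled by the csv module.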