Big data file shares are registered as a data store through ArcGIS Server Manager on your ArcGIS GeoAnalytics Server. A big data file share requires a manifest to outline the schema of the data, as well as the fields and formats that represent geometry and time in a dataset. The manifest is automatically generated when you register a big data file share. You may need to make modifications if there are any changes to your data, or if the manifest generation was unable to determine all the information needed — for example, if the automatically-generated manifest did not select the correct field for the geometry or time.
You can view and edit the datasets and manifest information through ArcGIS Server Manager on your ArcGIS GeoAnalytics Server.
Edit a big data file share
Once you have registered a big data file share, you can view and edit attributes and settings for that item's registered datasets by opening the big data file share manifest editor.
For example, you may want to verify the number of datasets within a registered file share. If, in doing so, you do not see the expected number of datasets in the registered file share, you should check whether the registered location contains valid datasets.
You may also want to review dataset schemas for a registered big data file share. You can modify a selected dataset's schema by updating its geometry, time definition, and field names in its associated manifest resource.
On the Advanced tab of the big data file share manifest editor, you can upload a hints file to provide information about a dataset, such as the presence or absence of a header row, encoding, field delimiter, or record terminator. When you regenerate the manifest after uploading a hints file, the information in the hints file is used to generate the manifest.
Optionally, you can download the manifest, edit it, and upload the edited manifest file.
Edit big data file share datasets
In the big data file share manifest editor, you can view a selected big data file share and the datasets that have been successfully registered within it. When you select a dataset from the editor drop-down menu, the corresponding parameters are populated. For details about each option on this dialog box, see editing parameters in big data file shares. To edit dataset parameters, do the following:
- On the Registered Data Stores dialog box, locate the big data file share you want to edit.
- Click the Edit pencil to see details and options for corresponding datasets.
- Click the Datasets tab to show the registered datasets and their corresponding parameters.
- Select a dataset from the drop-down menu to view the information represented in its manifest. Make updates to your dataset properties as needed.
- When you have finished editing dataset properties, click Save.
Edit a big data file share manifest or hints file
On the Advanced tab of the big data file share editor, you can edit the associated manifest or hints file by choosing its respective tab. If you upload a manifest, it will overwrite any changes you have made to your big data file share manifest in the editor, and replace the current manifest. To learn more about the big data file share manifest, see Understanding a big data file share manifest. To learn more about using a hints file, see Understanding the hints file. To edit a big data file share manifest or hints file, do the following:
- On the Registered Data Stores dialog box, locate the big data file share you want to modify.
- Click the Edit pencil to see options for modifying the manifest resource.
- Click the Advanced tab.
- From the Advanced tab, choose the Manifest or Hints tab, depending on which you are modifying.
- To download the manifest file, click Manifest > Download.
- To download the hints file, click Hints > Download.
- Use a text editor to modify the downloaded .json manifest file or .dat hints file, and save your changes locally.
Tip:
The default file format for the hints file is .dat. Once you've downloaded the file, you can change its extension to .txt and edit the file.
- To upload an edited file, click the Edit pencil for the big data file share you want to modify.
- To edit the manifest, click Advanced > Manifest > Upload and browse to the updated .json file.
- To edit the hints file, click Advanced > Hints > Upload and browse to the updated .txt file.
- Click Upload.
If you upload a hints file, be sure to regenerate the manifest. When you regenerate a manifest, only datasets with hints or new datasets will be updated, and changes made to any other datasets not in the hints file will remain the same.
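Before uploading an edited manifest, it can help to confirm that the file still parses as JSON and lists the datasets you expect. The following is a minimal sketch, not an Esri tool. The key names `datasets`, `name`, `schema`, and `fields` are assumptions about the manifest layout, and the sample content is hypothetical; check them against your own downloaded file.

```python
import json


def summarize_manifest(manifest):
    """Return (dataset name, field count) pairs from a parsed manifest dict."""
    summary = []
    # "datasets", "name", "schema", and "fields" are assumed key names.
    for dataset in manifest.get("datasets", []):
        fields = dataset.get("schema", {}).get("fields", [])
        summary.append((dataset.get("name"), len(fields)))
    return summary


# Hypothetical manifest content standing in for a downloaded .json file.
sample = json.loads("""
{"datasets": [
  {"name": "earthquakes",
   "schema": {"fields": [{"name": "lat", "type": "esriFieldTypeDouble"},
                          {"name": "lon", "type": "esriFieldTypeDouble"}]}},
  {"name": "hurricanes", "schema": {"fields": []}}
]}
""")

for name, n_fields in summarize_manifest(sample):
    print(f"{name}: {n_fields} field(s)")
```

If `json.loads` raises an error on your edited file, the file is not valid JSON and should be corrected before you upload it.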
Regenerate the manifest for a big data file share
After a big data file share is created and a manifest has been generated, a regenerate manifest button appears for each entry on the Registered Data Stores dialog box.
You can regenerate a manifest if you have added new data or if you have uploaded a hints file using the edit resource. The hints file provides specifications that are used when regenerating the manifest.
Note:
When a manifest is regenerated, only new datasets and existing datasets that have a hints file are updated. Any edits you have made to the manifest will be overwritten with the rules defined in the hints file.
Big data file share editing parameters
The big data file share editor comprises the following five sections:
- Dataset selector
- Fields
- Geometry
- Time
- Dataset format
It is recommended to use a hints file before editing your data if manifest generation did not correctly determine field names, encoding, field delimiters, or quote characters.
Dataset selector
A manifest is composed of one or more datasets. The number of datasets is dependent on the number of folders in your big data file share location. When you open the manifest manager, you can view the datasets that have been successfully registered in your big data file share. When you select a dataset from the drop-down menu, the dataset parameters will be populated with the dataset information.
If you expected to find more datasets in your manifest or are missing any, do the following:
- Verify that you correctly registered the top-level folder. For more information, see Register a data store through ArcGIS Server Manager.
- Check that your input data is in an allowable format, such as a collection of delimited files, shapefiles, parquet, or ORC.
- Ensure that the schema of your input dataset of interest is consistent for a collection of files (all files in a single dataset must have the same fields).
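The schema check in the last item can be approximated locally before you register the folder. The sketch below assumes a dataset folder of comma-delimited .csv files with header rows encoded as UTF-8; the function name `find_inconsistent_headers` is hypothetical, and the check compares header rows only, not field types.

```python
import csv
from pathlib import Path


def find_inconsistent_headers(dataset_dir, delimiter=","):
    """Return {file name: header} for files whose header differs from the first file's."""
    headers = {}
    for path in sorted(Path(dataset_dir).glob("*.csv")):
        with open(path, newline="", encoding="utf-8") as f:
            # The first row is treated as the header; an empty file yields [].
            headers[path.name] = next(csv.reader(f, delimiter=delimiter), [])
    if not headers:
        return {}
    # The alphabetically first file serves as the reference schema.
    reference = next(iter(headers.values()))
    return {name: h for name, h in headers.items() if h != reference}
```

An empty result suggests that every file in the folder shares the same field names; any entries returned are files to inspect.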
Fields
The fields section lists all of the fields in a dataset. When you select a dataset, you will be able to see the following for each field:
- The name of the field.
- The field type.
The field name and type can be modified for delimited files. If you are modifying more than one field name, it is recommended to use a hints file.
If the input dataset is a delimited file, there will be multiple parameters that can be modified in the manifest in Manager.
Geometry
The geometry section lists the type of geometry, and how it is represented. The following table outlines the available options, with notes for changes you can make depending on the input dataset type:
Geometry parameters
Parameter | Description | Delimited files | Shapefiles | ORC files | Parquet files |
---|---|---|---|---|---|
Geometry | The Geometry type. Options are Point, Polyline, Polygon, or None. If there is no geometry, the input is a table. | Editable | Cannot be modified | Editable | Editable |
Spatial reference (WKID/WKT) | The spatial reference of the dataset. This option is only shown if the dataset is not a table. | This can be modified. By default, it will be set to 4326, WGS 1984. | Cannot be modified | Editable | Editable |
Geometry formatting type | How the geometry is formatted for each feature. Options are XYZ (fields that represent X, Y, and optionally Z values—XYZ is only applicable to points), WKT (well known text), GeoJson, EsriJson, and shape. This option is only available if the dataset is not a table and not a shapefile. | Editable | Not available | Editable | Editable |
Time
The time section outlines how time is represented. The following table outlines the available options, with notes for changes you can make depending on the input dataset type. Time options are the same for all data types, except where noted.
Time parameters
Parameter | Description | Example |
---|---|---|
Time type | The type of the input time. Options are Instant (a single moment in time), Interval (a span of time with a start and end time), and None. | Instant |
Time zone | The time zone of the input time. This option is only available if Time Type is not None. | UTC |
Name and formatting table for time | This table selects the time field or fields, and outlines how time is defined. Time can be defined by one or more fields, and a single field can use one or more formats. By default, the first field with the name "time" will be used as the time field, with an estimate of the time format. For shapefiles, the first field of type "date" will be used. If time is of type Interval, a start and end time must be specified. The time formatting table is only available if Time Type is not None. | A single field used to represent time with two different formats, or two fields used to represent time. |
Time formats
The following table outlines how to represent time when you edit a big data file share through ArcGIS Server Manager or directly in a manifest. The examples show how to represent the time January 2nd, 2016 at 9:45:02.05 PM.
Time formats in big data file shares
Symbol | Meaning | Example |
---|---|---|
yy | The year, represented by two digits. | 16 |
yyyy | The year, represented by four digits. | 2016 |
MM | The month, represented numerically. | 01 or 1 |
MMM | The month, represented using three letters. | Jan |
MMMM | The month, represented using the complete spelling. | January |
dd | The day. | 02 or 2 |
HH | The hour when using a 24-hour day; values range from 0-23. | 21 |
hh | The hour when using a 12-hour day; values range from 1-12. | 9 |
mm | The minute; values range from 0-59. | 45 |
ss | The second; values range from 0-59. | 02 |
SSS | The millisecond; values range from 0-999. | 050 |
a | The AM/PM marker. | PM |
epoch_millis | The time in milliseconds from epoch. | 1509581781000 |
epoch_seconds | The time in seconds from epoch. | 1509747601 |
Z | The time zone offset expressed in hours. | -0100 or -01:00 |
ZZZ | The time zone offset expressed using IDs. | America/Los_Angeles |
The following table shows examples for different formats of the same date, January 2nd, 2016 at 9:45:02.05 PM:
Time format examples
Input date | Date format |
---|---|
01/02/2016 9:45:02PM | MM/dd/yyyy hh:mm:ssa |
Jan02-16 21:45:02 | MMMdd-yy HH:mm:ss |
January 02 2016 9:45:02.050PM | MMMM dd yyyy hh:mm:ss.SSSa |
01/02/2016T21:45:02-0000 | MM/dd/yyyy'T'HH:mm:ssZ |
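To check a format string against sample values before saving it in the manifest, you can approximate the symbols above with Python's `datetime.strptime`. This is a rough sketch only: Python is not the parser that GeoAnalytics Server uses, the mapping covers a subset of the symbols, time zone IDs (ZZZ) are not handled, and the helper names `to_strptime` and `parse_time` are hypothetical.

```python
import re
from datetime import datetime, timezone

# Manifest-style symbol -> Python strptime directive (subset only).
_DIRECTIVES = {
    "yyyy": "%Y", "yy": "%y",
    "MMMM": "%B", "MMM": "%b", "MM": "%m",
    "dd": "%d", "HH": "%H", "hh": "%I",
    "mm": "%M", "ss": "%S", "SSS": "%f",
    "a": "%p", "Z": "%z",
}
# Quoted literals first, then longer symbols before shorter ones.
_TOKEN = re.compile(r"'([^']*)'|ZZZ|yyyy|yy|MMMM|MMM|MM|dd|HH|hh|mm|ss|SSS|a|Z")


def to_strptime(fmt):
    """Translate a manifest-style time format into a strptime pattern."""
    def replace(match):
        token = match.group(0)
        if token.startswith("'"):
            # Quoted text such as 'T' is a literal; escape any percent signs.
            return match.group(1).replace("%", "%%")
        if token == "ZZZ":
            raise ValueError("time zone IDs (ZZZ) are not handled in this sketch")
        return _DIRECTIVES[token]
    return _TOKEN.sub(replace, fmt)


def parse_time(value, fmt):
    """Parse a time string using a manifest-style format or an epoch keyword."""
    if fmt == "epoch_millis":
        return datetime.fromtimestamp(int(value) / 1000, tz=timezone.utc)
    if fmt == "epoch_seconds":
        return datetime.fromtimestamp(int(value), tz=timezone.utc)
    return datetime.strptime(value, to_strptime(fmt))


print(parse_time("Jan02-16 21:45:02", "MMMdd-yy HH:mm:ss"))
print(parse_time("January 02 2016 9:45:02.050PM", "MMMM dd yyyy hh:mm:ss.SSSa"))
```

Month names and the AM/PM marker are parsed according to the current Python locale, so results may differ on systems that are not configured for English.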
Dataset format
The dataset format section outlines the format the data is in. Data may be in one of the following formats:
- Shapefile (.shp)
- Delimited file (for example .csv)
- Parquet file
- ORC file
The available parameters differ depending on the dataset. For shapefiles, ORC and parquet files, the only parameter is the file type, which cannot be modified. If the input dataset is a delimited file, there will be multiple parameters that can be modified in the manifest in Manager. These are outlined in the following table:
Dataset formats
Parameter | Description |
---|---|
File extension | Lists the file type extension on the input dataset. Common formats are .csv and .txt. This information can be included in the hints file. |
Field delimiter | Determines the delimiter for each field. Common formats are , and ;. This information can be included in the hints file. |
Record terminator | Determines the terminator for each row of data. Common formats are \n and \t. This information can be included in the hints file. |
Quote character | Determines the character used for quotes. This information can be included in the hints file. |
Has header row | A Boolean that determines if the input table includes a header row. If a header row is included, the headers will be used for the field names. Field name information is used to predict geometry and time fields. Headers can be set using the hints file. |
Encoding | The type of encoding used on the file. By default, this will be UTF-8. This can be set in the hints file. |
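To see how these parameters interact, you can preview a sample of your delimited data locally with the same settings. The sketch below uses Python's standard `csv` module; the function `read_delimited` and the `col_N` placeholder names are hypothetical and do not reflect how GeoAnalytics Server names fields. The record terminator is not configurable here because Python's `csv` reader recognizes line endings on its own.

```python
import csv
import io


def read_delimited(raw_bytes, delimiter=",", quote_char='"',
                   has_header_row=True, encoding="utf-8"):
    """Parse delimited bytes into (field names, rows) using manifest-like settings."""
    text = raw_bytes.decode(encoding)
    rows = list(csv.reader(io.StringIO(text, newline=""),
                           delimiter=delimiter, quotechar=quote_char))
    if not rows:
        return [], []
    if has_header_row:
        # The first row supplies the field names.
        return rows[0], rows[1:]
    # Without a header row, generate placeholder names (hypothetical scheme).
    names = [f"col_{i + 1}" for i in range(len(rows[0]))]
    return names, rows


names, rows = read_delimited(b'id;name\n1;"Smith; J"\n', delimiter=";")
print(names, rows)
```

If the field names or values look wrong with the settings you expect, that is a sign the delimiter, quote character, header, or encoding should be set explicitly in the hints file.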