Note:
At 10.9.1 or higher, register a big data file share through your portal contents page. This is the recommended way to register your big data file shares. Only use Server Manager for editing if your big data file share was created using Server Manager and you haven't replaced it with a big data file share in portal.
A big data file share is an item created in your portal that references a location available to your ArcGIS GeoAnalytics Server. You can use the big data file share location as an input to and an output location for feature data (points, polylines, polygons, and tabular data) in GeoAnalytics tools. When you create a big data file share through your portal contents page, at least two items are created in your portal:
- A data store (big data file share) item
- A big data file share item
- A data store (cloud store) item, if you're registering a cloud store for a big data file share
Note:
A big data file share is only available if the portal administrator has enabled GeoAnalytics Server. To learn more about enabling GeoAnalytics Server, see Set up ArcGIS GeoAnalytics Server.
Big data file shares
There are several benefits to using a big data file share:
- You can keep your data in an accessible location until you are ready to perform analysis. A big data file share accesses the data when the analysis is run, so you can continue to add data to an existing dataset in your big data file share without having to reregister or publish your data.
- You can also modify the manifest to remove, add, or update datasets in the big data file share.
- Big data file shares are extremely flexible in how time and geometry can be defined and allow for multiple time formats on a single dataset.
- Big data file shares also allow you to partition your datasets while still treating multiple partitions as a single dataset.
- Using big data file shares for output data allows you to store your results in formats that you may use for other workflows, such as a parquet file for further analysis or storage.
Note:
Big data file shares are only accessed when you run GeoAnalytics Tools. This means that you can only browse and add big data files to your analysis; you cannot visualize the data on a map.
Big data file shares can reference the following input data sources:
- File share—A directory of datasets on a local disk or network share.
- Apache Hadoop Distributed File System (HDFS)—An HDFS directory of datasets.
- Apache Hive—Hive metastore databases.
- Cloud store—An Amazon Simple Storage Service (S3) bucket, Microsoft Azure Blob container, or Microsoft Azure Data Lake Gen2 (ArcGIS Server Administrator Directory only) store containing a directory of datasets.
When writing results to a big data file share, you can use the following outputs for GeoAnalytics Tools:
- File share
- HDFS
- Cloud store
The following file types are supported as datasets for input and output in big data file shares:
- Delimited files (such as .csv, .tsv, and .txt)
- Shapefiles (.shp)
- Parquet files (.parquet)
Note:
Only unencrypted parquet files are supported.
- ORC files (.orc)
Big data file shares are one of several ways GeoAnalytics Tools can access your data and are not a requirement for GeoAnalytics Tools. See Use the GeoAnalytics Tools in Map Viewer Classic for a list of possible GeoAnalytics Tools data inputs and outputs.
You can register as many big data file shares as you need. Each big data file share can have as many datasets as you want.
The table below outlines some important terms when talking about big data file shares.
Term | Description |
---|---|
Big data file share | A location registered with your GeoAnalytics Server to be used as dataset input, output, or both input and output to GeoAnalytics Tools. |
Big data catalog service | A service that outlines the input datasets and schemas and output template names of your big data file share. This is created when your big data file share is registered, and your manifest is created. To learn more about big data catalog services, see the Big Data Catalog Service documentation in the ArcGIS Services REST API help. |
Big data file share item | An item in your portal that references the big data catalog service. You can control who can use your big data file share as input to GeoAnalytics by sharing this item in portal. |
Manifest | A JSON file that outlines the datasets available and the schema for inputs in your big data file share. The manifest is automatically generated when you register a big data file share and can be modified by editing it directly or by using a hints file. A single big data file share has one manifest. |
Output templates | One or more templates that outline the file type and optional formatting used when writing results to a big data file share. For example, a template could specify that results are written to a shapefile. A big data file share can have zero, one, or more output templates. |
Big data file share type | The type of location you are registering. For example, you could have a big data file share of type HDFS. |
Big data file share dataset format | The format of the data you are reading or writing. For example, the file type may be shapefile. |
Hints file | An optional file that you can use to assist in generating a manifest for delimited files used as an input. |
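To make the Manifest entry above concrete, the following is a simplified, illustrative sketch of a manifest for a single delimited dataset. The exact properties are defined in the Big data file share manifest documentation; the field and format values shown here are assumptions for illustration only:

```json
{
  "datasets": [
    {
      "name": "Earthquakes",
      "format": {
        "type": "delimited",
        "extension": "csv",
        "fieldDelimiter": ",",
        "hasHeaderRow": true
      },
      "geometry": {
        "geometryType": "esriGeometryPoint",
        "spatialReference": { "wkid": 4326 },
        "fields": [
          { "name": "longitude", "formats": ["x"] },
          { "name": "latitude", "formats": ["y"] }
        ]
      },
      "time": {
        "timeType": "instant",
        "timeReference": { "timeZone": "UTC" },
        "fields": [
          { "name": "eventdate", "formats": ["MM/dd/yyyy"] }
        ]
      }
    }
  ]
}
```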
Prepare your data to be registered as a big data file share
To use your datasets as inputs in a big data file share, ensure that your data is correctly formatted. See below for the formatting based on the big data file share type.
File shares and HDFS
To prepare your data for a big data file share, you must format your datasets as subfolders under a single parent folder that will be registered. In this parent folder you register, the names of the subfolders represent the dataset names. If your subfolders contain multiple folders or files, all of the contents of the top-level subfolders are read as a single dataset and must share the same schema. The following is an example of how to register the folder FileShareFolder, which contains three datasets, named Earthquakes, Hurricanes, and GlobalOceans. When you register a parent folder, all subdirectories under the folder you specify are also registered with the GeoAnalytics Server. Always register the parent folder (for example, \\machinename\FileShareFolder) that contains one or more individual dataset folders.
Example of a big data file share that contains three datasets: Earthquakes, Hurricanes, and GlobalOceans.
|---FileShareFolder < -- The top-level folder is what is registered as a big data file share
|---Earthquakes < -- A dataset "Earthquakes", composed of 4 csvs with the same schema
|---1960
|---01_1960.csv
|---02_1960.csv
|---1961
|---01_1961.csv
|---02_1961.csv
|---Hurricanes < -- The dataset "Hurricanes", composed of 3 shapefiles with the same schema
|---atlantic_hur.shp
|---pacific_hur.shp
|---otherhurricanes.shp
|---GlobalOceans < -- The dataset "GlobalOceans", composed of a single shapefile
|---oceans.shp
This same structure is applied to file shares and HDFS, although the terminology differs. In a file share, there is a top-level folder or directory, and datasets are represented by the subdirectories. In HDFS, the file share location is registered and contains datasets. The following table outlines the differences:
 | File share | HDFS |
---|---|---|
Big data file share location | A folder or directory | An HDFS path |
Datasets | Top-level subfolders | Datasets within the HDFS path |
Once your data is organized as a folder with dataset subfolders, make your data accessible to your GeoAnalytics Server by following the steps in Make your data accessible to ArcGIS Server and registering the dataset folder or HDFS path through portal.
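Before registering a file share, it can help to confirm that the folder actually follows the parent-folder/dataset-subfolder layout described above. The following is a minimal sketch (a hypothetical helper, not part of ArcGIS) that groups files the same way a big data file share would, with each top-level subfolder becoming one dataset:

```python
from pathlib import Path

def list_datasets(share_folder):
    """Return {dataset_name: [file paths]} for a candidate big data file
    share folder. Each top-level subfolder of the parent folder is treated
    as one dataset; all files underneath it (at any depth) belong to that
    dataset and are expected to share the same schema."""
    root = Path(share_folder)
    datasets = {}
    for sub in sorted(root.iterdir()):
        if sub.is_dir():
            datasets[sub.name] = sorted(p for p in sub.rglob("*") if p.is_file())
    return datasets
```

Running this against the FileShareFolder example above would report three datasets: Earthquakes, Hurricanes, and GlobalOceans.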
Hive
Note:
GeoAnalytics Server uses Spark 3.0.1. Hive must be version 2.3.7 or 3.0.0–3.1.2.
If you try to register a big data file share with a Hive version that is not supported, the registration will fail. If this happens, restart the GeoAnalyticsManagement toolbox in the ArcGIS Server Administrator Directory: browse to services > System > GeoAnalyticsManagement, click stop, and then repeat the steps to start it again.
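The version constraint in the note above can be expressed as a small check. This is a hypothetical helper (not an Esri API) that tests a Hive version string against the supported values stated here, 2.3.7 or the 3.0.0 through 3.1.2 range:

```python
def hive_version_supported(version):
    """Return True if a Hive version string is one this GeoAnalytics
    release supports: exactly 2.3.7, or 3.0.0 through 3.1.2."""
    parts = tuple(int(p) for p in version.split("."))
    if parts == (2, 3, 7):
        return True
    # Tuple comparison handles the 3.x range lexicographically.
    return (3, 0, 0) <= parts <= (3, 1, 2)
```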
In Hive, all tables in a database are recognized as datasets in a big data file share. In the following example, there is a metastore with two databases, default and CityData. When registering a Hive big data file share, only one database can be selected. In this example, if the CityData database were selected, there would be two datasets in the big data file share, FireData and LandParcels.
|---HiveMetastore < -- The top-level folder is what is registered as a big data file share
|---default < -- A database
|---Earthquakes
|---Hurricanes
|---GlobalOceans
|---CityData < -- A database that is registered (specified in Server Manager)
|---FireData
|---LandParcels
Cloud stores
To prepare your data for a big data file share in a cloud store, format your datasets as subfolders under a single parent folder.
The following is an example of how to structure your data. This example registers the parent folder, FileShareFolder, which contains three datasets: Earthquakes, Hurricanes, and GlobalOceans. When you register a parent folder, all subdirectories under the folder you specify are also registered with GeoAnalytics Server.
Example of how to structure data in a cloud store that will be used as a big data file share. This big data file share contains three datasets: Earthquakes, Hurricanes, and GlobalOceans.
|---Cloud Store < -- The cloud store being registered
|---Container or S3 Bucket Name < -- The container (Azure) or bucket (Amazon) being registered as part of the cloud store
|---FileShareFolder < -- The parent folder that is registered as the 'folder' during cloud store registration
|---Earthquakes < -- The dataset "Earthquakes", composed of 4 csvs with the same schema
|---1960
|---01_1960.csv
|---02_1960.csv
|---1961
|---01_1961.csv
|---02_1961.csv
|---Hurricanes < -- The dataset "Hurricanes", composed of 3 shapefiles with the same schema
|---atlantic_hur.shp
|---pacific_hur.shp
|---otherhurricanes.shp
|---GlobalOceans < -- The dataset "GlobalOceans", composed of 1 shapefile
|---oceans.shp
Add a big data file share
To add a big data file share of type folder, HDFS, or Hive, or a cloud store big data file share backed by Microsoft Azure Blob storage, an Amazon Simple Storage Service (S3) bucket, or an S3-compatible bucket, see Add a big data file share.
Follow these steps to register a Microsoft Azure Data Lake Gen2 cloud store as a big data file share.
Before getting started, ensure that you have the following:
- A name for your big data file share. This name will be used throughout for both cloud store registration and big data file share registration, unless otherwise noted. Do not use spaces or special characters.
- An Azure Data Lake Gen2 account
- A shared key for your Azure Data Lake Gen2 account
- Sign in to your Portal Directory using the URL https://webadaptorhost.domain.com/webadaptorname/sharing/rest.
- Get the Server ID of your GeoAnalytics Server by navigating to the following URL: https://webadaptorhost.domain.com/webadaptorname/sharing/rest/portals/0123456789ABCDEF/servers. Make a note of the GeoAnalytics Server server ID, which will be used in later steps.
- Next, add the Data Lake Gen2 cloud store to your portal. Modify the URL to https://webadaptorhost.domain.com/webadaptorname/sharing/rest/content/users/<username>/addItem, replacing <username> with the username you've signed in as.
- Fill out the Add Item page with the following information:
Type—Data Store
Title—Use the name you've chosen to use throughout.
Format—JSON
For the Text parameter, use the following JSON and update the parameters as outlined:
- <title>—Replace with the chosen name you'll use throughout
- <data lake name>—Replace with the name of your Data Lake in Azure.
- <shared key>—Replace with the shared key for your Data Lake.
- <container name>—Replace with the container your data is stored in.
- <folder name>—Replace with the folder your folders of data are stored in.
{ "path": "/cloudStores/<title>", "type": "cloudStore", "provider": "azuredatalakegen2store", "info": { "isManaged": false, "connectionString": "{\"endpoint\":\"<data lake name>.dfs.core.windows.net\",\"authType\":\"SharedKey\",\"sharedKey\":\"<shared key>\"}", "container": "<container name>", "folder": "<folder name>" } }
- Click Add Item. Make a note of the JSON object that's returned.
It will look similar to this sample:
{ "id": "ae514ea11d0a4a2cb720dd627694b098", "success": true, "folder": "" }
- Open a new tab and go to the following URL: https://webadaptorhost.domain.com/webadaptorname/sharing/rest/portals/self/datastores/addToServer. Fill out the form with the following information:
- DatastoreId—Use the ID returned in the JSON from the previous step.
- ServerId—The ID of your GeoAnalytics Server.
- Format—JSON
This will return a JSON message of the status. If successful, continue to the next step.
- Sign in to ArcGIS Server Manager on your GeoAnalytics Server. Go to Site > Data Stores. Look through the registered data stores, find the one you just created, and copy its name.
It will be in the format <chosenname>_ds_<unique key>. You'll use this name in the next step.
- Next, add the big data file share referencing your cloud store to your portal. Access the URL https://webadaptorhost.domain.com/webadaptorname/sharing/rest/content/users/<username>/addItem, replacing <username> with the username you've signed in as.
Fill out the following values in the form:
Type—Data Store
Title—Use the name you've chosen to use throughout.
Format—JSON
For the Text parameter, use the following JSON and update the parameters as outlined:
- <cloud_title>—Use the name from Step 7.
- <title>—Replace with the same chosen name you've used elsewhere.
{ "info":{ "connectionString":"{\"path\":\"/cloudStores/<cloud_title>\"}", "connectionType":"dataStore" }, "path":"/bigDataFileShares/<title>", "type":"bigDataFileShare" }
- Click Add Item. Make a note of the JSON object that's returned.
It will look similar to this sample:
{ "id": "bk514ea14d0a3a2cb890hh627694b071", "success": true, "folder": "" }
- Return to the following URL: https://webadaptorhost.domain.com/webadaptorname/sharing/rest/portals/self/datastores/addToServer. Fill out the form with the following information:
- DatastoreId—Use the ID returned in the JSON from the previous step.
- ServerId—The ID of your GeoAnalytics Server.
- Format—JSON
You now have a big data file share item and a cloud store item. Once your big data file share item is created, the GeoAnalytics Server will create a third item, which includes the manifest for your data; you can modify the manifest in your portal contents. This may take a few minutes depending on the number of datasets. To modify your datasets, see Manage big data file shares in a portal below.
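The two Text parameters used in the steps above can be generated programmatically before pasting them into the Add Item form. This is a sketch under stated assumptions: the helper names are hypothetical, but the JSON layouts mirror the Text parameters shown in the steps above.

```python
import json

def cloud_store_json(title, data_lake_name, shared_key, container, folder):
    """Build the Text parameter for the Data Lake Gen2 cloud store item.
    The inner connection string is itself JSON, so it is serialized twice."""
    connection = {
        "endpoint": f"{data_lake_name}.dfs.core.windows.net",
        "authType": "SharedKey",
        "sharedKey": shared_key,
    }
    return json.dumps({
        "path": f"/cloudStores/{title}",
        "type": "cloudStore",
        "provider": "azuredatalakegen2store",
        "info": {
            "isManaged": False,
            "connectionString": json.dumps(connection),
            "container": container,
            "folder": folder,
        },
    })

def big_data_file_share_json(title, cloud_title):
    """Build the Text parameter for the big data file share item that
    references the cloud store registered above."""
    return json.dumps({
        "info": {
            "connectionString": json.dumps({"path": f"/cloudStores/{cloud_title}"}),
            "connectionType": "dataStore",
        },
        "path": f"/bigDataFileShares/{title}",
        "type": "bigDataFileShare",
    })
```

Posting these payloads to the addItem and addToServer URLs is left to your HTTP client of choice; the helpers only assemble the JSON text.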
Manage big data file shares in a portal
Once you have created a big data file share, you can review the datasets in it and the templates that outline how results saved to big data file shares will be written.
Modify a big data file share
When a big data file share item is created, a manifest for the input data is automatically generated and uploaded. The process of generating a manifest may not always correctly estimate the fields representing geometry and time, and you may need to apply edits. To edit a manifest and change how datasets are represented, follow the steps in Edit big data file shares. To learn more about the big data file share manifest, see Big data file share manifest in the ArcGIS Server help.
If you created your big data file share in ArcGIS Server using Manager, follow the steps in Edit big data file share manifests in Server Manager.
Modify output templates for a big data file share
When you choose to use the big data file share as an output location, output templates are automatically generated. These templates outline the formatting of output analysis results, such as the file type, and how time and geometry will be registered. If you want to modify the geometry or time formatting, or add or delete templates, you can modify the templates. To edit the output templates, follow the steps in Create, edit, and view output templates. To learn more about output templates, see Output templates in a big data file share.
If you created your big data file share in ArcGIS Server using Manager, follow the steps in Edit big data file share manifests in Server Manager.
Migrate big data file shares created in Server Manager to a portal
Big data file shares created using a portal have many advantages over big data file shares created in Server Manager, for example:
- An improved user experience that makes editing datasets easier.
- A simpler experience for registering your big data file shares.
- Items are stored and shared using portal credentials.
It is recommended that you move existing big data file shares from Server Manager to a portal. In some cases, it is required. In the following cases, you must migrate big data file shares to a portal to continue using them:
- Big data file shares based on a Microsoft Azure Data Lake Gen1 cloud store.
To migrate a big data file share from Server Manager to a portal, ensure that you have the following:
- The credentials and file location of your configured big data file share.
- If applicable, the credentials and file location of your configured cloud store.
- Sign in to Server Manager on your GeoAnalytics Server site.
- Go to Site > Data Stores. Click the edit button on the big data file share you'd like to migrate.
- Go to Advanced > Manifest. Click the Download button to save the manifest.
- If you have any hints, complete the same steps for hints. Click Hints > Download to save your hints file. Rename the file extension from .dat to .txt.
- If you have output templates under the Advanced > Output Templates section, copy the text and save it in a text file.
- Create a big data file share in portal content using the same type and input location as was previously used.
If you don't know the credentials, your administrator can find them in the Server Administrator Directory using the decrypt=true option on the big data file share and cloud store items.
- If you are upgrading Microsoft Azure Data Lake Gen1 cloud store to Gen2, use the following steps.
- For any other big data file share type, follow the steps in Add a data store item using the same credentials and location as your existing big data file share.
- Once your big data file share is created, click Datasets, and turn on the Show advanced option.
- Upload the manifest you saved previously by clicking Upload in the manifest section. Browse to the manifest JSON file that was saved earlier, and click Upload. Click the Sync button so that changes are reflected.
- If you have a hints file to upload, complete the same steps, and upload your hints file under the Show advanced > Hints > Upload option. Click the Sync button so that changes are reflected.
- To upload the output templates, do one of the following:
- Manually add the output templates using the big data file share item Outputs > Add output templates.
- Edit the JSON file of the big data file share item through ArcGIS Server Administrator Directory. This is only recommended if you're familiar with editing JSON files.
You now have a big data file share and manifest for your big data file share item in your portal. You can update your workflows to use and point to this big data file share. When you are confident it's working as expected, delete your original big data file share in Server Manager.
Run analysis on a big data file share
You can run analysis on a dataset in a big data file share through any clients that support GeoAnalytics Server, which include the following:
- ArcGIS Pro
- Map Viewer Classic
- ArcGIS REST API
- ArcGIS API for Python
To run your analysis on a big data file share through ArcGIS Pro or Map Viewer Classic, select the GeoAnalytics Tools you want to use. For the input to the tool, browse to where your data is located under Portal in ArcGIS Pro or on the Browse Layers dialog box in Map Viewer Classic. Data will be in My Content if you registered the data yourself. Otherwise, look in Groups or All Portal. Note that a big data file share layer selected for analysis will not be displayed in the map.
Note:
Ensure that you are signed in with a portal account that has access to the registered big data file share. You can search your portal with the term bigDataFileShare* to quickly find all the big data file shares you can access.
To run analysis on a big data file share through ArcGIS REST API, use the big data catalog service URL as the input. If you created the big data file share in portal, this will be in the format {"url":"https://webadaptorhost.domain.com/webadaptorname/rest/DataStoreCatalogs/bigDataFileShares_filesharename/"}. For example, with a machine named example, a domain named esri, a web adaptor named server, a big data file share named MyData, and a dataset named Earthquakes, the URL is {"url":"https://example.esri.com/server/rest/DataStoreCatalogs/bigDataFileShares_MyData/Earthquakes_uniqueID"}. If you created the big data file share in Server Manager, it will be in the format {"url":"https://webadaptorhost.domain.com/webadaptorname/rest/DataStoreCatalogs/bigDataFileShares_filesharename/BigDataCatalogServer/dataset"}.
To learn more about input to big data analysis through REST, see the Feature input topic in the ArcGIS Services REST API documentation.
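The catalog URL pattern above can be assembled with a small sketch. This is a hypothetical helper, not an Esri API; note that dataset paths for portal-created shares carry a unique ID suffix (as in Earthquakes_uniqueID), which this sketch leaves to the caller:

```python
def big_data_catalog_url(webadaptor_host, webadaptor_name, file_share, dataset=None):
    """Build the big data catalog service URL used as REST input for a
    portal-created big data file share, following the pattern shown above."""
    base = (f"https://{webadaptor_host}/{webadaptor_name}"
            f"/rest/DataStoreCatalogs/bigDataFileShares_{file_share}")
    # Append the dataset path segment (including any unique ID suffix)
    # when targeting a specific dataset rather than the whole share.
    return f"{base}/{dataset}" if dataset else base
```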
Save results to a big data file share
You can run analysis on a dataset (big data file share or other input) and save the results to a big data file share. When you save results to a big data file share, you cannot visualize them. You can do this through the following clients:
- Map Viewer Classic
- ArcGIS REST API
- ArcGIS API for Python
When you write results to a big data file share, the input manifest is updated to include the dataset you just saved. The results you have written to the big data file share are now available as an input for another tool run.