About big data file shares
A big data file share is an item created in your portal that references feature data (points, polylines, polygons, or tabular data) in a location available to your ArcGIS GeoAnalytics Server. The big data file share item in your portal allows you to browse for your registered data from ArcGIS GeoAnalytics Server tools. Big data file shares can reference the following data sources:
- File share—A directory of datasets on a local disk or network share.
- HDFS—A Hadoop Distributed File System (HDFS) directory of datasets.
- Hive—Metastore databases.
- Cloud store—An Amazon Web Services (AWS) Simple Storage Service (S3) bucket or Microsoft Azure Blob container containing a directory of datasets. Cloud stores are available beginning with ArcGIS 10.5.1.
Note:
A big data file share is only available for use if the portal administrator has enabled GeoAnalytics Server. To learn more about enabling GeoAnalytics Server, see Set up ArcGIS GeoAnalytics Server.
Several benefits of using a big data file share are common to all data sources. You can keep your data in an accessible location until you are ready to perform analysis. A big data file share accesses the data when the analysis is run, so you can continue to add more data to an existing dataset in your big data file share without having to re-register or publish your data. You can also modify the manifest to remove, add, or update datasets in the big data file share. Big data file shares are flexible in how time and geometry can be defined and allow for multiple time formats on a single dataset. Big data file shares also allow you to partition your datasets while still treating multiple partitions as a single dataset.
Note:
Big data file shares are only accessed when you run GeoAnalytics Tools. This means that you can only browse and add big data files to your analysis; you cannot visualize the data on a map.
Big data file shares are one of several ways GeoAnalytics Tools can access your data. See Use the GeoAnalytics Tools in the portal map viewer for a list of possible GeoAnalytics Tools data inputs.
Prepare your data to be registered as a big data file share
File shares and HDFS
To prepare your data for a big data file share, you need to format your datasets as subfolders under a single parent folder that will be registered. In this parent folder that you register, the names of the subfolders represent the dataset names. If your subfolders contain multiple folders or files, all of the contents of the top-level subfolders are read as a single dataset. The following is an example of how to register the folder FileShareFolder that contains three datasets, named Earthquakes, Hurricanes, and GlobalOceans. When you register a parent folder, all subdirectories under the folder you specify are also registered with the GeoAnalytics Server. Always register the parent folder (for example, \\machinename\FileShareFolder) that contains one or more individual dataset folders.
Example of a big data file share that contains three datasets: Earthquakes, Hurricanes, and GlobalOceans.
|---FileShareFolder                 <-- The top-level folder is what is registered as a big data file share
   |---Earthquakes                  <-- A dataset is all files and folders within the top-level subfolder
      |---1960
         |---01_1960.csv
         |---02_1960.csv
      |---1961
         |---01_1961.csv
         |---02_1961.csv
   |---Hurricanes
      |---atlantic_hur.shp
      |---pacific_hur.shp
      |---otherhurricanes.shp
   |---GlobalOceans
      |---oceans.shp
This same structure is applied to file shares and HDFS although the terminology differs. In a file share, there is a top-level folder or directory, and datasets are represented by the subdirectories. In HDFS, the file share location is registered and contains datasets. The following table outlines the differences:
| | File share | HDFS |
|---|---|---|
| Big data file share location | A folder or directory | An HDFS path |
| Datasets | Top-level subfolders | Datasets within the HDFS path |
Once your data is organized as a parent folder with dataset subfolders, make your data accessible to your GeoAnalytics Server by following the steps in Make your data accessible to ArcGIS Server, and then register the parent folder.
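The folder convention described above can be checked before registration with a short script. The following is a minimal sketch in Python, not part of any ArcGIS API: the function names list_datasets and build_example_share are hypothetical, and the script only mimics the rule that each top-level subfolder is one dataset made up of every file beneath it. It builds the example layout from empty placeholder files in a temporary directory so it can run anywhere.

```python
import tempfile
from pathlib import Path


def list_datasets(parent_folder):
    """Map each top-level subfolder name to every file found beneath it.

    Hypothetical helper: it mirrors the documented convention only and
    does not call GeoAnalytics Server.
    """
    parent = Path(parent_folder)
    datasets = {}
    for sub in sorted(p for p in parent.iterdir() if p.is_dir()):
        # All nested folders and files belong to the same dataset.
        files = sorted(
            f.relative_to(sub).as_posix() for f in sub.rglob("*") if f.is_file()
        )
        datasets[sub.name] = files
    return datasets


def build_example_share(root):
    """Create the FileShareFolder example layout using empty placeholder files."""
    share = Path(root) / "FileShareFolder"
    layout = [
        "Earthquakes/1960/01_1960.csv",
        "Earthquakes/1960/02_1960.csv",
        "Earthquakes/1961/01_1961.csv",
        "Earthquakes/1961/02_1961.csv",
        "Hurricanes/atlantic_hur.shp",
        "Hurricanes/pacific_hur.shp",
        "Hurricanes/otherhurricanes.shp",
        "GlobalOceans/oceans.shp",
    ]
    for rel in layout:
        target = share / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()
    return share


with tempfile.TemporaryDirectory() as tmp:
    example_datasets = list_datasets(build_example_share(tmp))

for name, files in example_datasets.items():
    print(f"{name}: {len(files)} file(s)")
```

Pointing list_datasets at your own parent folder gives a quick preview of the dataset names that the subfolder convention implies. The manifest generated at registration remains the authoritative description.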
Hive
In Hive, all tables in a database are recognized as datasets in a big data file share. In the following example, there is a metastore with two databases, default and CityData. When registering a Hive big data file share through ArcGIS Server with your GeoAnalytics Server, only one database can be selected. In this example, if the CityData database is selected, the big data file share contains two datasets, FireData and LandParcels.
|---HiveMetastore                   <-- The metastore that contains the databases
   |---default                      <-- A database
      |---Earthquakes
      |---Hurricanes
      |---GlobalOceans
   |---CityData                     <-- A database that is registered (specified in Server Manager)
      |---FireData
      |---LandParcels
Cloud stores
There are three steps to registering a big data file share of type cloud store.
Prepare your data
To prepare your data for a big data file share in a cloud store, format your datasets as subfolders under a single parent folder.
The following is an example of how to structure your data. This example registers the parent folder, FileShareFolder, which contains three datasets: Earthquakes, Hurricanes, and GlobalOceans. When you register a parent folder, all subdirectories under the folder you specify are also registered with GeoAnalytics Server.
Example of how to structure data in a cloud store that will be used as a big data file share. This big data file share contains three datasets: Earthquakes, Hurricanes, and GlobalOceans.
|---Cloud Store                          <-- The cloud store being registered
   |---Container or S3 Bucket Name       <-- The container (Azure) or bucket (Amazon) being registered as part of the cloud store
      |---FileShareFolder                <-- The parent folder that is registered as the 'folder' during cloud store registration
         |---Earthquakes                 <-- The dataset "Earthquakes" composed of 4 CSV files
            |---1960
               |---01_1960.csv
               |---02_1960.csv
            |---1961
               |---01_1961.csv
               |---02_1961.csv
         |---Hurricanes                  <-- The dataset "Hurricanes" composed of 3 shapefiles
            |---atlantic_hur.shp
            |---pacific_hur.shp
            |---otherhurricanes.shp
         |---GlobalOceans                <-- The dataset "GlobalOceans" composed of 1 shapefile
            |---oceans.shp
Register the cloud store to your GeoAnalytics Server
Connect to your GeoAnalytics Server site from ArcGIS Server Manager to register a cloud store. When you register a cloud store, you must include an Azure container name or an AWS S3 bucket name, as well as a folder within the container or bucket. The specified folder is composed of subfolders, and each represents an individual dataset. Each dataset is composed of all the contents of the subfolder.
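Cloud object stores hold flat keys rather than real folders, so it can help to see how the folder and subfolder rule maps onto key names. The following is a minimal sketch in Python that groups a list of object keys by the first path segment under the registered folder. The function name group_keys_by_dataset and the sample keys are hypothetical. The sketch does not contact AWS or Azure and only illustrates the naming convention described above.

```python
from collections import defaultdict


def group_keys_by_dataset(object_keys, folder):
    """Group object keys by the first path segment beneath `folder`.

    Hypothetical helper: each subfolder of the registered folder is
    treated as one dataset containing everything below it.
    """
    prefix = folder.strip("/") + "/"
    datasets = defaultdict(list)
    for key in object_keys:
        if not key.startswith(prefix):
            continue  # The key lies outside the registered folder.
        remainder = key[len(prefix):]
        dataset, sep, rest = remainder.partition("/")
        if sep and rest:
            # Keys directly inside the folder (no subfolder) are skipped.
            datasets[dataset].append(rest)
    return dict(datasets)


# Sample keys as they might appear inside the container or bucket (assumed).
keys = [
    "FileShareFolder/Earthquakes/1960/01_1960.csv",
    "FileShareFolder/Earthquakes/1960/02_1960.csv",
    "FileShareFolder/Earthquakes/1961/01_1961.csv",
    "FileShareFolder/Earthquakes/1961/02_1961.csv",
    "FileShareFolder/Hurricanes/atlantic_hur.shp",
    "FileShareFolder/Hurricanes/pacific_hur.shp",
    "FileShareFolder/Hurricanes/otherhurricanes.shp",
    "FileShareFolder/GlobalOceans/oceans.shp",
    "OtherFolder/ignored.csv",
]

cloud_datasets = group_keys_by_dataset(keys, "FileShareFolder")
for name in sorted(cloud_datasets):
    print(f"{name}: {len(cloud_datasets[name])} object(s)")
```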
Register the cloud store as a big data file share
How you register the cloud store as a big data file share depends on which cloud storage you use.
Follow these steps to register the AWS S3 cloud store you created in the previous section as a big data file share:
- Sign in to your GeoAnalytics Server site from ArcGIS Server Manager.
You can sign in as a publisher or administrator.
Note:
At GeoAnalytics Server 10.5.1, you cannot register an AWS cloud store using IAM credentials.
- Go to Site > Data Stores and choose Big Data File Share from the Register drop-down list.
- Provide the following information in the Register Big Data File Share dialog box:
- Type a name for the big data file share.
- Choose Cloud Store from the Type drop-down list.
- Choose the name of your AWS cloud store from the Cloud Store drop-down list.
- Click Create to register your cloud store as a big data file share.
You now have a big data file share and manifest for your AWS cloud store. The big data file share item in your portal points to a big data catalog service in the GeoAnalytics Server.
Follow these steps to register the Azure cloud store you created in the last section as a big data file share:
- Sign in to your GeoAnalytics Server site from ArcGIS Server Administrator Directory.
ArcGIS Server Administrator Directory requires you to sign in as an administrator. To connect to your federated GeoAnalytics Server site, you must sign in using a portal token, which requires the portal administrator's credentials, or as the GeoAnalytics Server site's primary site administrator. If you are not a portal administrator or do not have access to the primary site administrator account information, contact your portal administrator to complete these steps for you.
- Go to data > registerItem.
- Copy the following text and paste it into the Item text box. Update the value <bigDataFileShareName> with the name you want for the big data file share and the value <cloudStoreName> with the name you specified for the Azure cloud store when you registered it with your GeoAnalytics Server site.
{ "path": "/bigDataFileShares/<bigDataFileShareName>", "type": "bigDataFileShare", "info": { "connectionString": "{\"path\" : \"/cloudStores/<cloudStoreName>\"}", "connectionType": "dataStore" } }
- Click Register Item.
Once the item is registered, the big data file share appears as a data store in ArcGIS Server Manager.
- Sign in to your GeoAnalytics Server site from ArcGIS Server Manager.
You can sign in as a publisher or administrator.
- Go to Site > Data Stores and click the Regenerate Manifest button next to your new big data file share.
You now have a big data file share and manifest for your Azure cloud store. The big data file share item in your portal points to a big data catalog service in the GeoAnalytics Server.
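The item text pasted into registerItem in the Azure steps above contains a JSON string nested inside JSON, which is easy to get wrong when editing by hand. The following is a minimal sketch in Python that builds the same text and lets json.dumps handle the escaping. The function name build_register_item and the sample names MyData and MyAzureStore are hypothetical. The sketch only produces text to paste into the Item text box and does not call ArcGIS Server.

```python
import json


def build_register_item(share_name, cloud_store_name):
    """Return the registerItem text for a cloud store big data file share.

    Hypothetical helper: the structure follows the JSON shown in the
    steps above, with the two placeholder values filled in.
    """
    # The connection string is itself JSON, stored as an escaped string.
    connection = {"path": f"/cloudStores/{cloud_store_name}"}
    item = {
        "path": f"/bigDataFileShares/{share_name}",
        "type": "bigDataFileShare",
        "info": {
            "connectionString": json.dumps(connection),
            "connectionType": "dataStore",
        },
    }
    return json.dumps(item, indent=2)


# Sample names are assumptions; substitute your own values.
item_text = build_register_item("MyData", "MyAzureStore")
print(item_text)
```

Parsing the result back with json.loads is a quick way to confirm that the text is valid before you paste it.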
Register your big data file share
To register a file share, HDFS, or Hive data source as a big data file share, connect to your GeoAnalytics Server site through ArcGIS Server Manager. See Register your data with ArcGIS Server using Manager in the ArcGIS Server help for details on the necessary steps.
Tip:
Steps for registering a cloud store as a big data file share were covered in the previous section.
When a big data file share is registered, a manifest is generated that outlines the format of the datasets within your share location, including the fields representing the geometry and time. A big data file share item is created in your portal that points to a big data catalog service in the GeoAnalytics Server where you registered the data. To learn more about big data catalog services, see the Big Data Catalog Service documentation in the ArcGIS Services REST API help.
Modify a big data file share
When a big data catalog service is created, a manifest is automatically generated and uploaded to the GeoAnalytics Server site where you registered the data. The process of generating a manifest may not always correctly estimate the fields representing geometry and time, and you may need to apply edits. To edit a manifest, follow the steps in Edit big data file shares in Manager. To learn more about the big data file share manifest, see Understanding the big data file share manifest in the ArcGIS Server help.
Run analysis on a big data file share
You can run analysis on a dataset in a big data file share through any clients that support GeoAnalytics Server, which include the following:
- ArcGIS Pro
- The Portal for ArcGIS map viewer
- ArcGIS REST API
To run your analysis on a big data file share through ArcGIS Pro or the Portal for ArcGIS map viewer, select the GeoAnalytics Tools you want to use. For the input to the tool, browse to where your data is located under Portal in ArcGIS Pro or on the Browse Layers dialog box in the Portal for ArcGIS map viewer. Data will be in My Content if you registered the data yourself. Otherwise, look in your Groups or All Portal. Note that a big data file share layer selected for analysis will not be displayed in the map.
Note:
Make sure you are signed in with a portal account that has access to the registered big data file share. You can search your portal with the term bigDataFileShare* to quickly find all the big data file shares you can access.
To run analysis on a big data file share through ArcGIS REST API, use the big data catalog service URL as the input. This will be in the format {"url":"https://webadaptorhost.domain.com/webadaptorname/rest/DataStoreCatalogs/bigDataFileShares_filesharename/BigDataCatalogServer/dataset"}. For example, with a machine named example, a domain named esri, a Web Adaptor named server, a big data file share named MyData, and a dataset named Earthquakes, the URL would be: {"url":"https://example.esri.com/server/rest/DataStoreCatalogs/bigDataFileShares_MyData/BigDataCatalogServer/Earthquakes"}. To learn more about input to big data analysis through REST, see the Feature Input topic in the ArcGIS Services REST API documentation.
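The REST input described above can be assembled from its parts rather than typed by hand. The following is a minimal sketch in Python that follows the URL format given in this topic. The function name build_input_layer is hypothetical, and the sketch only builds the JSON text. Submitting it to a GeoAnalytics tool is outside its scope.

```python
import json


def build_input_layer(web_adaptor_host, web_adaptor_name, share_name, dataset):
    """Return the JSON input for a dataset in a big data file share.

    Hypothetical helper: the URL pattern is taken from the format
    described in this topic.
    """
    url = (
        f"https://{web_adaptor_host}/{web_adaptor_name}/rest/DataStoreCatalogs/"
        f"bigDataFileShares_{share_name}/BigDataCatalogServer/{dataset}"
    )
    return json.dumps({"url": url})


# Values from the example in the text above.
input_layer = build_input_layer("example.esri.com", "server", "MyData", "Earthquakes")
print(input_layer)
```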