Get started with big data file shares

About big data file shares

A big data file share is an item created in your portal that references feature data (points, polylines, polygons, or tabular data) on a location available to your ArcGIS GeoAnalytics Server. The big data file share item in your portal allows you to browse for your registered data from ArcGIS GeoAnalytics Server tools. Big data file shares can reference the following data sources:

  • File share—A directory of datasets on a local disk or network share.
  • HDFS—A Hadoop Distributed File System (HDFS) directory of datasets.
  • Hive—Metastore databases.
  • Cloud store—An Amazon Web Services (AWS) Simple Storage Service (S3) bucket or Microsoft Azure Blob container containing a directory of datasets.

Note:

A big data file share is only available for use if the portal administrator has enabled GeoAnalytics Server. To learn more about enabling GeoAnalytics Server, see Set up ArcGIS GeoAnalytics Server.

Big data file shares offer several benefits that apply to all data sources. You can keep your data in an accessible location until you are ready to perform analysis. A big data file share accesses the data when the analysis is run, so you can continue to add data to an existing dataset in your big data file share without having to reregister or republish your data. You can also modify the manifest to remove, add, or update datasets in the big data file share. Big data file shares are flexible in how time and geometry can be defined, and allow multiple time formats on a single dataset. They also allow you to partition your datasets while still treating multiple partitions as a single dataset.

Note:

Big data file shares are only accessed when you run GeoAnalytics Tools. This means that you can only browse to datasets in a big data file share and add them to your analysis; you cannot visualize the data on a map.

Big data file shares are one of several ways GeoAnalytics Tools can access your data. See Use the GeoAnalytics Tools in Map Viewer for a list of possible GeoAnalytics Tools data inputs.

The following file types are supported as datasets in big data file shares:

  • Delimited files (such as .csv, .tsv, and .txt)
  • Shapefiles (.shp)
  • Parquet files (.gz.parquet)
  • ORC files (orc.crc)

Prepare your data to be registered as a big data file share

File shares and HDFS

To prepare your data for a big data file share, organize your datasets as subfolders under a single parent folder, and register that parent folder. The names of the subfolders become the dataset names. If a subfolder contains multiple folders or files, all of its contents are read as a single dataset and must share the same schema. When you register a parent folder, all subdirectories under the folder you specify are also registered with the GeoAnalytics Server. Always register the parent folder (for example, \\machinename\FileShareFolder) that contains one or more individual dataset folders. The following example shows the folder FileShareFolder, which contains three datasets named Earthquakes, Hurricanes, and GlobalOceans.

Example of a big data file share that contains three datasets: Earthquakes, Hurricanes, and GlobalOceans.

|---FileShareFolder                 < -- The top-level folder is what is registered as a big data file share
   |---Earthquakes                  < -- A dataset "Earthquakes", composed of 4 csvs with the same schema
      |---1960
         |---01_1960.csv
         |---02_1960.csv
      |---1961
         |---01_1961.csv
         |---02_1961.csv
   |---Hurricanes                   < -- The dataset "Hurricanes", composed of 3 shapefiles with the same schema
      |---atlantic_hur.shp
      |---pacific_hur.shp
      |---otherhurricanes.shp
   |---GlobalOceans                 < -- The dataset "GlobalOceans", composed of a single shapefile
      |---oceans.shp

This same structure is applied to file shares and HDFS, although the terminology differs. In a file share, there is a top-level folder or directory, and datasets are represented by the subdirectories. In HDFS, the file share location is registered and contains datasets. The following table outlines the differences:

                               File share               HDFS
Big data file share location   A folder or directory    An HDFS path
Datasets                       Top-level subfolders     Datasets within the HDFS path

Once your data is organized as a folder with dataset subfolders, make your data accessible to your GeoAnalytics Server by following the steps in Make your data accessible to ArcGIS Server and register the dataset folder.
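
Before you register the parent folder, it can help to confirm that it follows the layout described above. The following is a minimal Python sketch, not part of ArcGIS, that lists each top-level subfolder as a dataset and checks whether the .csv files in a dataset share the same header row. The function names summarize_datasets and csv_headers_match are hypothetical, and comparing header rows is only an approximation of the same-schema requirement; it assumes every delimited file has a header row.

```python
import csv
from pathlib import Path


def summarize_datasets(parent_folder):
    """Map each top-level subfolder (a dataset) to the files found anywhere beneath it."""
    parent = Path(parent_folder)
    summary = {}
    for dataset_dir in sorted(p for p in parent.iterdir() if p.is_dir()):
        files = sorted(f for f in dataset_dir.rglob("*") if f.is_file())
        # Paths are reported relative to the dataset folder, using forward slashes.
        summary[dataset_dir.name] = [f.relative_to(dataset_dir).as_posix() for f in files]
    return summary


def csv_headers_match(dataset_dir):
    """Return True if every .csv file under dataset_dir has the same header row."""
    headers = set()
    for csv_path in Path(dataset_dir).rglob("*.csv"):
        with open(csv_path, newline="") as handle:
            # An empty file contributes an empty header.
            headers.add(tuple(next(csv.reader(handle), [])))
    return len(headers) <= 1
```

For the example above, summarize_datasets called on FileShareFolder would return one entry each for Earthquakes, Hurricanes, and GlobalOceans, and csv_headers_match could then be run on the Earthquakes subfolder.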

Hive

In Hive, all tables in a database are recognized as datasets in a big data file share. In the following example, there is a metastore with two databases, default and CityData. When registering a Hive big data file share through ArcGIS Server with your GeoAnalytics Server, only one database can be selected. In this example, if the CityData database was selected, there would be two datasets in the big data file share, FireData and LandParcels.

|---HiveMetastore                 < -- The Hive metastore that contains the databases
   |---default                    < -- A database
      |---Earthquakes
      |---Hurricanes
      |---GlobalOceans
   |---CityData                   < -- A database that is registered (specified in Server Manager)
      |---FireData
      |---LandParcels

Cloud stores

There are three steps to registering a big data file share of type cloud store.

Prepare your data

To prepare your data for a big data file share in a cloud store, format your datasets as subfolders under a single parent folder.

The following is an example of how to structure your data. This example registers the parent folder, FileShareFolder, which contains three datasets: Earthquakes, Hurricanes, and GlobalOceans. When you register a parent folder, all subdirectories under the folder you specify are also registered with GeoAnalytics Server.

Example of how to structure data in a cloud store that will be used as a big data file share. This big data file share contains three datasets: Earthquakes, Hurricanes, and GlobalOceans.

|---Cloud Store                          < -- The cloud store being registered
   |---Container or S3 Bucket Name       < -- The container (Azure) or bucket (Amazon) being registered as part of the cloud store
      |---FileShareFolder                < -- The parent folder that is registered as the 'folder' during cloud store registration
         |---Earthquakes                 < -- The dataset "Earthquakes", composed of 4 csvs with the same schema
            |---1960
               |---01_1960.csv
               |---02_1960.csv
            |---1961
               |---01_1961.csv
               |---02_1961.csv
         |---Hurricanes                  < -- The dataset "Hurricanes", composed of 3 shapefiles with the same schema
            |---atlantic_hur.shp
            |---pacific_hur.shp
            |---otherhurricanes.shp
         |---GlobalOceans                < -- The dataset "GlobalOceans", composed of 1 shapefile
            |---oceans.shp
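
Cloud stores hold objects under key names rather than in true directories, so the folder structure above is expressed through key prefixes. The following Python sketch illustrates how object keys under the registered folder map to dataset names. It works on a plain list of key strings and makes no calls to AWS or Azure; the function name group_keys_by_dataset and the sample keys are hypothetical.

```python
from collections import defaultdict


def group_keys_by_dataset(object_keys, folder):
    """Group object keys under `folder` by their first path segment (the dataset name)."""
    prefix = folder.strip("/") + "/"
    datasets = defaultdict(list)
    for key in object_keys:
        if not key.startswith(prefix):
            continue  # The object is outside the registered folder.
        dataset, sep, rest = key[len(prefix):].partition("/")
        if sep and rest:  # Skip objects stored directly in the parent folder.
            datasets[dataset].append(rest)
    return dict(datasets)


# Hypothetical object keys that follow the structure shown above.
keys = [
    "FileShareFolder/Earthquakes/1960/01_1960.csv",
    "FileShareFolder/Earthquakes/1961/02_1961.csv",
    "FileShareFolder/Hurricanes/atlantic_hur.shp",
    "FileShareFolder/GlobalOceans/oceans.shp",
    "OtherFolder/readme.txt",
]
print(group_keys_by_dataset(keys, "FileShareFolder"))
```

Objects outside the registered folder, such as OtherFolder/readme.txt in this sketch, are not assigned to any dataset.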

Register the cloud store with your GeoAnalytics Server

Connect to your GeoAnalytics Server site from ArcGIS Server Manager to register a cloud store. When you register a cloud store, you must include an Azure container name or an AWS S3 bucket name. It is also recommended that you specify a folder within the container or bucket. The specified folder contains subfolders, each of which represents an individual dataset. Each dataset is composed of all the contents of its subfolder.

Register the cloud store as a big data file share

Follow these steps to register the AWS S3 or Azure cloud store you registered in the previous section as a big data file share:

  1. Sign in to your GeoAnalytics Server site from ArcGIS Server Manager.

    You can sign in as a publisher or administrator.

  2. Go to Site > Data Stores and choose Big Data File Share from the Register drop-down list.
  3. Provide the following information in the Register Big Data File Share dialog box:
    1. Type a name for the big data file share.
    2. Choose Cloud Store from the Type drop-down list.
    3. Choose the name of your cloud store from the Cloud Store drop-down list.
    4. Click Create to register your cloud store as a big data file share.

You now have a big data file share and manifest for your cloud store. The big data file share item in your portal points to a big data catalog service in the GeoAnalytics Server.

Register your big data file share

To register a file share, HDFS, or Hive data source as a big data file share, connect to your GeoAnalytics Server site through ArcGIS Server Manager. See Register your data with ArcGIS Server using Manager in the ArcGIS Server help for details on the necessary steps.

Tip:

Steps for registering a cloud store as a big data file share were covered in the previous section.

When a big data file share is registered, a manifest is generated that outlines the format of the datasets within your share location, including the fields representing the geometry and time. A big data file share item is created in your portal that points to a big data catalog service in the GeoAnalytics Server where you registered the data. To learn more about big data catalog services, see the Big Data Catalog Service documentation in the ArcGIS Services REST API help.

Modify a big data file share

When a big data catalog service is created, a manifest is automatically generated and uploaded to the GeoAnalytics Server site where you registered the data. The process of generating a manifest may not always correctly estimate the fields representing geometry and time, and you may need to apply edits. To edit a manifest, follow the steps in Edit big data file shares in Manager. To learn more about the big data file share manifest, see Understanding the big data file share manifest in the ArcGIS Server help.

Run analysis on a big data file share

You can run analysis on a dataset in a big data file share through any client that supports GeoAnalytics Server, including the following:

  • ArcGIS Pro
  • Map Viewer
  • ArcGIS REST API
  • ArcGIS API for Python

To run your analysis on a big data file share through ArcGIS Pro or Map Viewer, select the GeoAnalytics Tools you want to use. For the input to the tool, browse to where your data is located under Portal in ArcGIS Pro or on the Browse Layers dialog box in Map Viewer. Data will be in My Content if you registered the data yourself. Otherwise, look in your Groups or All Portal. Note that a big data file share layer selected for analysis will not be displayed in the map.

Note:

Make sure you are signed in with a portal account that has access to the registered big data file share. You can search your portal with the term bigDataFileShare* to quickly find all the big data file shares you can access.

To run analysis on a big data file share through the ArcGIS REST API, use the big data catalog service URL as the input. This will be in the format {"url":"https://webadaptorhost.domain.com/webadaptorname/rest/DataStoreCatalogs/bigDataFileShares_filesharename/BigDataCatalogServer/dataset"}. For example, with a machine named example, a domain named esri, a Web Adaptor named server, a big data file share named MyData, and a dataset named Earthquakes, the URL would be: {"url":"https://example.esri.com/server/rest/DataStoreCatalogs/bigDataFileShares_MyData/BigDataCatalogServer/Earthquakes"}. To learn more about input to big data analysis through REST, see the Feature Input topic in the ArcGIS Services REST API documentation.
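
If you build this input in a script, a small helper can assemble the JSON from its parts. The following Python sketch follows the URL format described above. The function name big_data_input and its parameters are hypothetical, and the sketch only builds the string; it does not contact the server or confirm that the dataset exists.

```python
import json


def big_data_input(host, web_adaptor, file_share, dataset):
    """Build the JSON input that references a dataset in a big data file share."""
    url = (
        f"https://{host}/{web_adaptor}/rest/DataStoreCatalogs/"
        f"bigDataFileShares_{file_share}/BigDataCatalogServer/{dataset}"
    )
    return json.dumps({"url": url})


# The values from the example above: machine "example", domain "esri",
# Web Adaptor "server", file share "MyData", dataset "Earthquakes".
print(big_data_input("example.esri.com", "server", "MyData", "Earthquakes"))
```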