This tutorial takes you through the steps for setting up data to create a big data file share. A big data file share is an item created in your portal that references feature data (tables, points, polylines, and polygons) in a location available to your GeoAnalytics Server. The big data file share item in your portal allows you to browse for your registered data so that you can run GeoAnalytics Tools on your datasets. Once you've created a big data file share, you'll consume the data using the Aggregate Points tool. In this tutorial, you will download a dataset of taxi cab drop-off and pick-up locations and use GeoAnalytics Tools to determine where taxi drop-offs occur most frequently.
Be sure your ArcGIS Enterprise administrator has configured GeoAnalytics Server. You'll need to obtain the ArcGIS Server Manager URL from the administrator so you can access the GeoAnalytics Server. Learn more about setting up ArcGIS GeoAnalytics Server.
Prepare the data
To download and prepare the data used in this example, follow these steps:
- Create a folder called BigDataExample in a location available to your GeoAnalytics Server. Within the folder BigDataExample, create a folder called NYCTaxi.
- Go to http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml and download yellow taxi data from January and February 2014 to the folder BigDataExample > NYCTaxi.
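The folder layout from the steps above can be sketched in a few lines of Python. This is a minimal sketch and not part of the GeoAnalytics workflow itself: the function name `create_share_folders` and the `root` argument are hypothetical, and `root` stands in for whatever shared location your GeoAnalytics Server can reach.

```python
from pathlib import Path


def create_share_folders(root: Path) -> Path:
    """Create <root>/BigDataExample/NYCTaxi and return the dataset folder.

    `root` is a hypothetical placeholder for a location that every
    GeoAnalytics Server machine can access.
    """
    dataset_dir = root / "BigDataExample" / "NYCTaxi"
    # parents=True also creates BigDataExample; exist_ok=True makes reruns safe.
    dataset_dir.mkdir(parents=True, exist_ok=True)
    return dataset_dir


# Example call (path is illustrative only):
# create_share_folders(Path("/sharedLocation"))
```

The downloaded January and February files then go inside the returned NYCTaxi folder, so that the folder name becomes the dataset name.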
Create a big data file share
Once you've saved the data in a location accessible to all GeoAnalytics Server machines, register it with your GeoAnalytics Server as a big data file share through your GeoAnalytics Server Manager. Registering a big data file share creates a big data catalog service, which can be consumed in GeoAnalytics Server tools. To create the big data file share, follow these steps:
- Sign in to GeoAnalytics Server Manager. The URL is in the format https://gisserver.domain.com:6443/arcgis/manager. If you do not know this URL, request it from your administrator.
- Click Site > Data Stores and select Register a big data file share.
- Accept the default of type File Share, type in a unique name and the path to your folder BigDataExample (for example, \\sharedLocation\BigDataExample for Windows or /sharedLocation/BigDataExample for Linux), and click Create. This creates a big data file share data store. This corresponds to a big data file share item in your portal, with an underlying big data catalog service available through a URL in the format https://gisserver.domain.com:6443/arcgis/rest/services/DataStoreCatalogs/bigDataFileShares_FileShareName/BigDataCatalogServer, where FileShareName is determined by what you named the data store when registering. In this example, the big data file share has one dataset, NYCTaxi, named after the folder in your big data file share.
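To make the URL pattern above concrete, the following sketch assembles the big data catalog service URL from the server URL and the name you gave the data store. The helper name `catalog_service_url` is hypothetical, and the host shown is the placeholder from this tutorial, not a real server.

```python
def catalog_service_url(server_url: str, file_share_name: str) -> str:
    """Build the big data catalog service URL for a registered file share.

    `server_url` is the ArcGIS Server base URL, for example
    https://gisserver.domain.com:6443/arcgis (a placeholder host).
    """
    base = server_url.rstrip("/")
    return (
        f"{base}/rest/services/DataStoreCatalogs/"
        f"bigDataFileShares_{file_share_name}/BigDataCatalogServer"
    )


# Assumes the data store was registered with the name "BigDataExample".
url = catalog_service_url("https://gisserver.domain.com:6443/arcgis", "BigDataExample")
print(url)
```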
Edit a big data file share
This dataset has multiple date and time fields. You'll inspect the dataset in the manifest to make sure that you're using the correct fields. To view and edit the datasets in the manifest, click the pencil icon next to the big data file share in Server Manager. When the manifest is first generated, the manifest generation process selects the pick-up fields for the geometry and time parameters. For this tutorial, you are interested in running analysis on the drop-off locations.
When the manifest is generated, a best guess is applied to find fields used to represent geometry and time.
In this tutorial, you will modify the manifest to use the drop-off time and drop-off locations. This means that the analysis will aggregate the drop-off locations instead of the pick-up locations. Either set of geometry (pick-up or drop-off) can be used for analysis. The correct one to use depends on what you are trying to solve. These changes will be made using the big data file share dataset editor.
This can also be done by downloading the manifest, editing it, and uploading the edited manifest. To learn more about editing the manifest itself, see Understanding the big data file share manifest.
- Select the pencil button beside your big data file share to edit the big data file share manifest after it has been generated.
- Under Dataset select the NYCTaxi dataset.
- The Geometry section shows that the fields currently used to represent X and Y values are pickup_longitude and pickup_latitude. Change the value of Field used to represent X value from pickup_longitude to dropoff_longitude. Change the value of Field used to represent Y value from pickup_latitude to dropoff_latitude.
- The Time section shows that the field currently used to represent time values is tpep_pickup_datetime with the format yyyy-MM-dd HH:mm:ss. Change the time field from tpep_pickup_datetime to tpep_dropoff_datetime.
- Click the Save button to save the changes to your big data file share.
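If you download the manifest instead of using the dataset editor, the same field changes can be applied to the JSON. The sketch below is illustrative only: it assumes a manifest with a `datasets` list whose entries hold `geometry.fields` and `time.fields` arrays of objects with `name` and `formats` keys. That layout, the trimmed `sample_manifest`, and the helper `use_dropoff_fields` are assumptions made for this example, so check them against your own downloaded manifest before relying on them.

```python
import copy

# Field swaps from the steps above: pick-up fields become drop-off fields.
FIELD_SWAPS = {
    "pickup_longitude": "dropoff_longitude",
    "pickup_latitude": "dropoff_latitude",
    "tpep_pickup_datetime": "tpep_dropoff_datetime",
}


def use_dropoff_fields(manifest: dict, dataset_name: str = "NYCTaxi") -> dict:
    """Return a copy of the manifest with geometry/time fields switched to drop-off."""
    updated = copy.deepcopy(manifest)  # leave the downloaded manifest untouched
    for dataset in updated.get("datasets", []):
        if dataset.get("name") != dataset_name:
            continue
        for section in ("geometry", "time"):
            for field in dataset.get(section, {}).get("fields", []):
                # Rename only the fields listed in FIELD_SWAPS; keep the formats.
                field["name"] = FIELD_SWAPS.get(field["name"], field["name"])
    return updated


# Hypothetical, trimmed-down manifest used only for illustration.
sample_manifest = {
    "datasets": [
        {
            "name": "NYCTaxi",
            "geometry": {
                "fields": [
                    {"name": "pickup_longitude", "formats": ["x"]},
                    {"name": "pickup_latitude", "formats": ["y"]},
                ]
            },
            "time": {
                "fields": [
                    {"name": "tpep_pickup_datetime", "formats": ["yyyy-MM-dd HH:mm:ss"]}
                ]
            },
        }
    ]
}

updated_manifest = use_dropoff_fields(sample_manifest)
```

Only the field names change; the time format yyyy-MM-dd HH:mm:ss is kept because the drop-off time field is described with the same format in the steps above.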
Run analysis on your taxi data through Portal for ArcGIS
Once the data has been registered with your GeoAnalytics Server and the big data file share item has been created in your portal, you can browse to and run a GeoAnalytics Server tool on the item.
Data that's registered with your GeoAnalytics Server is not uploaded to your server. It's only registered with the GeoAnalytics Server, which uses a manifest to define the schema.
- Log in to your portal. The URL is in the format https://webadaptorhost.domain.com/arcgis/home, where arcgis is the name of the web adaptor registered with your portal. Go to Content. In your Content table, you'll see the big data file share item you just created.
- Click Map to go to Map Viewer.
- Click the Analysis button. If you have both feature and raster analysis available, click Feature Analysis, and click GeoAnalytics Tools > Summarize Data > Aggregate Points.
- Running the Aggregate Points tool allows you to aggregate the points into polygons or bins of a specified size to gain a better understanding of the data. Because you don't have a polygon dataset to aggregate into, you'll aggregate into bins in both space and time. To add the New York City taxi cab dataset as the point layer to aggregate, choose Browse Layers for the first tool parameter. In the dialog box that appears, choose Content and browse to your New York City taxi cab dataset. Choose the layer and click Add Layer.
- Aggregate into square bins with a size of 1 kilometer.
- Since the data is time-enabled, you can apply time stepping. From downloading the data, you know that there are two months of data. In this tutorial, examine the first week of each month. To do this, set Time step interval to 1 week and How often to repeat the time step to 1 month, and set the time to align time steps to as January 1st 2017, at 12:00 AM.
- Select statistics of interest; some examples are the Mean of the total_amount, or the Variance of the Trip Distance.
- Set the spatial reference to a local New York projection. Click the gear button to access the analysis settings. Select As specified for the Processing coordinate system, click the globe to browse for UTM Zone 18N under Spatial References > PCS > UTM WGS 1984 UTM Zone 18 N, and click OK and APPLY. Zoom to the New York City region, make sure that Use current map extent is checked, and run the analysis. The analysis runs on the machines in your GeoAnalytics Server. When the analysis is complete, the results are added to your map as square polygons, each representing the count of taxi drop-off locations in that polygon along with the additional statistics you calculated.
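To see how the time stepping settings above select the first week of each month, the following sketch lists the time windows produced by a 1-week interval that repeats every month from an alignment time. It is an illustration of the idea, not the tool's implementation: the helper names are hypothetical, the data range is taken as January 1 to March 1, 2014 (the two downloaded months), and treating each window as a start time plus seven days is an assumption.

```python
from datetime import datetime, timedelta


def shift_months(moment: datetime, months: int) -> datetime:
    """Shift a datetime by whole months.

    Assumes the day of month exists in the target month, which always
    holds for the 1st used in this tutorial.
    """
    month_index = moment.year * 12 + (moment.month - 1) + months
    return moment.replace(year=month_index // 12, month=month_index % 12 + 1)


def first_week_windows(reference: datetime, data_start: datetime, data_end: datetime):
    """List 1-week windows that repeat monthly, aligned to `reference`,
    and overlap the half-open data range [data_start, data_end)."""
    months_apart = (data_start.year - reference.year) * 12 + (
        data_start.month - reference.month
    )
    windows = []
    offset = months_apart - 1  # start one repeat early to catch overlapping windows
    while True:
        start = shift_months(reference, offset)
        if start >= data_end:
            break
        end = start + timedelta(weeks=1)
        if end > data_start:
            windows.append((start, end))
        offset += 1
    return windows


windows = first_week_windows(
    reference=datetime(2017, 1, 1),   # alignment time from the step above
    data_start=datetime(2014, 1, 1),  # assumed start of the downloaded data
    data_end=datetime(2014, 3, 1),    # assumed end (exclusive)
)
for start, end in windows:
    print(start.date(), "to", end.date())
```

Because the repeat interval is a whole month, an alignment time on the first of any month, including one in a different year than the data, yields windows that begin on the first of each month.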