This tutorial takes you through the steps for setting up data to create a big data file share. A big data file share is an item created in your portal that references feature data (points, polylines, and polygons) stored in a location available to your GeoAnalytics Server. The big data file share item in your portal allows you to browse for your registered data so that you can run GeoAnalytics Tools on your datasets. Once you've created a big data file share, you'll consume the data using the Aggregate Points tool. In this tutorial, you will download a dataset of taxi cab drop-off and pick-up locations and use GeoAnalytics Tools to determine where taxi drop-offs occur more frequently.
Be sure your ArcGIS Enterprise administrator has configured GeoAnalytics Server. You'll need to obtain the ArcGIS Server Manager URL from the administrator so you can access the GeoAnalytics Server. To learn more, see Set up ArcGIS GeoAnalytics Server.
Prepare the data
To download and prepare the data used in this example, follow these steps:
- Create a folder called BigDataExample in a location available to your GeoAnalytics Server. Within the folder BigDataExample, create a folder called NYCTaxi.
- Go to http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml and download yellow taxi data from January and February 2014 to the folder BigDataExample > NYCTaxi.
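If you prefer to create the folders from a script rather than by hand, the following minimal Python sketch builds the same BigDataExample > NYCTaxi structure. The function name create_example_folders is hypothetical, and the root path you pass is an assumption: it must be a location that every GeoAnalytics Server machine can reach.

```python
from pathlib import Path


def create_example_folders(root: Path) -> Path:
    """Create root/BigDataExample/NYCTaxi and return the NYCTaxi folder path.

    Hypothetical helper; `root` must be a location all GeoAnalytics Server
    machines can access.
    """
    nyc_taxi = root / "BigDataExample" / "NYCTaxi"
    # parents=True also creates BigDataExample; exist_ok=True makes reruns harmless.
    nyc_taxi.mkdir(parents=True, exist_ok=True)
    return nyc_taxi
```

Running it twice is safe, because exist_ok=True leaves folders that already exist untouched. You still download the taxi files into the returned NYCTaxi folder yourself.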
Create a big data file share
Once you've saved the data in a location accessible to all GeoAnalytics Server machines, register it with your GeoAnalytics Server as a big data file share through GeoAnalytics Server Manager. Registering a big data file share also creates a big data catalog service, which GeoAnalytics Tools can consume. To create the big data file share, follow these steps:
- Sign in to GeoAnalytics Server Manager. The URL is in the format https://gisserver.domain.com:6443/arcgis/manager. If you do not know this URL, request it from your administrator.
- Click Site > Data Stores and select Register a big data file share.
- Accept the default type, File Share, type a unique name and the path to your folder BigDataExample (for example, \\sharedLocation\BigDataExample), and click Create. This creates a big data file share data store, which corresponds to a big data file share item in your portal. The item has an underlying big data catalog service available at a URL in the format https://gisserver.domain.com:6443/arcgis/rest/services/DataStoreCatalogs/bigDataFileShares_FileShareName/BigDataCatalogServer, where FileShareName is the name you gave the data store when registering it. In this example, the big data file share has one dataset, NYCTaxi, named after the folder in your big data file share.
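Because the catalog service URL follows a fixed pattern, you can derive it from the server host and the data store name. This minimal Python sketch assumes the default port 6443 and the arcgis site name shown in the URL format above; catalog_service_url is a hypothetical helper, not part of any Esri library.

```python
def catalog_service_url(server_host: str, file_share_name: str, port: int = 6443) -> str:
    """Build the big data catalog service URL for a registered big data file share.

    Hypothetical helper; assumes the 'arcgis' site name and the URL pattern
    https://<host>:<port>/arcgis/rest/services/DataStoreCatalogs/
    bigDataFileShares_<name>/BigDataCatalogServer
    """
    return (
        f"https://{server_host}:{port}/arcgis/rest/services/DataStoreCatalogs/"
        f"bigDataFileShares_{file_share_name}/BigDataCatalogServer"
    )
```

For a data store named FileShareName on gisserver.domain.com, this returns the example URL given in the step above.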
Edit a big data file share
This dataset has multiple date and time fields, so you'll inspect the dataset in the manifest to make sure you're using the correct fields. To view and edit the datasets in the manifest, click the pencil icon next to the big data file share in Server Manager. When the manifest is first generated, its geometry and time parameters are set to the pick-up fields, which the manifest generation process selected automatically. For this tutorial, you want to run analysis on the drop-off locations instead.
When the manifest is generated, the fields that represent geometry and time are chosen by a best guess.
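To illustrate how a best guess can land on the pick-up fields, here is a deliberately simple, hypothetical rule in Python: take the first column whose name ends in longitude as X and the first ending in latitude as Y. This is an assumption for illustration only, not the logic GeoAnalytics Server actually uses, but any first-match rule of this kind selects the pick-up fields whenever they appear before the drop-off fields in the schema.

```python
from typing import List, Optional, Tuple


def guess_geometry_fields(columns: List[str]) -> Optional[Tuple[str, str]]:
    """Hypothetical best-guess rule: first *longitude column as X, first *latitude column as Y.

    Illustration only; this is not GeoAnalytics Server's manifest generation logic.
    """
    x_field = next((c for c in columns if c.strip().lower().endswith("longitude")), None)
    y_field = next((c for c in columns if c.strip().lower().endswith("latitude")), None)
    if x_field is None or y_field is None:
        return None  # no usable coordinate pair found
    return x_field, y_field
```

For a hypothetical column list ["pickup_longitude", "pickup_latitude", "dropoff_longitude", "dropoff_latitude"], guess_geometry_fields returns ("pickup_longitude", "pickup_latitude"), which is why the manifest has to be edited to analyze drop-offs.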
In this tutorial, you will modify the manifest to use the drop-off time and drop-off locations. This means that the analysis will aggregate the drop-off locations instead of the pick-up locations. Either set of geometry (pick-up or drop-off) can be used for analysis. The correct one to use depends on what you are trying to solve. These changes will be made using the big data file share dataset editor.
You can also make these changes by downloading the manifest, editing it, and uploading the edited manifest. To learn more about editing the manifest directly, see Understanding the big data file share manifest.
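If you edit a downloaded manifest instead of using the editor, the change amounts to renaming the geometry and time fields of the NYCTaxi dataset. The Python sketch below assumes a simplified manifest in which each entry in datasets has a name plus geometry and time sections, each holding a fields list of objects with a name key; verify this against the manifest your server actually generates. The pickup_datetime and dropoff_datetime names are assumptions about the taxi data's columns, and use_dropoff_fields and edit_manifest_file are hypothetical helpers.

```python
import json
from pathlib import Path

# Pick-up field -> drop-off field. The datetime names are assumed; check your schema.
FIELD_SWAP = {
    "pickup_longitude": "dropoff_longitude",
    "pickup_latitude": "dropoff_latitude",
    "pickup_datetime": "dropoff_datetime",
}


def use_dropoff_fields(manifest: dict, dataset_name: str = "NYCTaxi") -> dict:
    """Rename one dataset's geometry and time fields to the drop-off columns (in place)."""
    for dataset in manifest.get("datasets", []):
        if dataset.get("name") != dataset_name:
            continue
        for section in ("geometry", "time"):
            for field in dataset.get(section, {}).get("fields", []):
                # Formats such as "x" or "y" are kept; only the field name changes.
                field["name"] = FIELD_SWAP.get(field["name"], field["name"])
    return manifest


def edit_manifest_file(path: Path, dataset_name: str = "NYCTaxi") -> None:
    """Load a downloaded manifest, swap the fields, and write it back for upload."""
    manifest = json.loads(path.read_text(encoding="utf-8"))
    use_dropoff_fields(manifest, dataset_name)
    path.write_text(json.dumps(manifest, indent=2), encoding="utf-8")
```

Keeping the formats untouched mirrors the editor steps: the drop-off longitude still carries the X format and the drop-off latitude the Y format.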
- Select the pencil button beside your big data file share to edit the big data file share manifest after it has been generated.
- Under Dataset, select the NYCTaxi dataset.
- Hover over the information icon beside geometry. The current geometry is determined by the fields pickup_longitude for X and pickup_latitude for Y.
- Select the field pickup_longitude in the field selector. This field contains geometry in the X format. Clear the check box beside Geometry related attributes.
- Select the field dropoff_longitude. To apply geometry to this field, check the box beside Geometry related attributes and specify the format as X.
- Repeat the previous two steps for Y, changing the geometry field from pickup_latitude to dropoff_latitude. After completing your edits, click Save and close the big data file share dialog.
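Forgetting to clear pickup_longitude before selecting dropoff_longitude would leave two X fields. A quick consistency check on a dataset entry can catch that kind of slip before you save or upload a manifest. This Python sketch assumes a simplified dataset entry with a schema section listing the dataset's fields and a geometry section whose fields carry formats such as "x" and "y"; check_point_geometry is a hypothetical helper, and the structure should be verified against your actual manifest.

```python
from typing import List


def check_point_geometry(dataset: dict) -> List[str]:
    """Return problems in a dataset's point geometry definition; an empty list means none found."""
    problems: List[str] = []
    schema_names = {f["name"] for f in dataset.get("schema", {}).get("fields", [])}
    geometry_fields = dataset.get("geometry", {}).get("fields", [])
    # Collect every declared coordinate format, e.g. ["x", "y"].
    formats = [fmt.lower() for f in geometry_fields for fmt in f.get("formats", [])]
    for axis in ("x", "y"):
        count = formats.count(axis)
        if count != 1:
            problems.append(f"expected exactly one '{axis}' field, found {count}")
    # Every geometry field must exist in the dataset schema.
    for f in geometry_fields:
        if f["name"] not in schema_names:
            problems.append(f"geometry field {f['name']!r} is not in the schema")
    return problems
```

A dataset whose geometry lists pickup_longitude and dropoff_longitude both as X, for example, is reported as having two X fields.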
Run analysis on your taxi data through Portal for ArcGIS
Once the data has been registered with your GeoAnalytics Server and the big data file share item has been created in your portal, you can browse to and run a GeoAnalytics tool on the item.
Data that's registered with your GeoAnalytics Server is not uploaded to your server; it is only registered with the GeoAnalytics Server, and a manifest defines its schema.
- Log in to your portal. The URL is in the format https://webadaptorhost.domain.com/arcgis/home, where arcgis is the name of the web adaptor registered with your portal. Go to My Content. In your My Content table, you'll see the big data file share item you just created.
- Click Map to go to the map viewer.
- Click the Analysis button. If both feature and raster analysis are available, click Feature Analysis. Then click GeoAnalytics Tools > Summarize Data > Aggregate Points.
- Running the Aggregate Points tool aggregates the points into polygons or into bins of a specified size, giving you a better understanding of the data. Because you don't have a polygon dataset to aggregate into, you'll aggregate into bins in both space and time. To add the New York City taxi cab dataset as the point layer to aggregate, choose Browse Layers for the first tool parameter. In the dialog box that appears, choose My Content and browse to your New York City taxi cab dataset. Choose the layer and click Add Layer.
- Aggregate into square bins with a size of 1 kilometer.
- Because the data is time-enabled, you can apply time slicing. You downloaded two months of data; in this tutorial, you'll examine the first week of each month. To do this, set the time interval to 1 week, the time step to 1 month, and the reference time to January 1, 2014, at 12:00 AM.
- Select the statistics of interest, for example, the Mean of total_amount or the Variance of trip_distance.
- Set the processing coordinate system to a local New York projection. Click the gear button to access the analysis settings, select As specified for Processing coordinate system, and click the globe to browse for UTM Zone 18N. Zoom to the New York City region and run the analysis. The analysis runs on the machines in your GeoAnalytics Server. When the analysis is complete, the results are added to your map as square polygons showing the count of taxi drop-off locations in each polygon, along with the additional statistics you calculated.
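Conceptually, aggregating into square bins with time slicing groups each point by the time slice it falls in and by the bin that contains it, then computes statistics per group. The Python sketch below illustrates that idea under stated assumptions: coordinates are already projected to meters (for example, UTM Zone 18N, which is why the processing coordinate system matters for 1-kilometer bins), bins are aligned to the coordinate origin, and variance is the population variance. It is not how GeoAnalytics Server implements Aggregate Points, and the tool's bin alignment and statistic definitions may differ. The names aggregate, time_slice_index, and add_months are hypothetical.

```python
import math
import statistics
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, Iterable, Optional, Tuple

Point = Tuple[float, float, datetime, float]  # (x_m, y_m, time, value); x/y in projected meters


def add_months(t: datetime, months: int) -> datetime:
    """Shift t by whole calendar months (assumes the day of month exists in the target month)."""
    month_index = t.month - 1 + months
    return t.replace(year=t.year + month_index // 12, month=month_index % 12 + 1)


def time_slice_index(
    t: datetime, reference: datetime, interval: timedelta, step_months: int, n_slices: int
) -> Optional[int]:
    """Return the slice containing t, or None when t falls between slices."""
    for i in range(n_slices):
        start = add_months(reference, i * step_months)
        if start <= t < start + interval:
            return i
    return None


def aggregate(
    points: Iterable[Point],
    bin_size_m: float = 1000.0,
    reference: datetime = datetime(2014, 1, 1),
    interval: timedelta = timedelta(weeks=1),
    step_months: int = 1,
    n_slices: int = 2,
) -> Dict[Tuple[int, int, int], Dict[str, float]]:
    """Count points and summarize their values per (time slice, bin column, bin row)."""
    groups = defaultdict(list)
    for x, y, t, value in points:
        s = time_slice_index(t, reference, interval, step_months, n_slices)
        if s is None:
            continue  # outside the first week of each month
        # Square bins aligned to the origin of the projected coordinate system.
        key = (s, math.floor(x / bin_size_m), math.floor(y / bin_size_m))
        groups[key].append(value)
    return {
        key: {
            "count": len(values),
            "mean": statistics.mean(values),
            "variance": statistics.pvariance(values),
        }
        for key, values in groups.items()
    }
```

With the defaults, the two slices run from January 1 up to January 8 and from February 1 up to February 8, 2014 (end exclusive); a point from January 15 is dropped because it falls between the slices.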