Strategies for data transfer to Amazon Web Services—ArcGIS Enterprise in the cloud

Deploying ArcGIS Enterprise on Amazon Web Services (AWS) allows you to take advantage of the convenience and scalability of the cloud environment to host your web services. End users can use these web services in applications on their local devices or in other applications in the cloud.

As an administrator of an ArcGIS deployment on AWS, you must ensure publishers can create the services end users need. Some services can be published from local or web clients, copying data at the time of publication. In most cases, though, you need to transfer GIS data over the internet to locations in the cloud. This page describes options for publishing and copying data, ways to transfer data to AWS when required, and places you can store data on AWS. It also discusses factors that affect data transfer time.

Take advantage of web interfaces

ArcGIS Server Manager and the ArcGIS Enterprise portal are both accessed through a web browser. That means you can sign in to these applications from your local desktop without having to log on to the Amazon Elastic Compute Cloud (EC2) instances on AWS.

You can create a service definition file in a local ArcGIS Pro installation that includes the data you want in your service. Once you have the file, sign in to a stand-alone or federated ArcGIS Server site on AWS through Server Manager and publish from the service definition file.

If you deploy ArcGIS Enterprise on AWS, you can sign in to the portal as a user with privileges to create content and publish hosted feature layers. From there, you can upload data sources such as zipped shapefiles, zipped file geodatabases, or comma-separated values (CSV) files to the portal, and publish hosted feature layers that you can share with other members of your organization.

If you configured ArcGIS GeoEvent Server on your EC2 instance, you can stream live data feeds. See the ArcGIS GeoEvent Server help for more information.

Replicate data through a geodata service

You can connect to an ArcGIS Server site on AWS from an installation of ArcGIS Pro on your on-premises machine and register both your local enterprise geodatabase and an enterprise geodatabase on AWS. You can then publish a geodata service of the geodatabase on AWS and use that geodata service to replicate data from the local geodatabase to the geodatabase on AWS.

Move data to AWS

In some cases, you may need to move data to AWS, have publishers log on to one of the AWS instances you created that includes a licensed copy of ArcGIS Pro, and have the publishers create maps and publish data there. You must do this in the following scenarios:

You store your source data on AWS.
You move a subset of source data to AWS to publish, as publishing data from on-premises sources to an ArcGIS Server site in the cloud can be slow and in many cases is not advised.

Places to store the data on AWS

There are several places you can store GIS data on AWS. All the following options incur charges from Amazon that are subject to change and that you should research before making your choice. Store your data in the same region in AWS as your ArcGIS Server site and ArcGIS Pro installation.

Amazon Elastic Block Store (EBS) volumes—EBS volumes are virtual disk drives that you can attach to your EC2 instance to add more storage. The instances you launch using ArcGIS Enterprise on Amazon Web Services deployment tools contain a root volume. You can add your own, prepopulated EBS volume using the AWS Management Console. Volumes can contain data source files such as file geodatabases, and you can store map and image caches here.
Read the EBS overview in AWS documentation.
Amazon Simple Storage Service (S3)—Amazon S3 is an Amazon service designed specifically for data storage in the cloud. This storage option has the lowest potential for data failure or loss. You can use S3 as a place for data backup, as a middle ground for data transfer between your on-premises deployment and your EBS volumes, or as the location for map and image caches or big data file shares you register with an ArcGIS Server site on AWS.
Read the S3 overview in AWS documentation.
EC2 instance—You can transfer data directly onto the root volume of your EC2 instance.
Database services—When you use ArcGIS Enterprise on Amazon Web Services deployment tools, you can include an enterprise geodatabase stored in an Amazon Relational Database Service (RDS). You can load data into or create content in these geodatabases for use as source data for your web services. See Geodatabases on Amazon Web Services for more information.
ArcGIS Data Store—One of the components of an ArcGIS Enterprise deployment is ArcGIS Data Store, which holds data used by different hosted feature layers. When you deploy ArcGIS Enterprise on AWS and publish hosted web layers, data can be copied to one of the data stores created through ArcGIS Data Store. See What is ArcGIS Data Store for more information.

Options for transferring data to the cloud

Transferring data from your on-premises deployment into the cloud takes time and, in some cases, coordination with your information technology (IT) security staff. Exporting data to a location in the cloud is often not as fast or secure as the common data transfers that you do within your local network.

There are many strategies you can use to get data onto the cloud, but if you work with sensitive data, coordinate with your IT staff to make sure your method is secure and approved by your organization. The following are some options:

Copy the data when you publish a service.
When you publish a service, you can copy the data for that service to the ArcGIS Server site or ArcGIS Data Store. Depending on the type of service you publish, one of two things happens: the data is packaged into a service definition file (.sd), transferred to the ArcGIS Server site's uploads directory, and unpacked into the ArcGIS Server input directory; or the data is copied into one of the types of ArcGIS Data Store you have in your deployment. Be aware that this can take a long time and transfer large amounts of data if you do not limit the extents and datasets used in your map or other resource.
This option does not allow data to be shared between services, nor does it allow data synchronization between the cloud and your on-premises deployment.
Use a Remote Desktop Connection and copy and paste data.
Microsoft Windows Remote Desktop Connection allows file system redirection wherein your local drives can be mapped to the remote computer. While logged in to your EC2 instance on Windows through Remote Desktop, you can open Windows Explorer and copy data from your local drives to your EBS volumes.
If you choose to transfer sensitive data using Remote Desktop Connection, ensure that additional layers of security are in place. Older versions of Remote Desktop Connection have been shown to contain security vulnerabilities that allow man-in-the-middle attacks, in which a computer posing as the server gains access to your data.
Note:
Copy and paste can take a while to transfer data. Do not copy any other file or data before the paste procedure is complete. If you do, the paste terminates, and you have to start over.
Use S3 client utilities.
Amazon S3 can be used as a middle ground for moving data from your on-premises deployment to your EBS volumes. To get data into S3, you can use the AWS Management Console or a third-party app designed to move files between S3 and your own computers. Once your data is on S3, you can use the same utility on your EC2 instance to transfer data from S3 onto the instance.
Access data served from your own web server.
Any data available on the web through HTTP or HTTPS is accessible to your EC2 instance. If you have a web-facing server in your organization, you can place your data on it and download the data from your EC2 instance. The advantage of this approach is that you can configure security on your web server to limit who can download the data and to encrypt the transfer with HTTPS (SSL/TLS).
Enable FTP.
You can enable file transfer protocol (FTP) to upload files directly onto your EC2 instance. Be aware that standard FTP does not encrypt information and sends passwords in clear text. To use FTP safely, you need to take additional security measures, such as encrypting your FTP sessions with SSL/TLS, limiting which users are allowed to transfer data to your instance through FTP, and disabling FTP after your initial data transfer. Some third-party products are designed to help you set up secure FTP connections.
Use AWS tools.
If you need to transfer an enormous amount of data to AWS, it may be faster or more cost-effective to ship the data to Amazon on a portable storage device and pay Amazon to load the data directly into S3. Amazon offers this service as AWS Snowball.
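For the S3 client utilities option in the list above, the two-step transfer (on-premises to S3, then S3 to the EC2 instance) could be scripted with the AWS Command Line Interface, which is one example of such a utility. The following Python sketch only builds the `aws s3 cp` commands so you can review them before running anything. The bucket name and folder paths are hypothetical placeholders, and the sketch assumes the AWS CLI is installed and configured with credentials on both machines.

```python
def s3_copy_command(source, destination, recursive=False):
    """Build an `aws s3 cp` argument list. Nothing is executed here."""
    command = ["aws", "s3", "cp", source, destination]
    if recursive:
        # Needed when the source is a folder or an S3 prefix rather than one file.
        command.append("--recursive")
    return command


# Hypothetical names: replace with your own bucket, prefix, and folders.
BUCKET_URI = "s3://example-gis-staging/caches/"

# Step 1, run on the on-premises machine: local folder -> S3.
upload = s3_copy_command("data/caches", BUCKET_URI, recursive=True)

# Step 2, run on the EC2 instance: S3 -> folder on an attached EBS volume.
download = s3_copy_command(BUCKET_URI, "gisdata/caches", recursive=True)

print(" ".join(upload))
print(" ".join(download))
```

To run a command from Python instead of pasting it into a terminal, you could pass the list to `subprocess.run(upload, check=True)`. Keeping the bucket in the same AWS region as the EC2 instance follows the storage guidance earlier on this page.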

Amazon works with many solution providers, some of whom provide data transfer, storage, and security solutions. See the AWS Partner Solutions Finder to determine whether one of these companies can help with your cloud strategy. Esri is one of these providers and offers various project and implementation services for deploying ArcGIS in the Amazon cloud.

Factors that affect data transfer time

Performance of the above data transfer options can vary based on your physical proximity to the AWS region, the time of day, and the quality of your connection to the internet.

When your client, data source, and ArcGIS Enterprise are not deployed in the same location, you will likely experience degraded performance when data is sent between an on-premises component and the cloud.
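To get a rough sense of how connection quality translates into transfer time, a back-of-the-envelope estimate can help when comparing options. The sketch below is illustrative only: the function name is hypothetical, and the `efficiency` factor (the share of nominal bandwidth the transfer actually achieves) is an assumed placeholder that you would replace with a value measured on your own connection.

```python
def estimate_transfer_hours(size_gb, link_mbps, efficiency=0.7):
    """Rough transfer time in hours for size_gb gigabytes over a link_mbps link.

    efficiency is an assumed fraction of the nominal bandwidth that the
    transfer really achieves; 0.7 is a placeholder, not a measured value.
    """
    if size_gb < 0 or link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("size must be >= 0, bandwidth > 0, efficiency in (0, 1]")
    size_megabits = size_gb * 1000 * 8  # decimal gigabytes -> megabits
    seconds = size_megabits / (link_mbps * efficiency)
    return seconds / 3600


# Example: a 500 GB map cache over a nominal 100 Mbps connection.
print(f"{estimate_transfer_hours(500, 100):.1f} hours")  # prints "15.9 hours"
```

An estimate like this ignores time of day, distance to the AWS region, and per-file overhead, so treat it as a lower bound for planning rather than a prediction.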

GIS datasets, especially imagery and map caches, can take up large amounts of space and may need to be zipped before transfer, either to reduce the size of the file or to reduce the total number of files for more efficient transfer (especially in the case of map caches). Some S3 client utilities may place limits on the size of any one file you can transfer or the number of individual files you can store. Also, some zipping programs have limits on the amount of data that can be zipped. Take the zipping time and effort into account when you choose a data transfer option.
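As one way to bundle a folder of many small files, such as a map cache, into a single archive before transfer, the following Python sketch uses the standard `zipfile` module. The function name and its parameters are hypothetical. ZIP64 support is enabled so the archive can exceed 4 GB, and the `compress` switch lets you skip compression when the goal is only to reduce the file count (tiles that are already compressed images usually shrink very little).

```python
import zipfile
from pathlib import Path


def zip_folder(folder, archive_path, compress=True):
    """Bundle every file under `folder` into one zip archive.

    Returns the number of files written. Write the archive somewhere
    outside `folder`, or it would try to include itself.
    """
    folder = Path(folder)
    method = zipfile.ZIP_DEFLATED if compress else zipfile.ZIP_STORED
    file_count = 0
    with zipfile.ZipFile(archive_path, "w", compression=method,
                         allowZip64=True) as archive:
        for path in sorted(folder.rglob("*")):
            if path.is_file():
                # Store paths relative to the parent so the archive unpacks
                # into a single top-level folder with the original name.
                arcname = path.relative_to(folder.parent).as_posix()
                archive.write(path, arcname=arcname)
                file_count += 1
    return file_count
```

A call such as `zip_folder("caches/basemap", "basemap.zip", compress=False)` (paths are placeholders) would produce one file to transfer instead of thousands. Empty directories are not preserved by this sketch.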

Finally, if using S3, be aware of the limitations on the number of buckets you can create and other restrictions on S3 buckets. Amazon lists these in Bucket restrictions and limitations.

Maintaining the integrity of data paths

Anytime you move data to a new location, you need to be aware of any paths referencing the data that may also need to be updated. This is a concern with map documents, which may reference dozens of data layers at different paths.

Registering your Amazon Elastic Compute Cloud data location with your ArcGIS Server site can help reduce the effort of fixing broken data paths after publishing. See Register your data with ArcGIS Server using Server Manager in the ArcGIS Server help for more information.

Another way to reduce the need to repair data connections is to use relative paths in your map documents and store your maps and data in a common folder.
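To illustrate why relative paths survive a move, the following Python sketch computes the path from a map folder to a dataset and then resolves that same relative path from a different root. All folder names are hypothetical. `ntpath` (the Windows flavor of `os.path`) is used explicitly so the example gives the same result on any operating system.

```python
import ntpath  # Windows-style path handling, usable on any OS


def relative_data_path(data_path, map_folder):
    """Path to the data as seen from the folder that holds the map."""
    return ntpath.relpath(data_path, start=map_folder)


def resolve(map_folder, relative_path):
    """Absolute location a relative path points to from a given map folder."""
    return ntpath.normpath(ntpath.join(map_folder, relative_path))


# On premises: maps and data share the common folder C:\projects\parcels.
rel = relative_data_path(r"C:\projects\parcels\data\parcels.gdb",
                         r"C:\projects\parcels\maps")
print(rel)  # ..\data\parcels.gdb

# On the EC2 instance the whole folder was copied to D:\gisdata\parcels,
# and the unchanged relative path still points at the copied data.
print(resolve(r"D:\gisdata\parcels\maps", rel))  # D:\gisdata\parcels\data\parcels.gdb
```

The relative path stays valid only if the maps and data keep the same layout under the common folder after the move.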
