Strategies for data transfer to Amazon Web Services—ArcGIS Enterprise in the cloud

Deploying ArcGIS Server or ArcGIS Enterprise on Amazon Web Services (AWS) allows you to take advantage of the convenience and scalability of the cloud environment to host your web services. End users of web services typically do not need or want to log on to instances in AWS to get their work done; they'll still use applications on their local devices to work with data through services.

As an administrator of an ArcGIS deployment on AWS, you need to be sure publishers can create the services end users need. Some services can be published from local or web clients, copying data at the time of publication. In some cases, though, you may need to transfer GIS data over the Internet to locations in the cloud. This topic describes options for publishing and copying data, explains how to transfer data to AWS when required, and lists where you can store data on AWS. It also discusses some factors that affect data transfer time.

Take advantage of web interfaces

ArcGIS Server Manager and the ArcGIS Enterprise portal are both accessed through a web browser. That means you can sign in to these applications from your local desktop without having to log on to the Amazon Elastic Compute Cloud (EC2) instances on AWS.

You can create a service definition file in your local ArcMap installation that includes the data you want in your service. Once you have the file, sign in to your stand-alone or federated ArcGIS Server site on AWS through Manager and publish from the service definition file.

If you deploy ArcGIS Enterprise on AWS, you can sign in to the portal as a user with privileges to create content and publish hosted feature layers. You can then upload data sources such as zipped shapefiles, zipped file geodatabases, or comma-separated values (CSV) files to the portal and publish them as hosted feature layers that you can share with other members of your portal organization.

If you've configured ArcGIS GeoEvent Server on your EC2 instance, you can stream live data feeds. See the ArcGIS GeoEvent Server help for more information.

Replicate data through a geodata service

From ArcMap on your on-premises machine, you can connect to an ArcGIS Server site on AWS and register both your local enterprise geodatabase and an enterprise geodatabase on AWS. You can then publish a geodata service of the geodatabase on AWS and replicate data from your local geodatabase to the geodatabase on AWS through this service.

See Suggestions for configuring geodata services in the ArcGIS Server help for more information.

Move data to AWS

In some cases, you may need to move data to AWS, have publishers log on to an AWS instance you created from an Esri Amazon Machine Image (AMI) that includes a licensed copy of ArcGIS Pro, and have them create maps and publish data there. This is necessary in the following scenarios:

You store your source data on AWS.
You move a subset of source data to AWS to publish, as publishing data from on-premises sources to an ArcGIS Server site in the cloud can be slow and in many cases is not advised.

Places to store the data on AWS

There are several places you can store GIS data if you need to transfer it to AWS. All of the following options incur charges from Amazon; because these charges are subject to change, research them before making your choice. Store your data in the same AWS region as your ArcGIS Server site to minimize latency and avoid cross-region data transfer charges.

Amazon Elastic Block Store (EBS) volumes: EBS volumes are virtual disk drives that you can attach to your EC2 instance to add storage. Instances you launch from Esri AMIs include a root volume. You can attach your own prepopulated EBS volume using the AWS Management Console.
Read the EBS overview in AWS documentation.
Amazon Simple Storage Service (S3): Amazon S3 is an Amazon service designed specifically for storing data in the cloud. Of the options listed here, it has the lowest potential for data failure or loss. You can use S3 as a place for data backups, as an intermediate location for data transfer between your on-premises deployment and your EBS volumes, or as the location for map and image caches or big data file shares you register with an ArcGIS Server site on AWS.
Read the S3 overview in AWS documentation.
EC2 instance: You can transfer data directly onto the root volume of your EC2 instance.

Options for transferring data to the cloud

Transferring data from your on-premises deployment into the cloud takes time and, in some cases, coordination with your information technology (IT) security staff. Exporting data to a location in the cloud is often not as fast or secure as the common data transfers that you do within your local network.

There are many strategies you can use to get data into the cloud. If you work with sensitive data, coordinate with your IT staff to make sure your method is secure and approved by your organization. The following are some of your options:

Copy the data when you publish a service.
When you publish a service, you can copy the data for that service to the ArcGIS Server site. The data is packaged into a service definition file (.sd), transferred into the ArcGIS Server site's uploads directory, and finally unpacked into the ArcGIS Server input directory. Be aware that this can take a long time and result in the transfer of large amounts of data if you do not limit the extents and datasets used in your map or other resource.
This option does not allow data to be shared between services, nor does it allow data synchronization between the cloud and your on-premises deployment.
Create a geodatabase on AWS and register it as the managed database for a stand-alone or federated ArcGIS Server site.
When you publish feature services to the ArcGIS Server site, data is copied to the managed database.
As with the previous option, this option does not allow data to be shared between services, nor does it allow data synchronization between the cloud and your on-premises deployment.
Use Remote Desktop Connection to copy and paste data.
Microsoft Windows Remote Desktop Connection supports drive redirection, which maps your local drives to the remote computer. While logged in to your Windows EC2 instance through Remote Desktop, you can open Windows Explorer and copy data from your local drives to your EBS volumes.
If you transfer sensitive data using Remote Desktop Connection, ensure that additional layers of security are in place. Older versions of Remote Desktop Connection have been shown to contain security vulnerabilities that allow a computer posing as the server to gain access to your data (known as man-in-the-middle attacks).
Note:
Copy and paste can take a while to transfer data. Do not copy any other file or data before the paste procedure is complete. If you do, the paste terminates, and you have to start over.
Use S3 client utilities.
Amazon S3 can serve as an intermediate location for moving data from your on-premises deployment to your EBS volumes. To upload data to S3, you can use the AWS Management Console, the AWS Command Line Interface (AWS CLI), or a third-party app designed to move files between S3 and your own computers. Once your data is in S3, use the same tool on your EC2 instance to copy the data from S3 onto the instance.
Access data served from your own web server.
Any data available on the web through HTTP or HTTPS is accessible to your EC2 instance. If your organization has a web-facing server, you can place your data on it and download the data from your EC2 instance. The advantage of this approach is that you can configure security on your web server to limit who can download the data and to encrypt the transfer using SSL/TLS (HTTPS).
Enable FTP.
You can enable file transfer protocol (FTP) to upload files directly onto your EC2 instance. Be aware that standard FTP does not encrypt information and sends passwords in clear text. To safely use FTP, you need to take additional security measures, such as encrypting your FTP sessions with SSL, limiting which users are allowed to transfer data to your instance through FTP, and disabling FTP after your initial data transfer. Some third-party products are designed to help you set up secure FTP connections.
Use AWS tools.
If you need to transfer an enormous amount of data to Amazon, it may be faster or more cost-effective to move the data on a physical storage device than to send it over the Internet. Amazon offers this service as AWS Snowball: Amazon ships you a storage appliance, you copy your data onto it and return it, and Amazon loads the data into S3.
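For the web server option in the list above, the download on the EC2 instance can be scripted instead of done by hand. The following is a minimal Python sketch using only the standard library. It assumes the data is published at an HTTPS URL that the instance can reach; the function name `download_file`, the chunk size, and the SHA-256 checksum comparison are illustrative choices, not part of ArcGIS or AWS. Authentication to your web server is not shown.

```python
import hashlib
import urllib.request
from pathlib import Path
from typing import Optional

CHUNK_SIZE = 1024 * 1024  # read 1 MiB at a time so large files are not held in memory


def download_file(url: str, destination: str, expected_sha256: Optional[str] = None) -> Path:
    """Hypothetical helper: download url to destination, optionally verifying a SHA-256 checksum.

    Use an https:// URL so the transfer is encrypted with TLS.
    """
    dest = Path(destination)
    dest.parent.mkdir(parents=True, exist_ok=True)
    digest = hashlib.sha256()
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        while True:
            chunk = response.read(CHUNK_SIZE)
            if not chunk:
                break
            digest.update(chunk)  # hash the bytes as they arrive
            out.write(chunk)
    # Compare after the file is closed so a bad copy can be deleted.
    if expected_sha256 is not None and digest.hexdigest() != expected_sha256.lower():
        dest.unlink()  # discard a corrupted or tampered copy
        raise ValueError(f"SHA-256 mismatch for {url}")
    return dest
```

A call might look like `download_file("https://gisdata.example.com/parcels.zip", r"D:\data\parcels.zip", expected_sha256=...)`, where the host name, path, and checksum are placeholders. Publishing a checksum next to each file on the web server lets you confirm that the copy on AWS matches the original.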

Amazon works with many solution providers, some of whom provide data transfer, storage, and security solutions. See the AWS Partner Solutions Finder to understand whether one of these companies can help with your cloud strategy. Esri is one of these providers and offers various project and implementation services for deploying ArcGIS in the Amazon cloud.

Factors that affect data transfer time

The speed of these data transfer options can vary based on your physical proximity to the AWS region, the time of day, and the quality and bandwidth of your Internet connection.
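To see how these factors add up, a rough estimate of transfer time is the data size divided by the throughput you actually achieve. The following Python sketch makes that arithmetic explicit. The function name is hypothetical, and the 80 percent efficiency default is an assumption standing in for protocol overhead and network contention, not a measured value; substitute the throughput you observe on your own connection.

```python
def estimate_transfer_hours(size_gb: float, bandwidth_mbps: float, efficiency: float = 0.8) -> float:
    """Hypothetical helper: rough transfer time in hours.

    size_gb: amount of data in gigabytes (1 GB = 10**9 bytes)
    bandwidth_mbps: nominal link speed in megabits per second (1 Mbps = 10**6 bits/s)
    efficiency: assumed fraction of the nominal speed actually achieved (0 < efficiency <= 1)
    """
    if size_gb < 0 or bandwidth_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("size must be >= 0, bandwidth > 0, and 0 < efficiency <= 1")
    bits = size_gb * 8 * 10**9  # bytes to bits
    bits_per_second = bandwidth_mbps * 10**6 * efficiency
    return bits / bits_per_second / 3600  # seconds to hours


# 500 GB over a 100 Mbps link at 80 percent efficiency: 50,000 s
print(round(estimate_transfer_hours(500, 100), 1))  # prints 13.9
```

At the same rate, 10 TB would take about 278 hours (more than 11 days), which is the scale at which a physical transfer service such as AWS Snowball may be worth evaluating.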

GIS datasets, especially imagery and map caches, can take up large amounts of space and may need to be zipped before transfer, either to reduce the size of the data or to reduce the total number of files for more efficient transfer (especially in the case of map caches). Some S3 client utilities may limit the size of any one file you can transfer or the number of individual files you can store. Some zipping programs also limit the amount of data they can compress. Factor the time and effort of zipping into your choice of data transfer option.
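As an example of the zipping step, the following Python sketch packs a folder, such as an exploded map cache, into a single archive with the standard `zipfile` module. The function name and folder layout are hypothetical. The module writes ZIP64 archives when needed, so it is not bound by the 4 GB limit of the original ZIP format, but the S3 client or other tool you transfer with may still impose its own file size limits.

```python
import zipfile
from pathlib import Path


def zip_folder(source_dir: str, archive_path: str, compress: bool = True) -> int:
    """Hypothetical helper: add every file under source_dir to archive_path.

    Returns the number of files added. Paths inside the archive are stored
    relative to source_dir, so the folder structure is preserved on extraction.
    """
    source = Path(source_dir)
    archive = Path(archive_path).resolve()
    # Tile images (PNG/JPEG) are already compressed; ZIP_STORED skips recompressing them.
    method = zipfile.ZIP_DEFLATED if compress else zipfile.ZIP_STORED
    count = 0
    with zipfile.ZipFile(archive, "w", compression=method, allowZip64=True) as zf:
        for path in sorted(source.rglob("*")):
            # Skip directories, and skip the archive itself if it is written inside source_dir.
            if path.is_file() and path.resolve() != archive:
                zf.write(path, arcname=path.relative_to(source).as_posix())
                count += 1
    return count
```

For a map cache made up mostly of image tiles, calling `zip_folder(cache_dir, archive, compress=False)` mainly reduces the file count; for vector data, compression often reduces the size as well.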

Finally, if using S3, be aware of the limitations on the number of buckets you can create and other restrictions on S3 buckets. Amazon lists these in Bucket Restrictions and Limitations.

Maintaining the integrity of data paths

Any time you move data to a new location, you need to update any paths that reference the data. This is a particular concern with map documents, which may reference dozens of data layers at different paths.

Registering the data location on your Amazon Elastic Compute Cloud (EC2) instance with your ArcGIS Server site can reduce the effort of fixing broken data paths after publishing. See Register your data with ArcGIS Server using Server Manager in the ArcGIS Server help for more information.

Another way to reduce the need to repair data connections is to use relative paths in your map documents and store your maps and data in a common folder.
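Before copying a project folder to AWS, you can confirm that every data source a map references sits inside that common folder, so relative paths still resolve after the move. The following Python sketch assumes you already have the list of data source paths as strings (gathered however you prefer); it does not read map documents itself. The function name is hypothetical, and Windows-style paths are assumed because ArcMap runs on Windows.

```python
from pathlib import PureWindowsPath
from typing import Dict, List


def relative_data_paths(project_folder: str, data_paths: List[str]) -> Dict[str, str]:
    """Hypothetical helper: map each data path to its path relative to project_folder.

    Raises ValueError listing any data path outside project_folder, since those
    references would break when the folder is moved to a new location.
    """
    root = PureWindowsPath(project_folder)
    relative: Dict[str, str] = {}
    outside: List[str] = []
    for original in data_paths:
        try:
            relative[original] = str(PureWindowsPath(original).relative_to(root))
        except ValueError:  # relative_to raises when the path is not under root
            outside.append(original)
    if outside:
        raise ValueError("Outside the project folder: " + ", ".join(outside))
    return relative
```

For example, with a project folder of `D:\GIS\Parcels`, the data path `D:\GIS\Parcels\data\parcels.gdb` maps to `data\parcels.gdb`, while a path such as `C:\Temp\wells.shp` is reported as outside the folder and should be moved in before the transfer.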
