
Strategies for data transfer to Amazon Web Services

Deploying ArcGIS Server or ArcGIS Enterprise on Amazon Web Services (AWS) allows you to take advantage of the convenience and scalability of the cloud environment to host your web services. End users of web services typically will not need or want to log on to instances in AWS to get their work done; they'll still use applications on their local devices to work with data through services.

As an administrator of an ArcGIS deployment on AWS, you need to be sure publishers can create the services end users need. Some services can be published from local or web clients, copying data at the time of publication. In some cases, though, you may need to transfer GIS data over the Internet to locations in the cloud. This topic describes options for publishing and copying data, explains how to transfer data to AWS when required, and lists where you can store data on AWS. It also discusses some factors that affect data transfer time.

Take advantage of web interfaces

ArcGIS Server Manager and the Portal for ArcGIS website are both accessed through a web browser. That means you can sign in to these applications from your local desktop without having to log on to the EC2 instances on AWS.

You can create a service definition file in your local ArcMap installation that includes the data you want in your service. Once you have the file, sign in to your stand-alone or federated ArcGIS Server site on AWS through Manager and publish from the service definition file.

If you deployed ArcGIS Enterprise on AWS, you can sign in to the portal website as a user with privileges to create content and publish hosted feature layers. From there, you can upload data sources such as zipped shapefiles, zipped file geodatabases, or comma-separated values (CSV) files to the portal and publish hosted feature layers that you can share with other members of your portal organization.

If you've configured ArcGIS GeoEvent Server on your EC2 instance, you can stream live data feeds. See the ArcGIS GeoEvent Server help for more information.

Replicate data through a geodata service

From ArcMap on your local machine, you can connect to your ArcGIS Server site on AWS and register both your local enterprise geodatabase and an enterprise geodatabase on AWS. You can then publish a geodata service from the geodatabase on AWS and use that service to replicate data from your local geodatabase to the geodatabase on AWS.

See Replication to an Amazon EC2 instance using geodata services and Use a geodata service and a connected replica for more information.

Move data to AWS

In some cases, you may need to move data to AWS and have publishers log on to an AWS instance that you created from an Esri AMI and on which you configured ArcGIS Desktop. The publishers then create maps and publish data from that instance. This approach applies in the following scenarios:

  • You store your source data on AWS.
  • You move a subset of source data to AWS to publish, as publishing data from on-premises sources to an ArcGIS Server site in the cloud can be slow and in many cases is not advised.

Note:

When you restart AWS instances, machine names change, which can cause the ArcGIS Desktop license manager to stop working. Avoid this licensing mechanism in the cloud when possible.

Places to store the data on AWS

There are several places you can store GIS data if you need to transfer the data to AWS. All the following options incur charges from Amazon that are subject to change and that you should research before making your choice. Store your data in the same region in AWS as your ArcGIS Server site.

  • EBS volumes—Amazon Elastic Block Store (EBS) volumes are virtual disk drives that you can attach to your EC2 instance to add more storage. An EBS volume is always attached for you as part of the instances you launch from Esri Amazon Machine Images (AMIs). You can configure the size of this attached volume when you build the site. The ArcGIS Server directories are configured on this drive when you use CloudFormation or ArcGIS Server Cloud Builder on Amazon Web Services, so when you publish services with the option to copy data to the ArcGIS Server site, the data goes onto this EBS volume. You can also create other directories on this volume to hold your data.

    Read Amazon's EBS overview

  • Amazon S3—Amazon Simple Storage Service (S3) is an Amazon service designed specifically for data storage in the cloud. This storage option has the lowest potential for data failure or loss. You can use S3 as a place for data backup, as a middle ground for data transfer between your on-premises deployment and your EBS volumes, or as the location of file-based data you register with an ArcGIS Server site on AWS.

    Read Amazon's S3 overview

  • EC2 instance—It's possible to transfer data directly onto your EC2 instance; however, if the instance is terminated, your data from the C: drive on Windows or root drive on Linux will be immediately lost. Instances created from the Esri AMIs apportion a relatively small amount of space on the C: drive to discourage data storage on this drive. In contrast, attached EBS volumes, such as the D: drive on Windows instances, persist when the instance terminates and are a safer option for data storage.

    Caution:

    Do not store GIS data or map caches on the C: or root drive of your EC2 instance in a production deployment.

Options for transferring data to the cloud

Transferring data from your on-premises deployment into the cloud takes time and, in some cases, coordination with your IT security staff. Exporting data to a location on the Internet (in other words, the cloud) is often not as fast or secure as the common data transfers that you do within your local network.

There are many strategies you can use to get data into the cloud, but if you work with sensitive data, coordinate with your IT staff to make sure your method is secure and approved by your organization. The following are some of your options:

  • Copy the data when you publish a service—When you publish a service, you can copy the data for that service to the ArcGIS Server site. The data is packaged into a service definition file (.sd), transferred to the ArcGIS Server site's uploads directory, and finally unpacked into the ArcGIS Server input directory. When you publish feature services, the data is instead placed in the ArcGIS Server site's managed database. Be aware that this can take a long time and result in the transfer of large amounts of data if you do not limit the extents and datasets used in your map or other resource.

    This option does not allow data to be shared between services, nor does it allow data synchronization between the cloud and your on-premises deployment.

  • Remote Desktop Connection copy and paste—Windows Remote Desktop Connection allows file system redirection wherein your local drives can be mapped to the remote computer. While logged in to your EC2 instance on Windows through Remote Desktop, you can open Windows Explorer and copy data from your local drives to your EBS volumes.

    To enable file system redirection, click the Local Resources tab in the Remote Desktop Connection window and check the check box to make your drives available. The wording varies depending on which version of Windows you are using.

    If you choose to transfer sensitive data using Remote Desktop Connection, you should ensure that additional layers of security are in place. Older versions of Remote Desktop Connection have been shown to contain security vulnerabilities wherein a computer posing as the server can gain access to your data (sometimes known as man-in-the-middle attacks).

    Note:

    Copy and paste can take a while to transfer data. Do not copy any other file or data before the paste procedure is complete. If you do, the paste terminates, and you have to start over.

  • S3 client utilities—Amazon S3 can be used as a middle ground for moving data from your on-premises deployment to your EBS volumes. To get data into S3, you can use the AWS Management Console or one of the many third-party apps that are designed for easily moving files between S3 and your own computers. Once your data is on S3, you can use the same utility on your EC2 instance to transfer data from S3 onto the instance. Alternatively, you can load file-based data to S3 and register the S3 bucket with your ArcGIS Server site on AWS.

  • Your own web server—Any data available on the web through HTTP is accessible to your EC2 instance. If you have a web-facing server in your organization, you can place your data on it and download the data from your EC2 instance. The advantage of this approach is that you can configure security on your web server to limit who can download the data and to encrypt the transaction through SSL.

  • FTP—You can enable file transfer protocol (FTP) to upload files directly onto your EC2 instance. Be aware that standard FTP does not encrypt information and sends passwords in clear text. To safely use FTP, you need to take additional security measures, such as encrypting your FTP sessions with SSL, limiting which users are allowed to transfer data to your instance through FTP, and disabling FTP after your initial data transfer. Some third-party products are designed to help you set up secure FTP connections.

  • AWS Import/Export—If you need to transfer an enormous amount of data to Amazon, it may be faster or more cost-effective to ship the data to Amazon on a portable storage device and pay Amazon to load the data directly into S3. Amazon offers this service as AWS Import/Export.

    If you consider using AWS Import/Export, you'll need to decide if it's appropriate for your organization's data sensitivity. Anytime you put a device in the mail, you run the risk, however small, of the physical destruction or interception of your data. You can mitigate these risks by backing up and encrypting the data. If you still have concerns about whether AWS Import/Export is an appropriate choice for your data, contact Amazon directly.
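
For the S3 client utilities option above, a scripted upload is one alternative to a graphical tool. The following is a minimal Python sketch, not an Esri-provided utility. It assumes you pass in an S3 client object that exposes an upload_file(filename, bucket, key) method, as the boto3 S3 client does. The function name upload_folder, the bucket name, and the folder paths are hypothetical and used only for illustration.

```python
import os


def upload_folder(client, local_dir, bucket, prefix=""):
    """Upload every file under local_dir to an S3 bucket.

    client is assumed to offer upload_file(filename, bucket, key),
    for example a boto3 S3 client. Returns the list of keys written.
    """
    prefix = prefix.strip("/")
    uploaded = []
    for root, _dirs, files in os.walk(local_dir):
        for name in sorted(files):
            path = os.path.join(root, name)
            # S3 keys use forward slashes regardless of the local OS.
            rel = os.path.relpath(path, local_dir).replace(os.sep, "/")
            key = f"{prefix}/{rel}" if prefix else rel
            client.upload_file(path, bucket, key)
            uploaded.append(key)
    return uploaded


# Hypothetical usage (requires the boto3 package and AWS credentials):
#   import boto3
#   upload_folder(boto3.client("s3"), r"D:\data\parcels",
#                 "example-gis-bucket", "parcels")
```

Keeping the folder structure in the key names makes it easier to restore the same layout when you later copy the data from S3 to an EBS volume.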

Amazon works with many solution providers, some of whom provide data transfer, storage, and security solutions. See Find an AWS Solution Provider in the AWS help to understand whether one of these companies can help with your cloud strategy. Esri is one of these providers and offers various project and implementation services for deploying ArcGIS in the Amazon cloud.

Factors that affect data transfer time

Performance of the above data transfer options can vary based on your physical proximity to the Amazon cloud, the time of day, and the quality of your connection to the Internet.
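
To get a rough sense of how connection quality affects transfer time, you can estimate the duration from the dataset size and your upload bandwidth. The following Python sketch is a back-of-the-envelope calculation only. The function name estimate_transfer_hours and the default efficiency factor of 0.8 (an allowance for protocol overhead and contention) are assumptions for illustration, not measured values.

```python
def estimate_transfer_hours(size_gb, link_mbps, efficiency=0.8):
    """Estimate hours needed to move size_gb gigabytes over a link.

    size_gb    -- data size in decimal gigabytes (10**9 bytes)
    link_mbps  -- nominal upload bandwidth in megabits per second
    efficiency -- assumed usable fraction of the nominal bandwidth
    """
    if size_gb < 0 or link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("size must be >= 0, bandwidth > 0, 0 < efficiency <= 1")
    bits = size_gb * 8 * 10**9
    seconds = bits / (link_mbps * 10**6 * efficiency)
    return seconds / 3600


# Example: 100 GB of imagery over a 100 Mbps uplink at 80 percent efficiency.
print(round(estimate_transfer_hours(100, 100), 2))  # 2.78
```

If the estimate runs to days rather than hours, that is a signal to compare the network options against shipping a storage device.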

GIS datasets, especially imagery and map caches, can take large amounts of space and may need to be zipped before transfer, either to reduce the size of the file or to reduce the total number of files for more efficient transfer (especially in the case of map caches). Some S3 client utilities may place limits on the size of any one file you can transfer or the number of individual files you can store. Also, some zipping programs have limits on the amount of data that can be zipped. The zipping time and effort should be taken into account when you choose a data transfer option.
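
As an illustration of reducing the number of files before transfer, the following Python sketch packs a folder, such as a map cache directory, into a single archive with the standard zipfile module. The function name zip_folder and the example paths are hypothetical. ZIP64 extensions are enabled so that the archive can exceed the classic 4 GB zip limit. Write the archive outside the folder being zipped so it does not try to include itself.

```python
import zipfile
from pathlib import Path


def zip_folder(folder, archive):
    """Pack all files under folder into one zip archive.

    Returns the number of files written. Paths inside the archive are
    stored relative to folder so the layout can be restored as-is.
    """
    folder = Path(folder)
    count = 0
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED,
                         allowZip64=True) as zf:
        for path in sorted(folder.rglob("*")):
            if path.is_file():
                zf.write(path, path.relative_to(folder).as_posix())
                count += 1
    return count


# Hypothetical usage:
#   zip_folder(r"D:\arcgisserver\arcgiscache\Basemap", r"D:\transfer\Basemap.zip")
```

Cache tiles in PNG or JPEG format are already compressed, so for those you may prefer zipfile.ZIP_STORED, which skips compression and only bundles the files.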

Finally, if using S3, be aware of the limitations on the number of buckets you can create and other restrictions on S3 buckets. Amazon lists these in Bucket Restrictions and Limitations.

Maintaining the integrity of data paths

Anytime you move data to a new location, you need to be aware of any paths referencing the data that may also need to be updated. This is a concern with map documents, which may reference dozens of data layers at different paths.

Registering your Amazon EC2 data location with your ArcGIS Server site can help reduce the effort of fixing broken data paths after publishing. See Register your data with ArcGIS Server using Manager in the ArcGIS Server help for more information.

Another way to reduce the need to repair data connections is to use relative paths in your map documents and store your maps and data in a common folder.
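
To see why relative paths survive a move, the following Python sketch computes the path from a map folder to a dataset and shows that it stays the same when the common parent folder is relocated. This is only an illustration of the idea and does not modify map documents. The function name to_relative and all folder names are hypothetical. The ntpath module is used so that Windows-style paths are handled the same way on any operating system.

```python
import ntpath


def to_relative(data_path, map_folder):
    """Return data_path expressed relative to map_folder.

    If the two locations are on different drives, no relative path
    exists, so the absolute data_path is returned unchanged.
    """
    data_drive = ntpath.splitdrive(data_path)[0].lower()
    map_drive = ntpath.splitdrive(map_folder)[0].lower()
    if data_drive != map_drive:
        return data_path
    return ntpath.relpath(data_path, map_folder)


# On premises: maps and data share the common folder C:\project.
print(to_relative(r"C:\project\data\roads.shp", r"C:\project\maps"))
# After copying the whole folder to an EBS volume on D:.
print(to_relative(r"D:\gis\project\data\roads.shp", r"D:\gis\project\maps"))
# Both calls print ..\data\roads.shp
```

Because the relative path is identical in both locations, a map that stores it continues to find its data after the common folder is moved as a unit.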