Disaster recovery and replication

You can replicate your ArcGIS Enterprise deployment to a disconnected standby deployment. If your primary deployment fails or becomes inaccessible, you can fail over to the standby deployment.

Standby deployments typically run on a different network or subnetwork, or even in a geographically separate location from your primary deployment. Wherever you place the standby deployment, be sure your ArcGIS Enterprise clients can access it when it is needed.

Geographic redundancy

You can implement geographic redundancy if your primary data center and standby data center are in geographically separate locations. If one data center experiences a catastrophic event, such as a hurricane or other natural disaster, you can make the standby data center active and operations can resume.

Geographic redundancy has specific requirements to be successful.

  • The primary and standby environments must be duplicated. Each data center must have the same number of machines in the ArcGIS Enterprise deployment, and the URLs used to access the components must be the same.
  • Geographic redundancy typically follows an active-passive approach; therefore, data and content must be replicated to the standby ArcGIS Enterprise deployment consistently.
  • Geographic redundancy relies on third-party components to be successful. For example, a global site selector or global domain name system (DNS) server is important so that when a switch has to occur from the primary data center to the standby, there is no disruption to any ArcGIS Enterprise users.

To ensure the least amount of downtime in the event of a failure or catastrophe, you can deploy a highly available, geographically redundant ArcGIS Enterprise deployment. This is the most complex deployment to achieve because it requires the most machines and the most maintenance. Configure two separate data centers, each with its own highly available ArcGIS Enterprise deployment. In each data center, all machine names are configured identically and there are no single points of failure. This applies to the data (whether it resides on a highly available file server or in a highly available database), all web servers and load balancers, and the ArcGIS Enterprise components. Backups of the primary deployment are created on a regular schedule, and they can be restored to the standby deployment in the separate data center either immediately or when a failure in the primary deployment occurs.

Planning for a replicated deployment

First, determine how many machines you require. Next, plan for the following disaster recovery requirements for a replicated ArcGIS Enterprise deployment:

  • Duplication—Ensure both data centers and ArcGIS Enterprise deployments contain the same architecture.
  • Replication—Back up content and data from the primary data center and restore to the standby.
  • Monitoring—Review logs to determine when a failure occurs and determine whether the severity of the failure requires you to fail over to the standby data center.
  • Fail over—Decide whether to fail over to a different component within ArcGIS Enterprise or fail over the entire ArcGIS Enterprise deployment to a different data center.

Determine machine requirements

The number of machines you need depends on how you configure ArcGIS Enterprise. At a minimum, you need two machines. If your ArcGIS Enterprise deployment does not store a lot of data and services, does not include a spatiotemporal big data store, and is not accessed by many people, you can configure a primary deployment composed of a single-machine GIS Server site, with Portal for ArcGIS and ArcGIS Data Store installed on the same machine. You need a second machine to host the replicated standby deployment.

If your ArcGIS Enterprise deployment is more heavily used—for example, if a large number of people access it, your organization stores a large number of items, or your deployment is heavily edited—you may need a single or multimachine GIS Server site, and you should install Portal for ArcGIS and ArcGIS Data Store on machines separate from each other and separate from the GIS Server machines. If you publish multiple hosted scene layers, you may want to configure ArcGIS Data Store (tile cache data store) to store the scene cache databases on another machine. If you will be using a spatiotemporal big data store, you'll need at least one additional machine. In this case, calculate the number of machines required using the following formula:

(<number of GIS Server machines> + 1 Portal for ArcGIS machine + <number of machines in the data store>) × 2
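
As a worked example of this formula, the following minimal Python sketch computes the total for a primary and a standby deployment. The function and parameter names are illustrative only and are not part of any ArcGIS tool.

```python
def machines_required(gis_server_machines: int, data_store_machines: int) -> int:
    """Total machines for the primary plus the standby deployment."""
    portal_machines = 1  # the formula assumes one Portal for ArcGIS machine
    per_deployment = gis_server_machines + portal_machines + data_store_machines
    return per_deployment * 2  # one primary deployment and one standby


# Example: a two-machine GIS Server site and three data store machines
print(machines_required(2, 3))  # (2 + 1 + 3) * 2 = 12
```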

Note that additional ArcGIS licenses are not required for the standby deployment because it is not actively accessed; you only make it the active deployment if the primary fails.

Also note that the webgisdr utility records the software versions of the ArcGIS Enterprise components when you create a backup file. The standby deployment to which you import the file must be at the same version as your primary deployment.

Duplicate deployments

Within ArcGIS Enterprise, there are various dependencies you must account for that typically revolve around accessibility. Map services rely on data in a shared folder or accessed through a database connection. Machines within ArcGIS Enterprise communicate with each other through specific URLs. For these reasons, an ArcGIS Enterprise deployment in one site must be duplicated in another so every component (for example, folder locations, database names, and URLs) within the deployment in each data center is the same. Network-attached storage (NAS) devices that store file geodatabases or Portal for ArcGIS and ArcGIS Server configuration files need to be named the same so the standby deployment can successfully connect to the resources. All the ArcGIS Enterprise components must be installed in the same directories within each deployment. Finally, the number of machines should be identical between the data centers, as performance issues can arise if fewer machines are available to respond to user load. Note that you can use DNS entries or modify hosts files on the machines to achieve host name consistency.
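
One way to check that two deployments are duplicated is to record the settings that must match (folder locations, database names, URLs) for each data center and compare them. The sketch below assumes you maintain such an inventory yourself as a plain dictionary; the keys and values shown are hypothetical and are not read from ArcGIS Enterprise.

```python
def find_mismatches(primary: dict, standby: dict) -> dict:
    """Return {setting: (primary_value, standby_value)} for settings that differ."""
    mismatches = {}
    for key in sorted(set(primary) | set(standby)):
        primary_value = primary.get(key)  # None if the setting is missing
        standby_value = standby.get(key)
        if primary_value != standby_value:
            mismatches[key] = (primary_value, standby_value)
    return mismatches


# Hypothetical inventories, for illustration only
primary = {
    "portal_url": "https://myportalwa.organization.com/portal",
    "content_dir": "/net/nas01/portal/content",
    "database_name": "gisdata",
}
standby = {
    "portal_url": "https://myportalwa.organization.com/portal",
    "content_dir": "/net/nas02/portal/content",  # differs from the primary
    "database_name": "gisdata",
}

for setting, (p, s) in find_mismatches(primary, standby).items():
    print(f"{setting}: primary={p!r} standby={s!r}")
```

An empty result means every recorded setting is identical in both data centers. It says nothing about settings you did not record.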

Replicate ArcGIS Enterprise

Portal for ArcGIS includes a tool—webgisdr—that allows you to export portal content, federated ArcGIS Server sites, and ArcGIS Data Store relational and tile cache data store content to a file that you can move to the standby machine to restore. The tool maintains Portal for ArcGIS, ArcGIS Server, and ArcGIS Data Store configured settings, and copies all content created in the portal as well as data that's copied to the hosting server and data store while publishing.

Note that the tool does not copy data from databases or folders registered with the hosting server or federated ArcGIS Server sites, for example, data in a database or file geodatabase data. It is up to the organization to replicate that data to the standby ArcGIS Enterprise deployment and ensure that services on the standby can access the replicated data.

When you register data sources with ArcGIS Server sites, you provide specific information on how to access the data. That information must be the same for the standby deployment as for the primary. For example, if you copy file geodatabases used for source data to the standby deployment, directory paths to the file geodatabases must be the same as on the primary deployment. Also, the standby deployment must be able to access a database using the same connection information you provided when you registered the database with the ArcGIS Server site on the primary deployment.

You can run the webgisdr tool as a scheduled task within Windows Task Scheduler or as a cron job within a Linux environment. Additionally, the tool can be moved to and run from a different machine than the portal installation as long as communication is open between the machine where the tool is run and the ArcGIS Enterprise components.
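
If you prefer to wrap the tool in a script that the scheduler calls, a minimal Python sketch might look like the following. It assumes the tool is invoked with `--export` or `--import` plus `--file <properties file>`; verify these options against the webgisdr documentation for your version. The paths in the usage example are placeholders, and treating the exit code as a success indicator is an assumption you should confirm by also checking the tool's output.

```python
import subprocess


def build_webgisdr_command(tool_path: str, properties_file: str, mode: str = "export") -> list:
    """Build the argument list for one webgisdr run (assumed option names)."""
    if mode not in ("export", "import"):
        raise ValueError("mode must be 'export' or 'import'")
    return [str(tool_path), f"--{mode}", "--file", str(properties_file)]


def run_webgisdr(tool_path: str, properties_file: str, mode: str = "export"):
    """Run the tool and return (exit_code, combined_output) for the caller to inspect."""
    command = build_webgisdr_command(tool_path, properties_file, mode)
    result = subprocess.run(command, capture_output=True, text=True)
    return result.returncode, result.stdout + result.stderr


# Placeholder paths; nothing is executed here, only the command is shown
print(build_webgisdr_command("/path/to/webgisdr.sh", "/path/to/webgisdr.properties"))
```

A cron job or Task Scheduler entry would then call a script that invokes `run_webgisdr` and reports the result to administrators.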

You should restore the ArcGIS Enterprise backups to the standby deployment as soon as they're exported from the primary deployment. This avoids restoring incremental backups in the wrong order, and means that minimal data loss or downtime will occur in the event that the primary deployment fails. If you do not immediately restore backups, there may be additional overhead in importing the backup and failing over to the standby deployment.

Also consider that if something is incorrect in the primary deployment when the backup is created and there are automated processes to import the backup to the standby, those incorrect settings will be imported to the standby deployment.

See Configure disaster recovery for instructions on replicating an ArcGIS Enterprise deployment.

Monitor ArcGIS Enterprise

Monitoring is important in both replicated and highly available environments. In a highly available environment, certain parts of the deployment fail over without human intervention. For example, if the primary portal in ArcGIS Enterprise fails, the software immediately fails over to the standby portal. Similarly, ArcGIS Server and ArcGIS Data Store components can fail, and the system continues to function normally because there are no single points of failure. Because there may be no visible disruption in ArcGIS Enterprise, you should put mechanisms in place to notify administrators of failures on any particular component within the ArcGIS Enterprise deployment. Use Python (or the scripting language of your choice) with the ArcGIS Server and Portal for ArcGIS REST API to automate monitoring parts of your deployment such as those listed here:

  • Query the Portal for ArcGIS and ArcGIS Server logs periodically to check for messages that indicate a failure of a particular component. If a failure occurs, a script can send emails or otherwise notify administrators that attention is needed.
  • Use the Health Check functions in the Portal for ArcGIS and ArcGIS Server Administration APIs to query logs and check for issues.
  • Validate all federated servers to ensure they are running and the portal can reach them.
  • Validate connections to all data stores. This includes connections to the relational, tile cache, and spatiotemporal big data stores, as well as to registered folders and databases, big data file shares, and the raster data store.
  • Periodically query important services and web maps to be sure they are functioning.
  • Query the Indexer Status on the primary machine before replication to ensure everything in the portal is indexed on the primary. Query the Indexer Status on both machines after replication completes to confirm that index values match between the standby and primary deployments. The Indexer Status shows how many items are in the database (databaseCount) compared to how many items are indexed (indexCount) for each item type (name), as shown in the following example. The two values should match for each item type, both within and between the two deployments. If databaseCount and indexCount do not match for an item type, you need to reindex the portal. The output should be the same on both the primary and standby machines.
    {"indexes": [
        {
            "name": "users",
            "databaseCount": 42,
            "indexCount": 42
        },
        {
            "name": "groups",
            "databaseCount": 21,
            "indexCount": 21
        },
        {
            "name": "search",
            "databaseCount": 8499,
            "indexCount": 8499
        }
    ]}
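
The index comparison described above can be scripted once you have the Indexer Status response for each deployment. The sketch below works on already-parsed JSON in the shape shown in the example; how you retrieve the response (URL, token handling) is left out, and the function names are illustrative.

```python
import json


def out_of_sync_indexes(status: dict) -> list:
    """Names of item types whose databaseCount and indexCount differ."""
    return [
        entry["name"]
        for entry in status.get("indexes", [])
        if entry["databaseCount"] != entry["indexCount"]
    ]


def statuses_match(primary: dict, standby: dict) -> bool:
    """True if both deployments report identical counts for every item type."""
    def counts(status):
        return {
            entry["name"]: (entry["databaseCount"], entry["indexCount"])
            for entry in status.get("indexes", [])
        }
    return counts(primary) == counts(standby)


# Sample response taken from the example above
sample = json.loads("""
{"indexes": [
    {"name": "users", "databaseCount": 42, "indexCount": 42},
    {"name": "groups", "databaseCount": 21, "indexCount": 21},
    {"name": "search", "databaseCount": 8499, "indexCount": 8499}
]}
""")

print(out_of_sync_indexes(sample))     # [] because every count matches
print(statuses_match(sample, sample))  # True
```

A non-empty list from `out_of_sync_indexes` indicates the portal needs to be reindexed; a `False` result from `statuses_match` indicates the standby does not yet reflect the primary.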

In a replicated environment, failover requires human intervention; therefore, you must monitor your deployment to determine when failures occur so you can decide if a failover is necessary.

If you automate the replication of your deployment from primary to standby, you also need to monitor these processes to be sure backups, moving of files, and restore operations complete successfully.

Failover

Within ArcGIS Enterprise, Portal for ArcGIS, ArcGIS Server, and ArcGIS Data Store have their own internal mechanisms to fail over. In a highly available configuration, each component can fail over without significant disruption to the overall ArcGIS Enterprise.

Failover of a replicated deployment from the primary to the standby data center typically involves the organization's IT department and can be achieved through a global site selector (GSS) or global DNS. Members of an organization typically reach their ArcGIS Enterprise deployment through a few URLs, for example, https://myportalwa.organization.com/portal for the portal URL and https://myserverwa.organization.com/server for the ArcGIS Server services URL. The GSS or global DNS assigns an IP address to each hostname. If you need to fail over to a different data center, the GSS or global DNS reassigns the myportalwa.organization.com and myserverwa.organization.com hostnames to the IP addresses associated with the standby data center. Clients and users continue to use the same URLs and are not affected, but all requests are sent to the standby data center. Once the primary data center is back online, the hostnames can be reassigned to IP addresses within the original data center. You then need to reconcile data from the standby to the primary to ensure the primary data center contains all of the new content and data that was created while the standby was active.
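
After a DNS change, it can be useful to confirm which data center a hostname currently points to. The following sketch classifies a resolved IP address against the address ranges of each data center. The ranges shown are reserved documentation networks used as placeholders, and the data center names are hypothetical; in practice the address could come from a lookup such as `socket.gethostbyname("myportalwa.organization.com")`.

```python
import ipaddress


def active_data_center(resolved_ip: str, data_centers: dict):
    """Return the name of the data center whose network contains resolved_ip, or None."""
    address = ipaddress.ip_address(resolved_ip)
    for name, cidr in data_centers.items():
        if address in ipaddress.ip_network(cidr):
            return name
    return None  # the address belongs to neither data center


# Placeholder networks (RFC 5737 documentation ranges), not real deployments
DATA_CENTERS = {
    "primary": "192.0.2.0/24",
    "standby": "198.51.100.0/24",
}

print(active_data_center("198.51.100.17", DATA_CENTERS))  # standby
```

Running this check for each public hostname after a failover, and again after failing back, helps confirm that all hostnames were switched together.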

If data in any registered database (enterprise geodatabase or database) of the hosting server or a federated ArcGIS Server site was edited, use database replication tools to ensure the original primary ArcGIS Enterprise deployment contains that updated data. If data in file-based data sources, such as file geodatabases, registered with any of the ArcGIS Server sites in the ArcGIS Enterprise deployment has changed, copy the edited files to the original directories they were stored in. Finally, use the webgisdr utility to export an ArcGIS Enterprise backup from the standby and import it to the primary. The tool will replicate the content in the portal, including associated hosted feature and scene layer data and new nonhosted services registered with the portal, to the original primary ArcGIS Enterprise deployment.