As an ArcGIS Enterprise deployment grows in complexity, additional considerations should be made when it comes to disaster recovery. These considerations require insight into the disparate systems that form the deployment architecture. As in many technical scenarios, there is no one-size-fits-all approach to backing up the core and dependent systems in a deployment.
The following provides a framework for increasing the likelihood of a successful restore during a disaster recovery event. Organizations can adopt these practices to define standard operating procedures for their ArcGIS Enterprise deployment as part of a Business Continuity/Disaster Recovery (BC/DR) plan.
Backup
Review the following best practices for creating backups of your ArcGIS Enterprise organization and any referenced data sources.
Back up ArcGIS Enterprise
An ArcGIS Enterprise organization is composed of the Portal for ArcGIS site, all federated ArcGIS Server sites and their associated data, and the data contained within the ArcGIS Data Store. The components can be backed up using the included utility, webgisdr, or by using third-party tools for machine-based and image-based backups.
The webgisdr tool is a command line utility included with Portal for ArcGIS that is used to back up the organization's content and data, federated ArcGIS Server site information, and data contained within the relational and tile cache data stores. This tool is particularly useful for maintaining consistency in the components of a base deployment as well as in any additional federated sites, though it requires a functional deployment to perform the recovery.
The following should be considered outside the WebGIS DR backup process:
- Federated ArcGIS Mission Server or ArcGIS Notebook Server sites—If you have either of these, create backups by following the instructions in the ArcGIS Mission Server documentation and the ArcGIS Notebook Server documentation.
- Spatiotemporal big data store and graph store backups—If you have a spatiotemporal big data store or graph store (or both) registered with your hosting server, create backups of each using the ArcGIS Data Store backupdatastore utility.
- ArcGIS GeoEvent Server site configuration—Manage the backup of your ArcGIS GeoEvent Server configuration using the backup configuration file.
Object stores are used for ephemeral feature layer response caching at 11.0, so the data does not need to be persisted in the case of a recovery exercise.
Most virtualization platforms can take snapshots of running virtual machines, which enables low recovery time objectives. While these are useful, they are not considered durable backups as part of a larger BC/DR plan.
When taking a backup before or during a maintenance window, the low recovery time objective supplied by snapshots serves as motivation to use those tools when available. Third-party backup methods, however, are not integrated with the underlying data-tier components of Portal for ArcGIS and ArcGIS Data Store, so taking a live backup of a running database carries a level of risk. To minimize this risk, snapshots and image-based backups should be taken after stopping the services for the running ArcGIS Enterprise components.
In the case of architectures that use a file share to host the shared portal content directory or the configuration store and root directories for the ArcGIS Server sites, it is important to consider the consistency of backups of those locations when using third-party backup tools such as virtual machine snapshots or image-based backups. For example, if an administrator is rolling back following an unsuccessful Portal for ArcGIS upgrade by recovering a snapshot, the content directory may have been altered by the upgrade process and would no longer be consistent with the information contained within the database on the recovered instance. To minimize these effects when using third-party tools, the backups should be taken during an outage window when no content is being published or edited within the organization. This includes both the ArcGIS Enterprise components as well as any associated file share.
The ArcGIS Data Store can be backed up separately from the other components to minimize data loss in the event of a failure in that component. Running scheduled backups of relational and tile cache data stores can occur outside of the schedule for the webgisdr utility and other backup tools.
Back up referenced data sources
ArcGIS Server can serve content from many sources including enterprise geodatabases, registered file shares, and cloud stores. These external data sources should be included in the disaster recovery plan for a deployment. It is recommended that you follow vendor instructions on taking backups or replicating data to another location.
Enterprise geodatabases that contain data served by referenced services should be backed up according to the recovery point objectives of each organization by using the vendor-supplied tools. Since this data is referenced by ArcGIS Server services, the consistency of the published services can potentially get out of sync with the back-end database tables if the recovery of the database is performed independently of the sites that contain the published services. This makes it important to align the schedule of backups across all components in the ArcGIS Enterprise deployment.
Network file shares can use either image-based or file system-based backup tools to package the data, then transfer to a durable storage solution that exists outside of the failure domain of the deployment.
Cloud stores should be backed up or replicated to another region for additional recoverability and durability. The replicated stores can also be deployed using archive or cold storage to reduce overall cost.
When to back up
How often a backup is taken depends on several factors, the most important of which is how long the backup takes to complete. Since backup processes can impact system resource utilization, full backups are typically scheduled outside of major business hours. For different backup types, the frequency with which the system is backed up can vary across an ArcGIS Enterprise deployment.
For example, a production enterprise geodatabase may be backed up incrementally every 15 minutes for a low recovery point objective. The most important data should be stored within this database instance to reduce the amount of potential data loss. For an ArcGIS Enterprise deployment with many referenced services and static content, backups might be taken daily or weekly, while deployments with heavy utilization of hosted feature services and frequent web map and application creation should target a shorter time between backups.
How long you keep backup files depends on the amount of free disk space you have and how much flexibility you require for recovery options. To learn more, see ArcGIS Enterprise backups.
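The retention trade-off described above can be sketched as a small script that selects backup files for removal. This is a minimal illustration, not part of any ArcGIS tooling: the function name `select_expired`, the `keep_min` safeguard, and the default `*.webgisdr` file pattern are assumptions made for the example, and the function only reports candidates rather than deleting anything.

```python
import time
from pathlib import Path


def select_expired(backup_dir, max_age_days, keep_min=3,
                   pattern="*.webgisdr", now=None):
    """Return backup files older than max_age_days.

    The newest keep_min files are never returned, so a stalled backup
    schedule cannot cause every remaining backup to be pruned.
    All names and defaults here are illustrative assumptions.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day

    # Sort newest first by modification time.
    files = sorted(Path(backup_dir).glob(pattern),
                   key=lambda p: p.stat().st_mtime,
                   reverse=True)

    # Skip the protected newest files, then keep only those past the cutoff.
    return [p for p in files[keep_min:] if p.stat().st_mtime < cutoff]
```

A caller could review the returned list and then remove each file with `Path.unlink()`. Separating selection from deletion makes it possible to log or dry-run the policy first.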
Validate backups
Backups should be monitored for successful completion, and administrators should be alerted when a failure occurs. For the webgisdr tool, the exit code from running the script can be used as a gauge of whether a backup has completed successfully. A zero represents a successful backup, while any nonzero code indicates a failure. There are several alerting tools that can be integrated to allow for email or SMS notifications to the team responsible for backup integrity. Many third-party backup tools provide similar functionality or can be integrated with other services for providing alerts.
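As a minimal sketch of the exit-code check, a wrapper script can run the backup command and hand the result to an alerting hook. The helper names `run_backup` and `notify_failure` are hypothetical, and the alert here is only a message on standard error; substitute whichever email, SMS, or monitoring integration your team uses.

```python
import subprocess
import sys


def run_backup(command):
    """Run a backup command and judge success by its exit code.

    Returns (ok, exit_code). ok is True only when the exit code is zero.
    If the command cannot be started at all, returns (False, None).
    """
    try:
        result = subprocess.run(command, check=False)
    except OSError:
        # The executable was not found or could not be launched.
        return False, None
    return result.returncode == 0, result.returncode


def notify_failure(exit_code):
    """Placeholder alert hook (hypothetical); replace with a real notifier."""
    print(f"Backup failed (exit code: {exit_code})", file=sys.stderr)
```

A scheduled job might call `run_backup(["webgisdr", "--export", "--file", "webgisdr.properties"])` and pass the returned code to `notify_failure` when `ok` is false. The command line shown is an assumed invocation; confirm the script path, options, and properties file against your own installation.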
Another important aspect of validating an organization's BC/DR plan is to run a restore drill on a semiregular cadence. This helps administrators ensure that in the case of a disaster, they are prepared to restore from the functional backups and validate the restore plan described below.
Restore
Review the following best practices for restoring your ArcGIS Enterprise organization using the backups you have created.
What to restore
When an administrator has several backup types at their disposal, it is possible to restore components in a more granular fashion than reverting the entire deployment. If a map or image service's cache is deleted, only those files need to be recovered from a backup. Similarly, if a table is accidentally dropped from an enterprise geodatabase, that database can be recovered without affecting other components.
If bad edits are made to a hosted feature layer and the data needs to be rolled back, an administrator has the option to restore only the relational data store without restoring the entire ArcGIS Enterprise deployment. This reduces the impact the restore has on other data stored within the database, but if hosted services were created after the backup was taken, the restore could cause the ArcGIS Server site to become inconsistent with the restored database tables and require manual cleanup and republishing of the affected services.
Other times, there may be a significant outage, such as the loss of a data center or cloud region, that requires restoration of the entire ArcGIS Enterprise deployment as well as any external data sources. This would be the most extreme example and requires adequate planning to ensure complete functionality of the restored environment.
How to restore
When an ArcGIS Enterprise deployment experiences a widespread outage, there are multiple recovery options that depend on the types of backups available. Replication to a nearby site using the webgisdr utility is the method that most significantly reduces the time to recover the deployment, while having a cold standby site available to spin up and restore can both facilitate recovery drills and reduce the overall time to recovery.
When deciding on the path to recovery, the option with the shortest recovery point and time objectives should be attempted first. This would allow the fastest feedback on the level of success of the restore. Having an administrator comfortable with the backup strategy who has tested restores regularly in the past can also shorten the time taken to recover in a disaster scenario.
Since ArcGIS Enterprise has multiple tiers across both internal and external components, the order in which those components are restored influences the stability of the deployment following a restore. All referenced data sources, including database instances and external file shares, should be made available first, and their accessibility from the ArcGIS Enterprise environment should be verified before the ArcGIS Enterprise machines and components are restored.
Once the surrounding dependencies are in place, the ArcGIS Enterprise deployment should be restored to a consistent state. This is to avoid scenarios in which the hosting server site may have a hosted feature service published but the relational data store is missing the dependent data table, or the organization may have an item for a service that is no longer present in one of the federated sites.
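The two inconsistencies described above can be expressed as a simple comparison of inventories. The sketch below assumes that an administrator has already collected three inventories by some other means (which is outside the scope of this example); the function name and the shape of the inputs are hypothetical and chosen only for illustration.

```python
def find_inconsistencies(portal_items, server_services, datastore_tables):
    """Compare three inventories and report mismatches.

    portal_items:     dict mapping item name -> service name it references
    server_services:  dict mapping service name -> backing table name,
                      or None for services that are not hosted
    datastore_tables: set of table names present in the relational data store

    All three inputs are hypothetical structures for this sketch.
    """
    # Items that point to a service no federated site reports.
    orphaned_items = sorted(
        item for item, service in portal_items.items()
        if service not in server_services
    )

    # Hosted services whose backing table is absent from the data store.
    services_missing_tables = sorted(
        service for service, table in server_services.items()
        if table is not None and table not in datastore_tables
    )

    return {
        "orphaned_items": orphaned_items,
        "services_missing_tables": services_missing_tables,
    }
```

An empty list under both keys suggests the three tiers agree with each other for the inventories supplied; any entries point to the items or services that need cleanup or republishing.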
Post-restore validation
Once a restore operation is complete, validation should be performed for business-critical data and widespread functionality of the ArcGIS Enterprise deployment. This can be accomplished by creating checklists for business centers and departments to verify their most important content, or by automating the checks through scripting. Approaching this validation by using automated scripts allows for greater confidence that the restore was successful in less time than a manual verification of items and services.
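One way to automate such a checklist is to probe a list of URLs and report the ones that fail. The sketch below uses only the Python standard library; the function names and the label-to-URL checklist structure are assumptions for illustration. Note that an HTTP 200 response is only a coarse signal, since a service may return a successful status with an error in the response body, so a production script would likely inspect the content as well.

```python
import urllib.error
import urllib.request


def check_url(url, timeout=10):
    """Return True if the URL answers with HTTP 200, otherwise False."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        # Unreachable host, HTTP error status, timeout, or malformed URL.
        return False


def validate_restore(checklist, probe=check_url):
    """Probe every entry and return the labels that failed.

    checklist: dict mapping a human-readable label -> URL to verify
    probe:     callable taking a URL and returning True on success;
               injectable so the logic can be tested without a network.
    """
    return [label for label, url in checklist.items() if not probe(url)]
```

Each department could contribute its business-critical items and services to the checklist, and an empty result from `validate_restore` would indicate that every listed endpoint responded.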