Organizations often require a certain level of system uptime for their ArcGIS Enterprise deployments, such as 99 percent of the time or higher. For these organizations, implementing a strategy to ensure high availability is crucial. This strategy should comprise both infrastructure elements and employee practices; neither can guarantee high availability alone.
The infrastructure component of a high-availability strategy involves maintaining at least two active copies of your deployment, and implementing failover mechanisms to automatically switch from primary to standby as soon as possible after machine failure. The standby deployment continually receives the same content and settings updates as the primary; this distinguishes highly available systems from replicated systems, which rely on regular backups to minimize data loss and do not automatically fail over. All mission-critical or business-critical elements of a deployment should be addressed when implementing high availability.
The human component of a high-availability strategy consists of organizational practices that ensure failover will always be successful and efficient. For example, machine maintenance or system updates should never be applied to both the primary and standby deployments at the same time, and a system administrator should always be available to respond in the event of a failure.
The topics in this section explain how to configure and maintain a highly available ArcGIS Enterprise deployment.
When high availability should be used
A highly available ArcGIS Enterprise deployment is complex and requires time, effort, and cost to configure and maintain. It's important to determine whether high availability is required for your organization. Organizations considering high availability should ask questions such as the following:
- Does your organization have a mandated service-level agreement?
- What percentage of uptime is required by the service-level agreement?
- How many minutes or hours of downtime are permitted per year?
- How is the service-level agreement enforced?
- Does your organization have a contractual mandate for high availability?
- What are the terms of that mandate?
- Will this ArcGIS Enterprise deployment be involved in mission-critical or business-critical operations?
- Does your organization have the proper licensing from Esri to implement a highly available deployment?
- Is your organization able to provide the hardware necessary to support a highly available deployment?
- Do you have the hardware resources to duplicate each component of your deployment?
- Are you able to configure and maintain a third-party load balancer capable of performing failover?
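To put the uptime questions above in concrete terms, the downtime permitted per year follows directly from the required uptime percentage. The following is a minimal sketch of that arithmetic; the function name and the 365-day year are assumptions made for illustration and are not part of ArcGIS Enterprise.

```python
def allowed_downtime_minutes(uptime_percent: float, days_per_year: int = 365) -> float:
    """Return the minutes of downtime per year permitted by an uptime target."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    minutes_per_year = days_per_year * 24 * 60
    return minutes_per_year * (100 - uptime_percent) / 100


# Compare a few common uptime targets.
for target in (99.0, 99.9, 99.99):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% uptime allows about {minutes:.1f} minutes "
          f"({minutes / 60:.2f} hours) of downtime per year")
```

Under these assumptions, a 99 percent target allows 5,256 minutes (87.6 hours) of downtime per year, which may help you decide whether the added cost of high availability is justified.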
Configure a highly available ArcGIS Server site
The following sections describe how each component of ArcGIS Enterprise is deployed in a high-availability environment.
ArcGIS Enterprise deployments contain a hosting server. This is an ArcGIS GIS Server site you have dedicated to running your portal's hosted services, such as hosted feature, tile, and scene layers.
You might choose to federate additional ArcGIS GIS Server sites with your portal to allow them to share authentication and to automatically register web services as items in your portal. You might also federate ArcGIS GeoEvent Server or ArcGIS Image Server sites with your portal.
Any or all of these additional sites, as well as your hosting server site, can be configured as highly available. How you implement a highly available server site depends on whether your web services reside on a single machine or are spread across multiple machines.
If you have multiple machines in your site, configure a load balancer to communicate with your pool of ArcGIS Server machines. This pool of machines shares server directories and a configuration store. To prevent downtime in the event of machine failure, configure these directories on a highly available file server. You should also configure your load balancer to regularly perform a health check of each server machine.
See the following topics in the ArcGIS Server help for information on configuring a highly available single-machine or multiple-machine server site:
Single-machine high-availability (active-passive) deployment
Multiple-machine deployment with ArcGIS Web Adaptor
When you federate a highly available ArcGIS Server site with your ArcGIS Enterprise portal, set Administration URL to a URL that the portal can use to communicate with all servers in the site, even when one of them is unavailable. A load balancer URL meets this requirement.
Also be aware that using a load balancer URL affects the way you connect to ArcGIS Server Manager. For example, if you federate using a load balancer URL, you must connect to Server Manager using the load balancer; you cannot use the default Server Manager URL of https://gisserver.domain.com:6443/arcgis/manager.
Important concepts in high availability
The following sections define and discuss key terms used in highly available systems.
Load balancer
Load balancers act as a reverse proxy and distribute traffic to back-end servers. At least one third-party load balancer is required in a highly available ArcGIS Enterprise deployment to improve the capacity and reliability of the software. Load balancers handle client traffic to your portal and server sites, as well as internal traffic between the software components.
Though ArcGIS Web Adaptor is considered a load balancer, it’s inadequate to serve as the lone load balancer in a highly available deployment. You can configure ArcGIS Web Adaptor instances with each server site for an added layer of security and anonymity, or to set up web-tier authentication. In these cases, the third-party load balancer sends traffic through the Web Adaptor rather than directly to server machines.
Load balancers need to be able to send HTTP health checks to the server health check or portal health check endpoints. A load balancer creates and manages the URLs used for the deployment, which are described in the next section.
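The health check behavior described above can be pictured with a short sketch. This is an illustration of the general idea only, not how any particular load balancer is implemented: the function names are hypothetical, treating an HTTP 200 response as healthy is an assumption, and the URLs in the demonstration are placeholders rather than the actual health check endpoints, which are documented in the ArcGIS Server and portal help.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, List


def http_probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or an HTTP error status.
        return False


def healthy_backends(health_urls: Iterable[str],
                     probe: Callable[[str], bool] = http_probe) -> List[str]:
    """Return the health check URLs whose machines passed the probe."""
    return [url for url in health_urls if probe(url)]


# Offline demonstration with a stub probe. Machine names and the path
# are placeholders, not real endpoints.
stub_results = {
    "https://gisserver1.domain.com:6443/placeholder-health-path": True,
    "https://gisserver2.domain.com:6443/placeholder-health-path": False,
}
print(healthy_backends(stub_results, probe=lambda url: stub_results[url]))
```

A real load balancer repeats this check on an interval and stops routing requests to any machine that fails it; passing the probe as a parameter keeps the sketch testable without network access.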
URLs used in federation
Several different URLs are used in a highly available ArcGIS Enterprise deployment.
Services URL
This is the URL used by external users and client applications to access ArcGIS Server sites. It’s the URL for the load balancer that handles ArcGIS Server traffic and passes requests either to the server site’s Web Adaptor or directly to the server machines.
Administrative URL
This URL is used by administrators, and internally by the portal, to access an ArcGIS Server site when performing administrative operations. This must direct to a load balancer; if the administrative URL points to a single machine in the server site and that machine is offline, federation will not work. Depending on the architecture of your system, this can be the same URL as the services URL, or it can be a second load balancer.
Private portal URL
This is an internal URL used by your server sites to communicate with the portal. This must also direct to a load balancer and should be defined prior to federating. If you federate your server sites prior to setting the privatePortalURL, follow steps 8 and 9 in Configure an existing deployment for high availability to update the URL within your deployment. Similar to the administrative URL, this can be the same as the public URL for the portal, or it can be a second load balancer.
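Because both the administrative URL and the private portal URL must direct to a load balancer, a quick pre-federation check can catch a URL that points at an individual machine instead. The following is a minimal sketch; the function name, the host names, and the dictionary keys are hypothetical and chosen for illustration.

```python
from typing import Dict, Iterable, List
from urllib.parse import urlsplit


def urls_pointing_at_single_machine(urls: Dict[str, str],
                                    machine_hosts: Iterable[str]) -> List[str]:
    """Return the names of URLs whose host is an individual machine."""
    machines = {host.lower() for host in machine_hosts}
    flagged = []
    for name, url in urls.items():
        host = (urlsplit(url).hostname or "").lower()
        if host in machines:
            flagged.append(name)
    return flagged


# Hypothetical deployment: the administrative URL mistakenly targets one machine.
deployment_urls = {
    "services URL": "https://lb.domain.com/server",
    "administrative URL": "https://gisserver1.domain.com:6443/arcgis",
    "private portal URL": "https://internal-lb.domain.com/portal",
}
site_machines = ["gisserver1.domain.com", "gisserver2.domain.com"]
print(urls_pointing_at_single_machine(deployment_urls, site_machines))
```

In this hypothetical deployment, only the administrative URL is flagged, because its host matches one of the machines in the site.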
Monitoring
Each ArcGIS Enterprise component provides the ability to handle machine-level failures within a deployment. In a highly available component, when one machine goes offline, the other machine will continue to function with little to no disruption. However, the deployment now has a single point of failure and is at risk. It’s important that the deployment and individual machines be monitored to quickly detect failures and notify administrators when one or more machines go offline. This can be achieved using ArcGIS Monitor or third-party monitoring software.
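The monitoring concern described above, a component that still works but has lost its redundancy, can be expressed as a small classification rule. This is a sketch under stated assumptions: the status labels and function name are hypothetical, and the machine states would in practice come from ArcGIS Monitor or your own monitoring software.

```python
from typing import Dict


def component_status(machine_up: Dict[str, bool]) -> str:
    """Classify a component from the up/down state of its machines."""
    up_count = sum(1 for is_up in machine_up.values() if is_up)
    if up_count == 0:
        return "down"
    if up_count == 1:
        # Still serving requests, but one more failure causes an outage.
        return "at risk"
    if up_count < len(machine_up):
        return "degraded"
    return "healthy"


# Hypothetical two-machine portal where the standby has gone offline.
print(component_status({"portal1": True, "portal2": False}))
```

An "at risk" result is the case to alert on promptly: users see no disruption, so the failure can go unnoticed until the remaining machine also fails.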
People and practices
To create and maintain a highly available deployment, your organization needs to make sure people and practices are also highly available. If you only have one administrator and that administrator is not available during an outage, then that is not a highly available environment.
Equally important are your organizational practices. If you are using virtual machines, you shouldn’t put all components of a single software tier within a single host. For example, two virtual machines running a highly available portal shouldn’t be in the same virtual machine host, as that host is a single point of failure.
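The virtual machine placement rule above can be checked mechanically if you keep an inventory of which host runs each virtual machine. The following is a minimal sketch; the inventory format, tier names, and machine names are assumptions made for illustration.

```python
from collections import defaultdict
from typing import List, Tuple


def tiers_on_single_host(placements: List[Tuple[str, str, str]]) -> List[str]:
    """Return tiers whose virtual machines all run on the same host.

    Each placement is a (tier, virtual machine name, host name) tuple.
    """
    hosts_by_tier = defaultdict(set)
    for tier, _vm_name, host in placements:
        hosts_by_tier[tier].add(host)
    return sorted(tier for tier, hosts in hosts_by_tier.items() if len(hosts) == 1)


# Hypothetical inventory: both portal machines share one host.
inventory = [
    ("portal", "portal-vm1", "host-a"),
    ("portal", "portal-vm2", "host-a"),
    ("server", "server-vm1", "host-a"),
    ("server", "server-vm2", "host-b"),
]
print(tiers_on_single_host(inventory))
```

Here the portal tier is reported because losing host-a would take both portal machines offline at once.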
An organization should also make sure that there is always at least one component running at each software tier to maintain high availability. If you need to stop or restart a component, make sure that the other machine running the same component is accessible and functioning correctly.
You should never schedule simultaneous backups or maintenance for all machines in a highly available component. If the patch or backup causes all machines to fail, you have no machine left to take over. See Apply patches and updates to highly available components for more guidance.
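One way to enforce the scheduling rule above is to test whether the planned maintenance windows for a component share any common moment. This sketch assumes one window per machine and that every machine in the component appears in the schedule; the function name and the schedule format are hypothetical.

```python
from datetime import datetime
from typing import Dict, Tuple

Window = Tuple[datetime, datetime]  # (start, end)


def all_machines_overlap(windows: Dict[str, Window]) -> bool:
    """Return True if a moment exists when every machine is under maintenance."""
    if not windows:
        return False
    latest_start = max(start for start, _end in windows.values())
    earliest_end = min(end for _start, end in windows.values())
    return latest_start < earliest_end


# Hypothetical schedule: the two portal machines are patched back to back.
staggered = {
    "portal1": (datetime(2024, 1, 6, 1, 0), datetime(2024, 1, 6, 2, 0)),
    "portal2": (datetime(2024, 1, 6, 2, 0), datetime(2024, 1, 6, 3, 0)),
}
print(all_machines_overlap(staggered))
```

A schedule that returns True leaves no machine available during the shared interval and should be staggered before it is approved.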
Storage for configuration files and data
One of the challenges facing customers deploying ArcGIS Enterprise on-premises is acquiring and maintaining a highly available storage device. Since ArcGIS Server and Portal for ArcGIS both require shared storage to set up high availability, the shared storage can be a single point of failure. In an on-premises deployment, use a NAS device or RAID to ensure that the storage of data and configuration files for ArcGIS Server and Portal for ArcGIS is highly available.
Cloud deployments offer the option of storing data and configuration files in a location that’s already highly available: Amazon Simple Storage Service (S3) buckets within Amazon Web Services (AWS) or BLOB containers in Microsoft Azure. These storage locations and availability are managed by the cloud provider. Visit the documentation for each respective cloud provider for more information.
Colocate components
Place all components and storage locations in a highly available ArcGIS Enterprise deployment in the same data center or cloud region to provide low-latency connectivity between each component. Do not split the primary and standby machines in a highly available deployment across separate data centers.
To safeguard against loss of a single data center, you can create a secondary deployment in a separate data center or cloud region. See Disaster recovery and replication for more information.
Deployment processes for high availability
Each component of ArcGIS Enterprise is deployed differently. The following sections explain high availability for each component and link to instructions to configure high availability for the ArcGIS components of an ArcGIS Enterprise deployment.
Configure highly available ArcGIS Server sites
ArcGIS Enterprise deployments contain a hosting server. This is an ArcGIS GIS Server site you have dedicated to running your portal's hosted services, such as hosted feature, tile, and scene layers.
You might choose to federate additional ArcGIS GIS Server sites with your portal to allow them to share authentication and to automatically register web services as items in your portal. Or you might federate ArcGIS GeoEvent Server or ArcGIS Image Server sites with your portal.
Each of these server sites can be configured as highly available. How you implement a highly available ArcGIS Server site depends on whether your web services reside on a single machine or are spread across multiple machines.
If you have multiple machines in your site, configure a load balancer to communicate with your pool of ArcGIS Server machines. This pool of machines shares server directories and a configuration store. You should configure these directories on a highly available file server to ensure uptime. You should also configure the load balancer to perform regular health checks of each server machine.
When you federate a highly available ArcGIS Server site with Portal for ArcGIS, set Administration URL to a URL that the portal can use to communicate with all servers in the site—even when one of them is unavailable, such as a load balancer URL.
Using a load balancer URL affects the way you connect to ArcGIS Server Manager. For example, if you federate using a load balancer URL, you must connect to Server Manager through the load balancer; you cannot use the default Server Manager URL of https://gisserver.domain.com:6443/arcgis/manager.
See the following topics in the ArcGIS Server help for information on configuring a highly available single-machine or multiple-machine ArcGIS Server site:
Single-machine high-availability (active-passive) deployment
Configure highly available data stores
Hosted web layers in an ArcGIS Enterprise portal access data in different ArcGIS Data Store types. You can configure each type to be highly available.
For more information and instructions for configuring a highly available ArcGIS Data Store, see Add a machine to your data store.
Hosted feature layer data
To have highly available hosted feature layer data, install ArcGIS Data Store and configure a primary and standby relational data store. Once you add a standby data store, the standby will become active if any of the following occurs:
- The primary data store stops working. ArcGIS Data Store attempts to restart the data store on the primary machine. If it cannot restart, the data store fails over to the standby.
- The primary's web app stops running. ArcGIS Data Store attempts to restart the web app on the primary machine. In the rare case that this does not work, the data store fails over to the standby machine.
- The primary machine is unavailable. This can happen if the computer crashes, is unplugged, or loses network connectivity. ArcGIS Data Store makes five attempts to connect to the primary machine. If a connection is not possible after five attempts, the data store fails over to the standby machine.
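The last condition above, failing over only after five unsuccessful connection attempts, follows a common retry pattern. The sketch below illustrates that pattern only; it is not ArcGIS Data Store code, the function name is hypothetical, and the delay between attempts is an assumption because the interval is not stated here.

```python
import time
from typing import Callable


def primary_unreachable(can_connect: Callable[[], bool],
                        attempts: int = 5,
                        delay_seconds: float = 0.0) -> bool:
    """Return True only if every connection attempt to the primary fails."""
    for attempt in range(attempts):
        if can_connect():
            return False  # Primary answered; no failover needed.
        if attempt < attempts - 1:
            time.sleep(delay_seconds)  # Assumed pause before retrying.
    return True


# Simulated primary that never answers, so failover would be triggered.
if primary_unreachable(lambda: False):
    print("Primary unreachable after 5 attempts; fail over to the standby.")
```

Retrying before failing over avoids promoting the standby because of a brief network interruption.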
Install ArcGIS Data Store on two separate machines and create a relational data store on each machine. Configure each relational data store with the ArcGIS GIS Server site you will use as your portal's hosting server. The first relational data store you configure becomes the primary; the second becomes the standby.
ArcGIS Data Store automatically replicates hosted feature layer data from the primary data store to the standby; therefore, the data exists in two places. The ArcGIS GIS Server site always communicates with the active (primary) data store.
Scene layer caches
To have highly available scene layer caches, install ArcGIS Data Store on three or more machines, and create tile cache data stores on each. Always create an odd number of tile cache data store machines. Configure each tile cache data store machine with your portal's hosting server.
You must configure all machines in the tile cache data store before portal members begin publishing hosted scene layers. When people publish, the hosted scene layer data is duplicated on two of the tile cache data store machines.
Observation and location tracking data
To make archived observation data used with ArcGIS GeoEvent Server, ArcGIS Tracker, or ArcGIS Mission highly available, or to make the data generated by big data feature analysis highly available, install ArcGIS Data Store on three or more machines and create a spatiotemporal big data store on each. Configure each data store with your portal's hosting server. A copy of each dataset exists on at least two of the data store machines at any time. If one machine fails, the data store ensures that at least two of the remaining machines contain the data.
Configure highly available source data
You publish data to ArcGIS Server sites from a variety of sources. If you register folders or databases with the ArcGIS Server sites in your ArcGIS Enterprise deployment, you need to store that source data in a manner or location that meets your high-availability needs. For file sources in folders, store on a highly available file server. For databases, use the technology of your database management system to ensure high availability.
Configure a highly available portal
A highly available portal includes two portal machines accessed through a load balancer.
The two portal machines store content in a common directory. For your portal to be highly available, you must configure this content directory on a highly available file server.
Once you configure a highly available portal, the primary portal replicates items to the standby portal. If the primary machine becomes unavailable, the standby is promoted to primary with all of the current items.
If you stop the Portal for ArcGIS service or the primary machine becomes unavailable (for example, if the hard drive fails), the portal will fail over to the standby. Once the machine returns from the failure or you restart the Portal for ArcGIS service, that machine will rejoin the portal as the standby machine.
You should configure the load balancer for the WebContextURL and the privatePortalURL to check the health of the portal machines.
See Configure a highly available portal for more information and instructions.