Strategies for scalability, reliability, and resiliency—GeoEvent Server

Silo strategy

A silo strategy uses partitioning to separate workloads across multiple ArcGIS GeoEvent Server machines. The workload is partitioned according to event stream load, event historical dependency, and event delivery method, and a constrained set of tasks is defined in the configuration for each silo. In a silo, no GeoEvent Server machine knows anything about the other machines, and each machine processes the events it receives separately. Typically, the silo strategy is used in situations where reliability and/or scalability is the primary focus. A silo strategy by itself does not imply any sort of load balancing strategy (see Event routing strategy below).

Silo replication: active-passive

In an active-passive silo architecture, two or more identical machines are configured. One machine is identified as active while the other passive machine(s) are either powered off or in some way isolated. If the active machine fails, a passive machine is selected and activated as the new active system. The mechanism for administering failover is not in the scope of this document but can be accomplished manually or in an automated way, as is common in most virtual environments.
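The failover decision described above can be sketched in a few lines. This is only an illustration, assuming an external monitor supplies each machine's health status; the `Machine` class and `select_active` function are hypothetical names and not part of GeoEvent Server.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    """Hypothetical record for one identically configured machine."""
    name: str
    healthy: bool = True


def select_active(machines, current_active):
    """Keep the current active machine while it is healthy;
    otherwise promote the first healthy passive machine."""
    if current_active.healthy:
        return current_active
    for machine in machines:
        if machine is not current_active and machine.healthy:
            return machine
    raise RuntimeError("no healthy machine available for failover")
```

In practice the promotion step would also have to start or un-isolate the selected machine, which is outside the scope of this sketch.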

Silo replication: active-active

In an active-active silo architecture, a set of machines is configured identically and allowed to run in parallel. No one machine is identified as active or passive, and the entire set of machines operates concurrently. A strategy for failover is not required since all machines are running in parallel. However, monitoring for machine failure is still required to ensure the pool of machines is operating as expected.

Of specific concern in an active-active environment is the duplication of data. If a set of N machines is configured identically, each event is expected to be processed, in parallel, N times. This additional load can put strain on related systems such as databases and networks. A method for uniquely identifying records can be key to deduplicating the data once it arrives at its destination, but deduplication does not reduce the overall load on the system.
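As a rough illustration of deduplication at the destination, the sketch below keeps the first copy of each record and discards later copies that carry the same key. The field names `track_id` and `timestamp` are assumptions chosen for the example; any combination of fields that identifies a record uniquely would serve.

```python
def deduplicate(records, key_fields=("track_id", "timestamp")):
    """Return records with later duplicates removed, preserving order.

    Two records are treated as duplicates when all key_fields match.
    The default field names are hypothetical.
    """
    seen = set()
    unique = []
    for record in records:
        key = tuple(record[field] for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique
```

Note that all N copies still travel across the network and reach the destination before they are filtered, which is why the upstream load is unchanged.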

Event routing strategy

An event routing strategy builds on a silo strategy by adding a balancer component to the system, typically a load balancer or routable message queue. By placing a load balancer in front of a set of identically configured GeoEvent Server machines, events can be individually routed to machines that run the same services. In this approach, the machines work as a group, with the load balancer responsible for sending events to each machine. Typically, the event routing strategy is used in situations where resiliency or scalability is the primary objective. This strategy requires that event data be pushed (versus pulled) into the site.

Note:

GeoEvent Server deployments that pull events cannot use the first two event routing strategies, which rely on an external load balancer or message queue. If a GeoEvent Server deployment is pulling (polling) data, see Silo replication using event partitioning below.

Load balancer

Using a load balancer allows a set of pushed events to be balanced across an array of identically configured and actively monitored GeoEvent Server machines. As events arrive, the load balancer assigns each event to a different machine in the cluster. Fundamentally, this site configuration consists of the load balancer on the front with a set of siloed GeoEvent Server machines in an active-active configuration behind it (with each machine configured identically).
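One simple way to picture the balancing step is round-robin assignment, sketched below. Round-robin is only one possible policy and is an assumption of this example; a real load balancer may use a different algorithm and will also skip machines that fail health checks.

```python
from itertools import cycle


def round_robin(events, machines):
    """Pair each incoming event with the next machine in rotation."""
    targets = cycle(machines)
    return [(next(targets), event) for event in events]
```

Because consecutive events from the same source can land on different machines, this approach suits events that can be processed independently of one another.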

Message queue cluster

Adding a Kafka Cluster to the architecture allows a set of pushed events to be balanced across an array of actively monitored GeoEvent Server machines, where each machine may have a different configuration. As events arrive, the Kafka Cluster assigns each event to a different message topic based on an event’s unique identifier(s). Each GeoEvent Server machine monitors one or more message topics and processes the events.

Fundamentally, this site configuration consists of the Kafka Cluster on the front and a set of siloed GeoEvent Server machines in an active-active configuration behind it. Where this configuration differs from the load balancer solution above is that each event can be routed to a specific GeoEvent Server configuration and/or machine. This has advantages when processing events with a historical dependency.
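The value of identifier-based routing for events with a historical dependency can be shown with a small sketch: if the topic is derived deterministically from an event's identifier, every event for a given track reaches the same topic and therefore the same machine. The code below uses a CRC32 hash purely for illustration. It is not Kafka's partitioner, and the function name and topic names are hypothetical.

```python
import zlib


def topic_for(track_id, topics):
    """Map an identifier to one topic, always the same one for that identifier."""
    index = zlib.crc32(track_id.encode("utf-8")) % len(topics)
    return topics[index]
```

A machine that monitors a given topic therefore sees the complete history for every identifier routed to it.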

Silo replication using event partitioning

Silo replication using event partitioning is a special approach that can be used to scale out the performance of a silo architecture. It is especially useful when the events are being pulled by GeoEvent Server, rather than being pushed.

Each siloed machine is configured identically with respect to processing workflow, but each is assigned a specific subset of the events to process. Typically, this is accomplished by modifying each machine's configuration to request or accept a configured subset of the event data, usually based on the event data's Track ID or some other globally unique identifier. This scenario requires the event stream to be partitioned, for example, using a WHERE clause in a query request.
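As a sketch of how such WHERE clauses might be produced, the function below generates one clause per machine so that the subsets do not overlap and together cover the whole stream. It assumes a numeric identifier field (here the hypothetical `track_id`) and a data source whose SQL dialect supports a `MOD` function; both should be checked against the actual source.

```python
def partition_where_clauses(machine_count, id_field="track_id"):
    """Build one WHERE clause per machine.

    Machine i receives the events whose identifier leaves remainder i
    when divided by machine_count.
    """
    return [
        f"MOD({id_field}, {machine_count}) = {i}"
        for i in range(machine_count)
    ]
```

For three machines this yields `MOD(track_id, 3) = 0`, `MOD(track_id, 3) = 1`, and `MOD(track_id, 3) = 2`, one clause for each machine's polling configuration.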
