
Kafka on-disk storage

ArcGIS GeoEvent Server uses Apache Kafka to manage all event traffic from inputs to GeoEvent Services and then again from GeoEvent Services to outputs. Kafka provides a set of topics (message queues) to which events are published and to which consumers subscribe to receive those events. Kafka manages the topic queues on disk, which provides persistent storage and allows the message queues to be recovered after a system failure.

GeoEvent Server Kafka topic basics

Each GeoEvent Server input and output has its own Kafka topic.


Each Kafka topic is separated into several partitions. By default, each topic is configured with three partitions, which break the events into three separate message queues that can be processed in parallel. A subscriber of the topic spins up several event consumers that run in parallel to improve performance.


Each Kafka topic partition has two replicas, for resiliency and to improve parallelism. The replicas are distributed across three different folders on disk.


Kafka also creates and manages a large set of partitions for its internal consumer offsets topic, which tracks how far each consumer has read. This large number of partitions lets that bookkeeping run in parallel, which helps keep the system performing well.

General disk size recommendations

For a new installation of GeoEvent Server, the ArcGIS GeoEvent Gateway service requires at least 1 GB of disk space. Each input or output you add requires a minimum of 720 MB of additional disk space before any events are processed. All of these sizes are minimum estimates and are likely to grow as you configure more elements in GeoEvent Server.
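The minimum estimate above is simple arithmetic, sketched below as a small helper. It is an illustration only: the function name is hypothetical, it treats 1 GB as 1024 MB (an assumption), and it covers only the stated minimums, not the growth that comes from processing events.

```python
GATEWAY_BASE_MB = 1024   # "at least 1 GB" for the GeoEvent Gateway service (assumes 1 GB = 1024 MB)
PER_TOPIC_MIN_MB = 720   # minimum per input or output, before any events are processed


def minimum_disk_mb(num_inputs: int, num_outputs: int) -> int:
    """Hypothetical helper: minimum disk space (MB) for a new installation."""
    if num_inputs < 0 or num_outputs < 0:
        raise ValueError("counts must be non-negative")
    # Every input and output gets its own Kafka topic.
    return GATEWAY_BASE_MB + PER_TOPIC_MIN_MB * (num_inputs + num_outputs)


if __name__ == "__main__":
    # 4 inputs and 3 outputs: 1024 + 720 * 7
    print(minimum_disk_mb(4, 3))  # 6064
```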

GeoEvent Server Kafka settings

You can modify the behavior of the Kafka instance for GeoEvent Server by editing the Kafka properties file. The primary reason for modifying this file is to change the location of the files on disk. In rare cases, however, other properties may also need to be updated.

Note:

Before editing the properties file described below, stop the GeoEvent Server Windows services or Linux daemons, depending on your operating system. After you save and close the file, start the GeoEvent Server services again so that the updated properties take effect.

Kafka properties file

The properties file that contains the Kafka settings for GeoEvent Server (kafka.properties) is located in one of the following directories, depending on your operating system:

  • Windows (default)—C:\Program Files\ArcGIS\server\geoevent\gateway\etc\kafka.properties
  • Linux (default)—/home/arcgis/server/GeoEvent/gateway/etc/kafka.properties

The default settings in this file favor performance over minimizing disk usage.
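To review the current values before editing, a small reader for the file's key=value lines can help. This is a hypothetical sketch, not an Esri tool: it handles only `key=value` lines and `#` or `!` comment lines, and it does not implement the full Java properties syntax (`:` separators, line continuations, escape sequences).

```python
from pathlib import Path


def parse_properties(text: str) -> dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and # or ! comments."""
    props: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")  # split on the first '=' only
        if sep:  # ignore lines that have no '='
            props[key.strip()] = value.strip()
    return props


def load_kafka_properties(path: str) -> dict[str, str]:
    """Read a kafka.properties file from disk (path supplied by the caller)."""
    return parse_properties(Path(path).read_text(encoding="utf-8"))


if __name__ == "__main__":
    sample = """
    # excerpt of a properties file
    num.partitions=3
    log.segment.bytes=104857600
    """
    print(parse_properties(sample)["num.partitions"])  # 3
```

On Windows, a call such as `load_kafka_properties(r"C:\Program Files\ArcGIS\server\geoevent\gateway\etc\kafka.properties")` would read the default file location listed above.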

Topic storage

The Kafka topics in GeoEvent Server are stored in one of the following directories, depending on your operating system:

  • Windows (default)—C:\ProgramData\ESRI\GeoEvent-Gateway\kafka\
  • Linux (default)—/home/arcgis/.esri/GeoEvent-Gateway/config.[machine name]/kafka/ (for example, /home/arcgis/.esri/GeoEvent-Gateway/config.gesdev01/kafka/)

In the kafka\ folder, the partition replicas are stored in three log folders: logs\, logs1\, and logs2\.

To change the storage location of the Kafka topics, update the following properties, depending on your operating system:

  • Windows default properties:
    • gateway.data.dir=C://ProgramData//Esri//GeoEvent-Gateway//
    • log.dirs=kafka/logs,kafka/logs1,kafka/logs2
  • Linux default properties:
    • gateway.data.dir=/home/arcgis/.esri/GeoEvent-Gateway/config.[machine name] (for example, /home/arcgis/.esri/GeoEvent-Gateway/config.gesdev01)
    • log.dirs=kafka/logs,kafka/logs1,kafka/logs2
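The default values suggest that relative entries in log.dirs are resolved against gateway.data.dir (kafka/logs under the Windows gateway.data.dir corresponds to the C:\ProgramData\ESRI\GeoEvent-Gateway\kafka\ folder listed above). The sketch below assumes that behavior, which is inferred from the defaults rather than documented; `resolve_log_dirs` is a hypothetical helper. Absolute entries are passed through unchanged.

```python
from pathlib import PurePath, PurePosixPath


def resolve_log_dirs(props: dict[str, str], path_type: type = PurePosixPath) -> list[PurePath]:
    """Return the folders that log.dirs points to.

    Assumption: relative entries are joined to gateway.data.dir;
    absolute entries are used as-is.
    """
    base = path_type(props["gateway.data.dir"])
    resolved = []
    for entry in props["log.dirs"].split(","):
        path = path_type(entry.strip())
        resolved.append(path if path.is_absolute() else base / path)
    return resolved


if __name__ == "__main__":
    linux_defaults = {
        "gateway.data.dir": "/home/arcgis/.esri/GeoEvent-Gateway/config.gesdev01",
        "log.dirs": "kafka/logs,kafka/logs1,kafka/logs2",
    }
    for folder in resolve_log_dirs(linux_defaults):
        print(folder)
    # For Windows values, pass path_type=pathlib.PureWindowsPath instead.
```

Under this assumption, moving the topics means either pointing gateway.data.dir at a new base folder or listing absolute folders in log.dirs.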

Topic partitions

In GeoEvent Server, the default number of topic partitions is three. If you inspect the log folders where your topics are stored, you will find partition folders that share the topic's name and end with a partition index (-0, -1, or -2; Kafka numbers partitions starting from 0). Inside each partition folder, Kafka maintains a log of the data currently held in that topic partition. To change the number of topic partitions, modify the following property:

num.partitions=3
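The folder naming can be sketched as follows, assuming Kafka's standard convention of naming each partition directory `<topic>-<index>` with indexes starting at 0. The function and the topic name `example-input-topic` are hypothetical.

```python
def partition_folder_names(topic: str, num_partitions: int = 3) -> list[str]:
    """Expected partition folder names for one topic.

    Kafka names each partition directory "<topic>-<index>",
    numbering partitions from 0 up to num.partitions - 1.
    """
    if num_partitions < 1:
        raise ValueError("num.partitions must be at least 1")
    return [f"{topic}-{index}" for index in range(num_partitions)]


if __name__ == "__main__":
    # "example-input-topic" is a hypothetical topic name.
    print(partition_folder_names("example-input-topic"))
    # ['example-input-topic-0', 'example-input-topic-1', 'example-input-topic-2']
```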

Topic replication

In GeoEvent Server, the default number of replicas for each topic partition is two. If you inspect each log folder where your topics are stored, you will find two partition folders for each topic, both named after the topic but ending with different partition indexes (-0, -1, or -2). Each of the logs\, logs1\, and logs2\ folders stores one replica of two of the topic's three partitions; which two partitions a folder receives is random. To change the number of topic partition replicas, modify the following property:

replication-factor=2
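The resulting layout (three partitions, two replicas, three log folders, two partition folders per log folder) can be illustrated with a simple placement sketch. The assignment in GeoEvent Server is random, so this deterministic round-robin placement is a hypothetical stand-in that only reproduces the counts: six replicas spread evenly, with the two replicas of a partition never in the same folder.

```python
def place_replicas(num_partitions: int, replication_factor: int,
                   log_dirs: list[str]) -> dict[str, list[tuple[int, int]]]:
    """Round-robin (partition, replica) pairs across log folders (illustrative only)."""
    if replication_factor > len(log_dirs):
        raise ValueError("need at least as many log folders as replicas")
    placement: dict[str, list[tuple[int, int]]] = {d: [] for d in log_dirs}
    slot = 0
    for partition in range(num_partitions):
        for replica in range(replication_factor):
            # Consecutive slots land in different folders, so replicas of one
            # partition never share a folder while replication_factor <= len(log_dirs).
            placement[log_dirs[slot % len(log_dirs)]].append((partition, replica))
            slot += 1
    return placement


if __name__ == "__main__":
    for folder, replicas in place_replicas(3, 2, ["logs", "logs1", "logs2"]).items():
        print(folder, replicas)
```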

Topic partition file sizes

By default, each Kafka topic partition log file starts at a minimum size of 20 MB on disk and grows to a maximum size of 100 MB before a new log file is created. A partition replica can hold multiple log files at any one time. At a minimum, plan for no less than 720 MB per input or output: (100 MB + 20 MB) x 3 partitions x 2 replicas = 720 MB. In extreme cases of high-velocity event streams, each topic partition replica folder can grow to 3 to 4 times the maximum log file size (300 MB to 400 MB per partition replica). For a single topic with three partitions and two replicas, the total disk space can therefore grow to 1800 MB to 2400 MB at any given time. Multiply that maximum size by the number of inputs and outputs you have configured to estimate the disk space you need to have available for Kafka in GeoEvent Server. The following property controls the maximum size of a log file before Kafka rolls over to a new file (the default is 100 MB):

log.segment.bytes=104857600
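The sizing arithmetic above can be worked through as below. This sketch uses the document's defaults and treats 1 MB as 1,048,576 bytes, so the log.segment.bytes value of 104857600 is exactly 100 MB; the function names are hypothetical.

```python
MIB = 1024 * 1024                   # log.segment.bytes is expressed in bytes

SEGMENT_MAX_MB = 104857600 // MIB   # log.segment.bytes default -> 100 MB
NEW_SEGMENT_MB = 20                 # starting size of a new log file
PARTITIONS = 3                      # num.partitions
REPLICAS = 2                        # replication-factor


def per_topic_minimum_mb() -> int:
    """(100 MB + 20 MB) x 3 partitions x 2 replicas."""
    return (SEGMENT_MAX_MB + NEW_SEGMENT_MB) * PARTITIONS * REPLICAS


def per_topic_peak_mb(growth_factor: int) -> int:
    """Replica folders growing to growth_factor x the maximum log file size."""
    return SEGMENT_MAX_MB * growth_factor * PARTITIONS * REPLICAS


def kafka_disk_estimate_mb(num_inputs_outputs: int, growth_factor: int = 4) -> int:
    """Worst-case estimate: peak size per topic x number of inputs and outputs."""
    return per_topic_peak_mb(growth_factor) * num_inputs_outputs


if __name__ == "__main__":
    print(per_topic_minimum_mb(), per_topic_peak_mb(3), per_topic_peak_mb(4))  # 720 1800 2400
    print(kafka_disk_estimate_mb(10))  # 24000
```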

If you have high-velocity data, you may end up with multiple 100 MB log files; if not, you may have only one. For lower-velocity event data, a smaller value for this property is better; for higher-velocity event data, a larger value is better. If you set the size too small, Kafka continually rolls over files. If you set the size too large, Kafka rarely rolls the log file over to a new one, and old events are kept in the queue longer than necessary.

Another setting that affects how much disk space a Kafka topic partition consumes is retention bytes. This property instructs Kafka to always keep a minimum amount of data, and its default value is 100 MB. So when Kafka determines that it can and should delete old data, the remaining data never drops below 120 MB (100 MB for the old log file and 20 MB for the new log file). As with the segment bytes property above, you can lower this value if you are working with lower data velocities; with higher-velocity data, use the default of 100 MB. The following property controls the minimum amount of data retained in the log before old data is deleted (the default is 100 MB):

log.retention.bytes=104857600
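The 120 MB figure follows from how size-based retention removes whole files. The sketch below is a simplified model, not GeoEvent or Kafka source code: files are listed oldest first, the last one is the active file and is never deleted, and the oldest files are removed only while the remaining total stays at or above the retention size.

```python
def apply_size_retention(segments_mb: list[int], retention_mb: int) -> list[int]:
    """Simplified model of size-based retention (sizes in MB, oldest file first).

    The last entry is the active file and is never deleted.
    """
    kept = list(segments_mb)
    excess = sum(kept) - retention_mb
    # Drop the oldest file only if the remaining data would still be >= retention_mb.
    while len(kept) > 1 and excess - kept[0] >= 0:
        excess -= kept.pop(0)
    return kept


if __name__ == "__main__":
    # Two full 100 MB files plus a new 20 MB file, with a 100 MB retention size.
    remaining = apply_size_retention([100, 100, 20], retention_mb=100)
    print(remaining, sum(remaining))  # [100, 20] 120
```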

Topic partition file management

As subscribers consume events from a partition's queue, events become outdated once all subscribers have marked them as consumed. How long Kafka keeps old messages, and how often Kafka checks for old messages to delete, are set with the following properties:

  • log.retention.hours=1
  • log.retention.check.interval.ms=30000

The default retention policy is 1 hour. Any data files older than 1 hour and not currently storing active data are deleted. If the file is still being actively used to store data (as may be the case with low-volume or low-velocity data), it is not deleted. Kafka reviews old data files every 30 seconds by default.
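The time-based rule described above can be sketched as follows. This is a simplified model with hypothetical names: each file is aged by the timestamp of its newest event, the active file is always kept, and a real broker would repeat this check every log.retention.check.interval.ms (30000 ms by default).

```python
from dataclasses import dataclass

RETENTION_CHECK_INTERVAL_MS = 30_000  # log.retention.check.interval.ms (how often to re-run the check)


@dataclass
class Segment:
    name: str              # hypothetical data file name
    newest_record_ms: int  # timestamp of the newest event in the file
    active: bool = False   # True for the file currently receiving new events


def expired_segments(segments: list[Segment], now_ms: int, retention_hours: int = 1) -> list[str]:
    """Names of files a single retention check would delete (simplified)."""
    retention_ms = retention_hours * 60 * 60 * 1000  # log.retention.hours -> milliseconds
    return [
        s.name
        for s in segments
        if not s.active and now_ms - s.newest_record_ms > retention_ms
    ]


if __name__ == "__main__":
    files = [
        Segment("old-file", newest_record_ms=0),
        Segment("recent-file", newest_record_ms=9_000_000),
        Segment("active-file", newest_record_ms=0, active=True),
    ]
    print(expired_segments(files, now_ms=10_000_000))  # ['old-file']
```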

Topic partition file management properties

The following properties, also available in the Kafka properties file, control the rate at which partition files are rolled over. Lowering these values may improve disk space utilization, but setting them too low can affect performance. Raising them increases disk space usage.

  • log.roll.ms=1800000
  • log.roll.jitter.ms=180000

The first property instructs Kafka to roll a data file over and replace it with a new one every 30 minutes (1800000 ms), regardless of the size of the old data file. For low-velocity data streams, this avoids keeping older data around when the data file does not fill up often. The second property adds randomness to that schedule: Kafka subtracts a random amount of up to 3 minutes (180000 ms) from the roll interval, so that partition files do not all roll over at the same moment.
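The interaction between the two roll properties can be sketched as below. This is a simplified model with hypothetical function names: a jitter value in the range [0, log.roll.jitter.ms) is drawn once per file, and the file is rolled once its age exceeds log.roll.ms minus that jitter, so with the defaults a file rolls after somewhere between 27 and 30 minutes.

```python
import random

LOG_ROLL_MS = 1_800_000       # log.roll.ms (30 minutes)
LOG_ROLL_JITTER_MS = 180_000  # log.roll.jitter.ms (up to 3 minutes)


def pick_jitter(rng: random.Random, max_jitter_ms: int = LOG_ROLL_JITTER_MS,
                roll_ms: int = LOG_ROLL_MS) -> int:
    """Random jitter in [0, min(max_jitter_ms, roll_ms)), chosen once per file."""
    bound = min(max_jitter_ms, roll_ms)
    return rng.randrange(bound) if bound > 0 else 0


def should_roll(file_age_ms: int, jitter_ms: int, roll_ms: int = LOG_ROLL_MS) -> bool:
    """Roll once the file is older than the roll interval minus its jitter."""
    return file_age_ms > roll_ms - jitter_ms


if __name__ == "__main__":
    rng = random.Random(42)
    jitter = pick_jitter(rng)
    # The effective roll time falls between 27 and 30 minutes.
    print(jitter, should_roll(29 * 60 * 1000, jitter))
```

Because each file draws its own jitter, files created at the same time still roll over at slightly different moments.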