Kafka on-disk storage

ArcGIS GeoEvent Server uses Apache Kafka to manage all event traffic from inputs to GeoEvent Services and then again from GeoEvent Services to outputs. Kafka provides a set of topics (message queues) to which events are published and to which consumers subscribe. The Kafka topic queues are managed on disk for persistent storage and for message queue recovery after a system failure.

GeoEvent Server Kafka topic basics

Each GeoEvent Server input and output has its own Kafka topic.

Each Kafka topic is broken down into several partitions. By default, each topic is configured with three partitions, which split the events into three separate message queues for parallelism. A subscriber to the topic spins up several event consumers that run in parallel to improve performance.

Each Kafka topic partition has two replicas, for resiliency and to improve parallelism. The replicas are distributed across three different folders on disk.

Kafka also creates and manages a large set of partitions for its consumer offset topic. This high partition count contributes to the system's performance through parallelism.

General disk size recommendations

For a new installation of GeoEvent Server, the ArcGIS GeoEvent Gateway service requires at least 1 GB of disk space. Each input or output you add requires a minimum of 720 MB of additional disk space before you process any events. All of these sizes are minimum estimates and are likely to grow as you configure more elements in GeoEvent Server.

GeoEvent Server Kafka settings

You can modify the behavior of the Kafka instance for GeoEvent Server by editing the Kafka properties file. The primary reason for modifying this file is to change the location of the files on disk. On rare occasions, other properties may also need to be updated.

The Kafka properties file

The properties file that contains the Kafka settings (kafka.properties) for GeoEvent Server can be found in one of the following directories depending on your operating system.

  • Windows (default)—C:\Program Files\ArcGIS\server\geoevent\gateway\etc\kafka.properties
  • Linux (default)—/home/arcgis/server/GeoEvent/gateway/etc/kafka.properties

The default settings in this file are set to optimize performance at the expense of increased disk usage.

Topic storage

The Kafka topics in GeoEvent Server are stored in one of the following directories depending on your operating system.

  • Windows (default)—C:\ProgramData\ESRI\GeoEvent-Gateway\kafka\
  • Linux (default)—/home/arcgis/.esri/GeoEvent-Gateway/config.[machine name]/kafka/ (e.g. /home/arcgis/.esri/GeoEvent-Gateway/config.gesdev01/kafka/)

Within the kafka\ folder there will be three log folders where the partition replicas are stored: log\, log1\, log2\.

To change the storage location of the Kafka topics, update the following properties depending on your operating system.

Windows default properties:

  • gateway.data.dir=C://ProgramData//Esri//GeoEvent-Gateway//
  • log.dirs=kafka/logs,kafka/logs1,kafka/logs2

Linux default properties:

  • gateway.data.dir=/home/arcgis/.esri/GeoEvent-Gateway/config.[machine name] (e.g. /home/arcgis/.esri/GeoEvent-Gateway/config.gesdev01)
  • log.dirs=kafka/logs,kafka/logs1,kafka/logs2
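As a rough illustration of how these two properties combine into the on-disk folders, the sketch below parses a properties snippet and joins each log.dirs entry onto gateway.data.dir. It assumes that relative log.dirs entries are resolved against gateway.data.dir, which the default values suggest but which is not stated here. The function names parse_properties and resolve_log_dirs are hypothetical helpers, not part of GeoEvent Server or Kafka.

```python
from pathlib import PurePosixPath


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def resolve_log_dirs(props):
    """Join each log.dirs entry onto gateway.data.dir (assumed behavior)."""
    base = PurePosixPath(props["gateway.data.dir"])
    return [str(base / entry.strip()) for entry in props["log.dirs"].split(",")]


# Linux defaults from the list above, using the example machine name gesdev01.
sample = """
gateway.data.dir=/home/arcgis/.esri/GeoEvent-Gateway/config.gesdev01
log.dirs=kafka/logs,kafka/logs1,kafka/logs2
"""

props = parse_properties(sample)
log_dirs = resolve_log_dirs(props)
for path in log_dirs:
    print(path)
```

To point the sketch at a different disk, change the gateway.data.dir value in the sample text; the three log folders follow it.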

Topic partitions

In GeoEvent Server, the default number of topic partitions is 3. Thus, if you inspect the log folders where your topics are stored, you will find folders that share the topic name and end in a partition index (-0, -1, and -2, since Kafka numbers partitions from zero). Inside each partition folder, Kafka maintains a log of all the data in the topic partition at that moment. To change the number of topic partitions, modify the following property.

  • num.partitions=3

Topic replication

In GeoEvent Server, the default number of topic partition replicas is 2. Thus, if you inspect each log folder where your topics are stored, you will find two folders per topic, each named for the topic with a partition index at the end (-0, -1, or -2, since Kafka numbers partitions from zero). Each of the log\, log1\, log2\ folders stores one replica of two partitions (which two partitions a folder gets is random). To change the number of topic partition replicas, modify the following property.

  • replication-factor=2
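The folder counts above follow from simple arithmetic: 3 partitions x 2 replicas = 6 partition folders, spread over 3 log folders, or 2 per log folder. The sketch below illustrates one possible placement. The round-robin rule and the topic name my-input are illustrative assumptions only, since the real assignment is random and decided by Kafka. Folder names use Kafka's topic-partition naming, with partitions numbered from 0.

```python
def sketch_layout(topic, num_partitions=3, replication_factor=2,
                  log_dirs=("logs", "logs1", "logs2")):
    """Place each partition replica in a log folder, round-robin.

    This is NOT Kafka's assignment algorithm; it only shows that the
    defaults yield two partition folders per log folder.
    """
    placement = {d: [] for d in log_dirs}
    for partition in range(num_partitions):
        for replica in range(replication_factor):
            # Offset by the replica number so that copies of the same
            # partition land in different log folders.
            target = log_dirs[(partition + replica) % len(log_dirs)]
            placement[target].append(f"{topic}-{partition}")
    return placement


placement = sketch_layout("my-input")  # "my-input" is a made-up topic name
for log_dir, folders in placement.items():
    print(log_dir, folders)
```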

Topic partition file sizes

By default, each Kafka topic partition log file starts at a minimum size of 20 MB and grows to a maximum size of 100 MB on disk before a new log file is created. A partition replica can hold multiple log files at any one time. At a minimum, plan for no less than 720 MB per input or output: (100 MB + 20 MB) x 3 partitions x 2 replicas = 720 MB.

In extreme cases of high-velocity event streams, each topic partition replica folder can grow to 3 to 4 times the maximum log file size (300 MB to 400 MB per partition replica). For a single topic with three partitions and two replicas, the total disk space can therefore reach 1800 MB to 2400 MB at any given time. Multiply that maximum by the number of inputs and outputs you have configured to estimate the disk space you need to have available for Kafka in GeoEvent Server. The property below controls the maximum size of a log file before Kafka rolls over to a new file (the default is 100 MB).

  • log.segment.bytes=104857600

With high-velocity data, you could end up with multiple 100 MB log files; with low-velocity data, you might have only one. The lower the velocity of your event data, the smaller you can set this property; the higher the velocity, the larger you should set it. If you set the size too small, Kafka will continually roll over files. If you set the size too large, Kafka will rarely roll over to a new log file, and old events will be kept in the queue longer than necessary.
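The sizing arithmetic in this section can be sketched as a small calculator. It uses only the figures quoted above (20 MB initial file, 100 MB maximum file, 3 partitions, 2 replicas, and a 3x to 4x growth factor). The function names and the example count of 4 inputs and 2 outputs are hypothetical, and the results are planning estimates, not measured values.

```python
def min_topic_mb(partitions=3, replicas=2, segment_mb=100, new_segment_mb=20):
    """Minimum disk per input/output: one full log file plus one new one."""
    return (segment_mb + new_segment_mb) * partitions * replicas


def peak_topic_mb(growth_factor, partitions=3, replicas=2, segment_mb=100):
    """Peak disk per input/output when each replica folder holds
    growth_factor times the maximum log file size (3 to 4 in extreme cases)."""
    return segment_mb * growth_factor * partitions * replicas


def estimate_range_mb(num_inputs, num_outputs):
    """Return (minimum, peak at 3x, peak at 4x) in MB for all topics."""
    topics = num_inputs + num_outputs
    return (
        topics * min_topic_mb(),
        topics * peak_topic_mb(3),
        topics * peak_topic_mb(4),
    )


# Hypothetical deployment: 4 inputs and 2 outputs.
low, peak3, peak4 = estimate_range_mb(num_inputs=4, num_outputs=2)
print(f"minimum: {low} MB, peak: {peak3} MB to {peak4} MB")
```

These figures cover the Kafka topics only; the disk space required by the GeoEvent Gateway service itself is in addition to them.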

Another setting that affects how much disk space is consumed by a Kafka topic partition is retention bytes. This property instructs Kafka to always keep a minimum amount of data. By default, the value for this property is 100 MB. So even if Kafka decides it can and should delete old data, the size of the remaining data will never be below 120 MB (100 MB for the old log file and 20 MB for the new log file). As with the segment bytes property above, if you are working with lower data velocities, you can lower the value for this property. When working with higher velocity data, the default 100 MB should be used.

  • log.retention.bytes=104857600

Topic partition file management

As subscribers consume events from a partition's queue, events will become stale when marked as consumed by all subscribers. The amount of time Kafka keeps old messages and the frequency that Kafka cleans out old messages can be set using the following properties.

  • log.retention.hours=1
  • log.retention.check.interval.ms=30000

The default retention policy is 1 hour. Any data files older than 1 hour and not currently storing active data will be deleted. If the file is still being actively used to store data (as might be the case with low volume/velocity data) it will not be deleted. Kafka will check to delete old data files every 30 seconds by default.
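The deletion rule described above can be modeled in a few lines. This is a simplified sketch of the behavior as described here, not Kafka's actual cleaner logic: a data file is treated as deletable only when it is older than the retention period and is not the file currently receiving data. The Segment class and the file names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    name: str
    age_hours: float
    active: bool  # True if the file is still being written to


def is_deletable(segment, retention_hours=1.0):
    """Older than the retention period and no longer storing active data."""
    return segment.age_hours > retention_hours and not segment.active


segments = [
    Segment("old-full", age_hours=2.5, active=False),
    Segment("old-active", age_hours=3.0, active=True),   # low-velocity topic
    Segment("recent", age_hours=0.2, active=False),
]

deletable = [s.name for s in segments if is_deletable(s)]
print(deletable)
```

Only the first file qualifies: the second is old but still active, and the third is within the 1 hour retention period.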

Topic partition file management properties

The properties below are also available in the Kafka properties file to control how often the partition files are rolled over. Lowering the roll interval may improve disk space utilization, but might also impact performance if set too low. Raising the roll interval will increase disk space usage.

  • log.roll.ms=1800000
  • log.roll.jitter.ms=180000

The first property instructs Kafka to roll a data file over, essentially replacing it with a new one, every 30 minutes, regardless of the size of the old data file. For low-velocity data streams, this avoids retaining older data in a data file that rarely fills up. The second property sets the maximum random jitter, 3 minutes by default, that Kafka subtracts from the roll interval so that data files do not all roll over at the same moment.
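Taken together with log.segment.bytes, the roll interval means a data file is replaced when it is either full or old enough. The sketch below is a simplified model of that rule using the default values; it deliberately ignores log.roll.jitter.ms, and the function name should_roll is hypothetical.

```python
MB = 1024 * 1024
MINUTE_MS = 60 * 1000


def should_roll(used_bytes, age_ms,
                max_segment_bytes=104857600,  # log.segment.bytes default
                roll_ms=1800000):             # log.roll.ms default
    """Roll when the data file is full OR older than the roll interval."""
    return used_bytes >= max_segment_bytes or age_ms >= roll_ms


# Low-velocity stream: small file, but older than 30 minutes -> rolled.
print(should_roll(used_bytes=5 * MB, age_ms=31 * MINUTE_MS))

# Same small file after only 10 minutes -> kept.
print(should_roll(used_bytes=5 * MB, age_ms=10 * MINUTE_MS))

# High-velocity stream: file reaches 100 MB almost immediately -> rolled.
print(should_roll(used_bytes=100 * MB, age_ms=1 * MINUTE_MS))
```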