
Watch a Folder for New CSV Files

The Watch a Folder for New CSV Files input connector can be used to read and adapt event data, formatted as delimited text, from a system file. The text delimiter is usually a comma, so this type of input file is sometimes referred to as a comma-separated values (CSV) data file, but ArcGIS GeoEvent Server can use any ASCII printable character as a delimiter to separate data attribute values.

Often, data values are simple. Commas are used to separate, or delimit, individual attribute values and literal string values are enclosed in double quotation marks, as illustrated below.

Comma separated values with literal string values enclosed in double quotation marks

Sometimes using a delimiter other than a comma is useful to avoid ambiguity when double quotation marks or commas are embedded within an attribute value, such as when a data file includes JSON string representations to specify geometry values, for example. The use of semicolons as delimiters is illustrated below.

Semicolon separated values with JSON string representations to specify geometry values
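
As a rough illustration of why a delimiter other than a comma helps, the following Python sketch parses semicolon-separated lines whose last attribute is a JSON geometry string. The sample records, the field order, and the parse_records helper are hypothetical and are not taken from GeoEvent Server. The sketch only shows that splitting on semicolons leaves the embedded commas and double quotation marks intact.

```python
import csv
import io
import json

# Hypothetical sample data: asset names, timestamps, and coordinates are invented.
SAMPLE = (
    'AssetA;2019-12-31T23:59:59;{"x":-117.19,"y":34.05,"spatialReference":{"wkid":4326}}\n'
    'AssetB;2019-12-31T23:59:59;{"x":-118.24,"y":34.07,"spatialReference":{"wkid":4326}}\n'
)


def parse_records(text, attribute_separator=";"):
    """Split each line on the attribute separator and decode the JSON geometry."""
    # QUOTE_NONE treats double quotation marks as ordinary characters, so the
    # quotes inside the JSON string are passed through unchanged.
    reader = csv.reader(
        io.StringIO(text), delimiter=attribute_separator, quoting=csv.QUOTE_NONE
    )
    records = []
    for name, timestamp, geometry_json in reader:
        records.append((name, timestamp, json.loads(geometry_json)))
    return records


records = parse_records(SAMPLE)
```

With a comma as the attribute separator, the same lines would be split inside the JSON string, which is the ambiguity described above.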

Usage notes

Keep the following in mind when working with the Watch a Folder for New CSV Files input connector:

  • Use this input connector to read data, formatted as delimited text, from a system file and adapt it to create event data records for processing.
  • This input connector pairs the Text inbound adapter with the File inbound transport.
  • The input connector watches the specified system folder and will read an entire file as soon as the file appears in the folder.
  • The entire file’s content will be reread if changes are made to the file and are saved.
  • All files in a watched folder will be reread, from the beginning of the files, in the following situations:
    • The input connector’s parameters are updated and saved.
    • The input connector is stopped and restarted (or the ArcGIS GeoEvent Server service is restarted).
  • Delimited text does not have to contain data that represents a geometry.
  • The adapter supports the ability to construct a point geometry from x, y, and z attribute values.
  • The registered server folder, specified in the Input Folder Data Store parameter, can be specified using either an absolute path or UNC path. If a UNC path is used, the Windows service account running GeoEvent Server must have read/write permission to the folder.
  • It is recommended that you use absolute paths—for example, C:\GeoEvent\input—for the Input Folder Data Store parameter.
  • The Input Directory parameter allows a subfolder relative to the registered server folder to be specified.
  • The Include Subfolders parameter allows you to specify whether folders beneath the folder specified in the Input Folder Data Store parameter should be searched recursively. Often, organizing data with different schemas into different folders, and changing Include Subfolders from its default to disable recursive search, allows a more direct and simpler configuration of this input connector.
  • When a data file has one or more header lines (for example, field names or attribute data types) that are not data values, specify the Number of Lines to Skip from Start of File value. When a data file is particularly large, reduce the Max Number of Lines per Batch value to limit the number of lines retrieved at one time as the file’s content is read. You can also set the Batch Flush Interval value to specify how many milliseconds to wait before the next batch of lines is retrieved from the file.
  • A Message Separator value and an Attribute Separator value are required to parse delimited text. The Message Separator value indicates the character that identifies the end of a data record. The default is \n (newline). The Attribute Separator value specifies the character used to separate one attribute value from another in a single line of text. The illustrations above show data that uses different characters as attribute separators. Each illustration, however, assumes that a newline is the natural message separator.
  • A single data file can contain different types of data, for example, light truck versus tractor trailer. If different lines of text represent event data from different types of sensors or assets, the first attribute value of each line of text must identify the type of event record. The Incoming Data Contains GeoEvent Definition parameter specifies whether the connector should use the first attribute value as the name of the GeoEvent Definition that specifies the data type and number of attribute values that follow. This is often a source of confusion. When this parameter is set to Yes (the default), its dependent parameter Create Unrecognized Event Definitions is set to No (the default), and event data like that illustrated above is provided, no event records are created for processing. The reason is that the first attribute of the illustrated event data is not the name of a GeoEvent Definition; it is an asset's unique name or identifier. It is unlikely that GeoEvent Definitions exist whose names match the unique identifiers of every asset being monitored.

  • Consider the expected behavior if an input was configured with the default Incoming Data Contains GeoEvent Definition parameter set to Yes and the Create Unrecognized Event Definitions parameter was changed to Yes. A new GeoEvent Definition would be created for every named asset or sensor. This is not likely the result you would want, especially if the data contains hundreds, or thousands, of unique asset names. To prevent this from happening, review the data, and if each line does not start with the name of a GeoEvent Definition, change the Incoming Data Contains GeoEvent Definition parameter value to No.

  • Network latency can adversely impact the ability of GeoEvent Server to retrieve high volumes of event data.

Parameters

The following are the parameters for the Watch a Folder for New CSV Files input connector:

Parameter

Description

Name

A descriptive name for the input connector used for reference in GeoEvent Manager.

Input Folder Data Store

The registered system folder under which files reside.

Input Directory

A subfolder directly under the registered system folder. Input Directory should be left blank if no subfolder under the registered system folder exists.

Input File Filter

A regular expression pattern used to identify files appropriate for this input to ingest and adapt to create event data records for processing. The default is .*\.csv, which matches any file name (.*) ending with the literal suffix (.csv).

While this parameter is not required and can be left blank, it is recommended that you specify a pattern that matches the file name of any file whose schema matches the GeoEvent Definition this input has been configured to use and exclude files (by name) that you do not want the input to ingest.
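
The effect of a file filter pattern can be checked outside GeoEvent Server before it is entered. The Python sketch below is only an approximation: it assumes the pattern must match the whole file name, and Python's regular expression syntax differs from Java's in some details. The file names and the matching_files helper are hypothetical.

```python
import re


def matching_files(names, pattern=r".*\.csv"):
    """Return the names that the pattern matches in full (assumed behavior)."""
    compiled = re.compile(pattern)
    return [name for name in names if compiled.fullmatch(name)]


# Hypothetical folder listing.
names = ["trucks_01.csv", "sensors.csv", "notes.txt", "trucks_01.csv.bak"]

# The default pattern accepts every name ending in .csv.
default_matches = matching_files(names)

# A narrower pattern keeps only files whose schema is expected to match.
truck_matches = matching_files(names, r"trucks_\d+\.csv")
```

Here the narrower pattern excludes sensors.csv by name, which is the kind of exclusion recommended above when files with different schemas share a folder.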

Read File as Text Lines

Specifies how the content of the file is read and parsed. The default is Yes.

  • Yes—The contents of the file are read and parsed as individual lines of text.
  • No—The entire file is read and parsed as a complete document.

When working with delimited text, it is recommended that you read the individual lines of text rather than read the entire file's content. It is assumed that each line of text represents a complete data record. Each line of text must end with a message separator.

Max Number of Lines per Batch

(Conditional)

The maximum number of lines that are read from the file in each batch or interval. The default is 1000 lines. Reduce this value if each event record contains many attributes to limit the amount of data sent to the Text adapter as a batch.

This parameter is shown when Read File as Text Lines is set to Yes and is hidden when it's set to No.

Batch Flush Interval (milliseconds)

(Conditional)

The number of milliseconds to wait before reading another batch of lines from the file. The default is 500. Increase this value if file size is expected to be very large or if additional time is necessary to process each batch of lines retrieved from a file.

This parameter is shown when Read File as Text Lines is set to Yes and is hidden when it's set to No.

Number of Lines to Skip from Start of File

(Conditional)

The number of lines that are skipped from the start of the file. The default is 0. Increase this value to skip a specific number of lines—for example, header lines specifying attribute field names or data types, because they do not contain actual data for processing.

This parameter is shown when Read File as Text Lines is set to Yes and is hidden when it's set to No.
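
The interaction of the three conditional parameters above can be pictured with a small Python generator. This is a simplified model under stated assumptions, not the File inbound transport's actual implementation: the read_batches function and its argument names are hypothetical, and the pause is applied before each batch after the first.

```python
import itertools
import time


def read_batches(lines, skip_lines=0, max_lines_per_batch=1000,
                 flush_interval_ms=500, sleep=time.sleep):
    """Yield lists of at most max_lines_per_batch lines, pausing between batches."""
    # Drop the header lines once, at the start of the file.
    remaining = itertools.islice(iter(lines), skip_lines, None)
    first = True
    while True:
        if not first:
            # Wait before reading the next batch of lines.
            sleep(flush_interval_ms / 1000.0)
        batch = list(itertools.islice(remaining, max_lines_per_batch))
        if not batch:
            return
        first = False
        yield batch


# Hypothetical file content: one header line followed by five data lines.
file_lines = ["name,x,y"] + [f"row{i}" for i in range(5)]
batches = list(read_batches(file_lines, skip_lines=1, max_lines_per_batch=2,
                            flush_interval_ms=500, sleep=lambda seconds: None))
```

The sleep argument is replaced with a no-op here so the example runs instantly; with the default, each batch after the first would be delayed by the flush interval.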

Default Spatial Reference

The well-known ID (WKID) of a spatial reference to be used when a geometry is constructed from attribute field values whose coordinates are not latitude and longitude values for an assumed WGS84 geographic coordinate system, or when geometry strings are received that do not include a spatial reference. A well-known text (WKT) value or the name of an attribute field containing the WKID or WKT can also be specified.

Message Separator

A single literal character that indicates the end of an event data record. Unicode values can be used to specify a character delimiter. Do not enclose the character in quotes. A newline (\n) is a common end-of-record delimiter.

Attribute Separator

A single literal character used to separate one attribute value from another in a message. Unicode values can be used to specify a character delimiter. Do not enclose the character in quotes. A comma is a common attribute delimiter.

Incoming Data Contains GeoEvent Definition

Specifies whether the first attribute value of each delimited line of text is used as the name of a GeoEvent Definition. For more information, refer to the usage notes above.

  • Yes—The first attribute field in each event record is the name of a GeoEvent Definition (existing or new).
  • No—All the event records share a common schema and share one GeoEvent Definition. The first attribute field in each event record is sensor data, not the name of a GeoEvent Definition.

Create Unrecognized Event Definitions

(Conditional)

Specifies whether a new GeoEvent Definition is created when one with the specified name does not exist. When a delimited text file includes event records from different types of sensors, the first attribute value is used to specify the type of event, and this attribute value is used as the GeoEvent Definition name.

  • Yes—A new GeoEvent Definition is created if an event definition with the specified name does not already exist.
  • No—No new GeoEvent Definition is created. Inbound event data that does not have a corresponding GeoEvent Definition cannot be adapted and is not processed.

This parameter is shown when Incoming Data Contains GeoEvent Definition is set to Yes and is hidden when it's set to No.
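
The combinations described in the usage notes can be modeled in a few lines of Python. This is a deliberately simplified sketch, not GeoEvent Server's adapter logic: the adapt_lines function, the Vehicles definition name, and the sample lines are all hypothetical.

```python
def adapt_lines(lines, definitions, contains_definition=True,
                create_unrecognized=False, shared_definition="Vehicles"):
    """Return (events, known_definitions) for comma-separated lines."""
    known = set(definitions)
    events = []
    for line in lines:
        values = line.split(",")
        if not contains_definition:
            # Every record shares one definition; the first value is data.
            events.append((shared_definition, values))
            continue
        name, rest = values[0], values[1:]
        if name not in known:
            if not create_unrecognized:
                continue  # no matching definition, so the record is dropped
            known.add(name)  # a new definition per unrecognized name
        events.append((name, rest))
    return events, known


# Hypothetical data whose first value is an asset identifier, not a definition name.
lines = ["Truck-001,34.05,-117.19", "Truck-002,34.07,-118.24"]

dropped, _ = adapt_lines(lines, {"Vehicles"})
created, grown = adapt_lines(lines, {"Vehicles"}, create_unrecognized=True)
shared, unchanged = adapt_lines(lines, {"Vehicles"}, contains_definition=False)
```

With both defaults, no events are produced. Enabling creation yields one definition per asset name, and setting Incoming Data Contains GeoEvent Definition to No keeps a single shared definition.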

Create GeoEvent Definition

(Conditional)

Specifies whether a new or an existing GeoEvent Definition is used for the inbound event data. A GeoEvent Definition is required for GeoEvent Server to interpret the inbound event data attribute fields and data types.

  • Yes—A new GeoEvent Definition is created based on the schema of the first event record received.
  • No—No GeoEvent Definition is created. Choose an existing GeoEvent Definition that matches the schema of the inbound event data.

This parameter is shown when Incoming Data Contains GeoEvent Definition is set to No and is hidden when it's set to Yes.

GeoEvent Definition Name (New)

(Conditional)

The name assigned to a new GeoEvent Definition. If a GeoEvent Definition with the specified name already exists, the existing GeoEvent Definition is used. The first data record received is used to determine the expected schema of subsequent data records, and a new GeoEvent Definition is created based on that first data record's schema.

This parameter is shown when Create GeoEvent Definition is set to Yes and is hidden when it's set to No.

GeoEvent Definition Name (Existing)

(Conditional)

The name of an existing GeoEvent Definition to use when adapting received data to create event data for processing by a GeoEvent Service.

This parameter is shown when Create GeoEvent Definition is set to No and is hidden when it's set to Yes.

Construct Geometry from Fields

Specifies whether the input connector will construct a point geometry using coordinate values received as attributes. The default is No.

  • Yes—Values from specified event attribute fields are used to construct a point geometry.
  • No—No point geometry is constructed. It is assumed that an attribute field contains a value that can be interpreted as a geometry or the event record is nonspatial (does not have a geometry).

X Geometry Field

(Conditional)

The attribute field in the inbound event data containing the x-coordinate part (for example, horizontal or longitude) of a point location.

This parameter is shown when Construct Geometry from Fields is set to Yes and is hidden when it's set to No.

Y Geometry Field

(Conditional)

The attribute field in the inbound event data containing the y-coordinate part (for example, vertical or latitude) of a point location.

This parameter is shown when Construct Geometry from Fields is set to Yes and is hidden when it's set to No.

Z Geometry Field

(Conditional)

The attribute field in the inbound event data containing the z-coordinate part (for example, depth or altitude) of a point location. If no value is provided, the z-value is omitted, and a 2D point geometry is constructed.

This parameter is shown when Construct Geometry from Fields is set to Yes and is hidden when it's set to No.
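
As a minimal sketch of what constructing a point from attribute fields amounts to, the Python function below builds a point dictionary from named fields and omits z when no z field is given. The build_point helper, the field names, the output layout, and the default WKID of 4326 are assumptions made for illustration, not the adapter's actual output.

```python
def build_point(record, x_field, y_field, z_field=None, wkid=4326):
    """Build a point dictionary from the named attribute fields of a record."""
    point = {"x": float(record[x_field]), "y": float(record[y_field])}
    if z_field is not None:
        # A z field was configured, so the point is three-dimensional.
        point["z"] = float(record[z_field])
    point["spatialReference"] = {"wkid": wkid}
    return point


# Hypothetical event record with coordinate values received as text.
record = {"id": "AssetA", "lon": "-117.19", "lat": "34.05", "alt": "350.0"}

point_2d = build_point(record, "lon", "lat")
point_3d = build_point(record, "lon", "lat", z_field="alt")
```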

Expected Date Format

The pattern used to match expected string representations of date/time values and convert them to Java Date values. The pattern's format follows the Java SimpleDateFormat class convention.

While the preferred pattern for date/time values in GeoEvent Server is the ISO 8601 standard, several string representations of date/time values commonly recognized as date values can be converted to Java Date values without specifying an Expected Date Format value. These include the following:

  • "2019-12-31T23:59:59"—The ISO 8601 standard format
  • 1577836799000—Java Date (epoch long integer; UTC)
  • "Tue Dec 31 23:59:59 -0000 2019"—A common web services string format
  • "12/31/2019 11:59:59 PM"—A common format used in the United States (12-hour clock)
  • "12/31/2019 23:59:59"—A common format used in the United States (24-hour clock)

If the date/time values received use a convention other than one of those listed above, you must specify an expected date format pattern so GeoEvent Server can adapt the date/time values.
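
The sample values listed above all denote the same instant, which can be checked with a short Python sketch. Note that Python's strptime directives are used here, not Java SimpleDateFormat patterns. The comments give what is believed to be a roughly equivalent SimpleDateFormat pattern for each. Values without a time zone are assumed to be UTC for this example.

```python
from datetime import datetime, timezone

# (sample value, Python strptime format); comments show a roughly
# equivalent Java SimpleDateFormat pattern (an assumption, for orientation).
SAMPLES = [
    ("2019-12-31T23:59:59", "%Y-%m-%dT%H:%M:%S"),        # yyyy-MM-dd'T'HH:mm:ss
    ("Tue Dec 31 23:59:59 -0000 2019", "%a %b %d %H:%M:%S %z %Y"),  # EEE MMM dd HH:mm:ss Z yyyy
    ("12/31/2019 11:59:59 PM", "%m/%d/%Y %I:%M:%S %p"),  # MM/dd/yyyy hh:mm:ss a
    ("12/31/2019 23:59:59", "%m/%d/%Y %H:%M:%S"),        # MM/dd/yyyy HH:mm:ss
]


def to_epoch_millis(text, fmt):
    """Parse text with fmt and return milliseconds since the epoch (UTC)."""
    parsed = datetime.strptime(text, fmt)
    if parsed.tzinfo is None:
        # Assumption for this sketch: values without an offset are UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return int(parsed.timestamp() * 1000)


epoch_values = [to_epoch_millis(text, fmt) for text, fmt in SAMPLES]
```

Each entry in epoch_values equals the epoch value 1577836799000 shown in the list above.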

Language for Number Formatting

The locale identifier (ID) used for locale-sensitive behavior when formatting numbers from data values. The default is the locale of the machine GeoEvent Server is installed on. For more information, refer to Java Supported Locales.

Include Subfolders

Specifies whether subfolders under Input Folder Data Store and Input Directory (optional) are searched for files. The default is Yes; however, organizing data with different schemas into different folders and changing this parameter to No to disable recursive search allows a simpler configuration.

  • Yes—Recursively search for files whose content is ingested and adapted to create event data records.
  • No—Only the Input Folder Data Store and Input Directory (optional) subfolders are searched for files.
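
The difference between the two settings resembles the difference between a recursive and a non-recursive directory listing. The Python sketch below builds a temporary folder to show it. The find_csv_files helper and the folder layout are hypothetical and only mimic the search behavior described above.

```python
import tempfile
from pathlib import Path


def find_csv_files(root, include_subfolders=True):
    """List .csv files under root, optionally descending into subfolders."""
    root = Path(root)
    matches = root.rglob("*.csv") if include_subfolders else root.glob("*.csv")
    return sorted(p.relative_to(root).as_posix() for p in matches if p.is_file())


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "trucks").mkdir()
    (base / "top.csv").write_text("a,b\n")
    (base / "trucks" / "nested.csv").write_text("a,b\n")

    recursive = find_csv_files(base, include_subfolders=True)
    top_level_only = find_csv_files(base, include_subfolders=False)
```

The recursive listing contains both files, while the non-recursive listing contains only top.csv.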

Delete Files After Processing

Specifies whether the files in the registered system folder are deleted after their content has been processed. Even if a file's content cannot be adapted, no event records are created, and no real-time event processing occurs, the inbound transport still deletes a file whose contents were successfully read. The default is No.

  • Yes—Files are deleted from the registered system folder after being processed.
  • No—Files are not deleted from the registered system folder after being processed.

Files that are not deleted are reread from the beginning of the file if the input connector's properties are changed and saved or if the input is stopped and restarted, for example, if the ArcGIS GeoEvent Server service is restarted.

