
Watch a Folder for New CSV Files

The Watch a Folder for New CSV Files input connector can be used to read and adapt event data, formatted as delimited text, from a system file. The text delimiter is usually a comma, so this type of input file is sometimes referred to as a comma-separated values (CSV) data file, but ArcGIS GeoEvent Server can use any ASCII printable character as a delimiter to separate data attribute values.

Often, data values are simple. Commas are used to separate, or delimit, individual attribute values and literal string values are enclosed in double quotation marks, as illustrated below.

Comma separated values with literal string values enclosed in double quotation marks

Sometimes using a delimiter other than a comma is useful to avoid ambiguity when double quotation marks or commas are embedded within an attribute value, such as when a data file includes JSON string representations to specify geometry values. The use of semicolons as delimiters is illustrated below.

Semicolon separated values with JSON string representations to specify geometry values
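To make the ambiguity concrete, the following is a minimal Python sketch, not part of GeoEvent Server, that splits one record on semicolons and then on commas. The sample record, its field values, and the helper name split_record are hypothetical and chosen only for illustration.

```python
import json

# Hypothetical record: asset name, timestamp, and a JSON point geometry.
# The JSON value itself contains commas and double quotation marks.
record = 'Truck-01;2019-12-31T23:59:59;{"x":-117.19,"y":34.05,"spatialReference":{"wkid":4326}}'


def split_record(line, attribute_separator):
    """Split one line of delimited text on a single-character separator."""
    return line.rstrip("\n").split(attribute_separator)


# With a semicolon separator the JSON geometry stays in one piece.
fields = split_record(record, ";")
geometry = json.loads(fields[2])

# With a comma separator the same text is cut inside the JSON value.
comma_fields = split_record(record, ",")
```

Splitting on semicolons yields three attribute values, the last of which is still valid JSON. Splitting on commas also yields three pieces, but none of them is a usable geometry string.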

Usage notes

Keep the following in mind when working with the Watch a Folder for New CSV Files input connector:

  • Use this input connector to read data, formatted as delimited text, from a system file and adapt it to create event data records for processing.
  • This input connector pairs the Text inbound adapter with the File inbound transport.
  • The input connector watches the specified system folder and will read an entire file as soon as the file appears in the folder.
  • The entire file’s content will be reread if changes are made to the file and are saved.
  • All files in a watched folder will be reread, from the beginning of the files, in the following situations:
    • The input connector’s parameters are updated and saved.
    • The input connector is stopped and restarted (or the ArcGIS GeoEvent Server service is restarted).
  • Delimited text does not have to contain data that represents a geometry.
  • The adapter supports the ability to construct a point geometry from x, y, and z attribute values.
  • The registered server folder, specified in the Input Folder Data Store parameter, can be specified using either an absolute path or UNC path. If a UNC path is used, the Windows service account running GeoEvent Server must have read/write permission to the folder.
  • It is recommended that you use absolute paths—for example, C:\GeoEvent\input—for the Input Folder Data Store parameter.
  • The Input Directory parameter allows a subfolder relative to the registered server folder to be specified.
  • The Include Subfolders parameter allows you to specify whether folders beneath the folder specified in the Input Folder Data Store parameter should be searched recursively. Often, organizing data with different schemas into different folders, and changing Include Subfolders from its default to disable recursive search, allows a more direct and simpler configuration of this input connector.
  • When a data file has one or more header lines (for example, field names or attribute data types) that are not data values, specify the Number of Lines to Skip from Start of File value. When a data file is particularly large, reduce the Max Number of Lines per Batch value to limit the number of lines retrieved at one time as the file’s content is read. You can also set the Batch Flush Interval value to specify how many milliseconds to wait before the next batch of lines is retrieved from the file.
  • A Message Separator value and an Attribute Separator value are required to parse delimited text. The Message Separator value indicates the character that identifies the end of a data record. The default is \n (newline). The Attribute Separator value specifies the character used to separate one attribute value from another in a single line of text. The illustrations above show data that uses different characters as attribute separators. Each illustration, however, assumes that a newline is the natural message separator.
  • A single data file can contain different types of data, for example, light truck versus tractor trailer. If different lines of text represent event data from different types of sensors or assets, the first attribute value of each line of text must identify the type of event record. The Incoming Data Contains GeoEvent Definition parameter specifies whether the connector should use the first attribute value as the name of the GeoEvent Definition that defines the data types and number of attribute values that follow. This is often a source of confusion. When this parameter is set to Yes (the default), its dependent parameter, Create Unrecognized Event Definitions, is set to No (the default), and event data like that illustrated above is provided, no event records are created for processing. The reason is that the first attribute of the illustrated event data is not the name of a GeoEvent Definition; it is an asset's unique name or identifier. It is unlikely that GeoEvent Definitions exist whose names match the unique identifiers of every asset being monitored.

  • Consider the expected behavior if an input was configured with the default Incoming Data Contains GeoEvent Definition parameter set to Yes and the Create Unrecognized Event Definitions parameter was changed to Yes. A new GeoEvent Definition would be created for every named asset or sensor. This is not likely the result you would want, especially if the data contains hundreds, or thousands, of unique asset names. To prevent this from happening, review the data, and if each line does not start with the name of a GeoEvent Definition, change the Incoming Data Contains GeoEvent Definition parameter value to No.

  • Network latency can adversely impact the ability of GeoEvent Server to retrieve high volumes of event data.
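The interaction between Incoming Data Contains GeoEvent Definition and Create Unrecognized Event Definitions described above can be sketched in Python. This is a simplified model of the documented behavior, not the connector's actual implementation. The function name, its arguments, and the sample definition and asset names are hypothetical.

```python
def resolve_definition(first_value, known_definitions,
                       data_contains_definition=True,
                       create_unrecognized=False,
                       configured_definition=None):
    """Return the GeoEvent Definition name to use for a record.

    Returns None when the record cannot be adapted and is dropped.
    """
    if not data_contains_definition:
        # All records share the one definition configured on the input.
        return configured_definition
    if first_value in known_definitions:
        return first_value
    if create_unrecognized:
        # A new definition is created and named after the first value.
        known_definitions.add(first_value)
        return first_value
    return None


# Defaults (Yes / No): an asset name is not a definition name, so no event is created.
known = {"LightTruck"}
dropped = resolve_definition("Truck-01", known)

# Yes / Yes: every distinct asset name becomes its own definition.
for asset in ["Truck-01", "Truck-02", "Truck-03"]:
    resolve_definition(asset, known, create_unrecognized=True)

# No: the first value is treated as data, and one definition serves every record.
shared = resolve_definition("Truck-01", known,
                            data_contains_definition=False,
                            configured_definition="VehicleReport")
```

In this model, the default settings drop the record, the Yes/Yes combination grows the set of definitions by one per asset name, and setting Incoming Data Contains GeoEvent Definition to No keeps a single shared definition.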

Parameters

The following are the parameters for the Watch a Folder for New CSV Files input connector:

Parameter / Description

Name

A descriptive name for the input connector used for reference in GeoEvent Manager.

Input Folder Data Store

The registered system folder under which files will be found.

Input Directory

A subfolder directly under the registered system folder. Input Directory should be left blank if a subfolder under the registered system folder does not exist.

Input File Filter

A regular expression pattern used to identify files appropriate for this input to ingest and adapt to create event data records for processing. The default is .*\.csv, which matches any filename (.*) ending with the literal suffix (.csv).

While this parameter is not required and can be left blank, it is recommended that you specify a pattern that matches the name of every file whose schema matches the GeoEvent Definition this input has been configured to use, and that excludes (by name) files you do not want the input to ingest.
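As a quick way to check a candidate pattern before saving it, the following Python sketch tests file names against the default filter and against a narrower one. It assumes the pattern must match the whole file name, which is an assumption about the connector's behavior rather than something stated above. The file names and the narrower pattern are hypothetical. The simple constructs used here (.*, \. and \d{4}) have the same meaning in Python and Java regular expressions.

```python
import re

DEFAULT_FILTER = r".*\.csv"              # default from the parameter description
NARROW_FILTER = r"vehicles_\d{4}\.csv"   # hypothetical pattern for one schema


def matches_filter(filename, pattern=DEFAULT_FILTER):
    """Return True when the whole file name matches the pattern (assumed behavior)."""
    return re.fullmatch(pattern, filename) is not None


names = ["vehicles_2019.csv", "sensors_2019.csv", "vehicles.csv.bak", "notes.txt"]
accepted_by_default = [n for n in names if matches_filter(n)]
accepted_by_narrow = [n for n in names if matches_filter(n, NARROW_FILTER)]
```

Under that assumption, the default filter accepts both .csv files, while the narrower pattern accepts only the file whose name fits the intended schema.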

Read File as Text Lines

Specifies how the content of the file should be read and parsed. The default is Yes.

  • Yes—The contents of the file will be read and parsed as individual lines of text.
  • No—The entire file will be read and parsed as a complete document.

When working with delimited text, it is recommended that you read the individual lines of text rather than read the entire file's content. It is assumed that each line of text represents a complete data record. Each line of text must end with a message separator.

Max Number of Lines per Batch

(Conditional)

The maximum number of lines to read from the file in each batch or interval. The default is 1000 lines. Reduce this value if each event record contains many attributes to limit the amount of data sent to the Text adapter as a batch.

The parameter is shown when Read File as Text Lines is set to Yes and is hidden when set to No.

Batch Flush Interval (milliseconds)

(Conditional)

The number of milliseconds to wait before reading another batch of lines from the file. The default is 500. Increase this value if additional time is necessary to process each batch of lines retrieved from a file, for example, when files are expected to be very large.

The parameter is shown when Read File as Text Lines is set to Yes and is hidden when set to No.

Number of Lines to Skip from Start of File

(Conditional)

The number of lines to skip from the start of the file. The default is 0. Increase this value if you want to skip a specific number of lines (for example, header lines specifying attribute field names or data types) because they do not contain actual data for processing.

The parameter is shown when Read File as Text Lines is set to Yes and is hidden when set to No.
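How the three conditional parameters above work together can be sketched in Python. This is an illustrative model only, not the File transport's implementation. The function name read_batches and its arguments are hypothetical stand-ins for Number of Lines to Skip from Start of File, Max Number of Lines per Batch, and Batch Flush Interval.

```python
import time
from itertools import islice


def read_batches(lines, skip=0, max_lines=1000, flush_interval_ms=500,
                 sleep=time.sleep):
    """Yield lists of at most max_lines lines, pausing between batches."""
    remaining = islice(iter(lines), skip, None)  # drop header lines first
    first_batch = True
    while True:
        batch = list(islice(remaining, max_lines))
        if not batch:
            return
        if not first_batch:
            # Wait before handing over the next batch.
            sleep(flush_interval_ms / 1000)
        first_batch = False
        yield batch


# Hypothetical file content: one header line followed by five records.
sample = ["name,x,y"] + [f"row{i}" for i in range(5)]
batches = list(read_batches(sample, skip=1, max_lines=2, flush_interval_ms=0))
```

With one line skipped and two lines per batch, the five records arrive as three batches of two, two, and one lines.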

Default Spatial Reference

The well-known ID (WKID) of a spatial reference to be used when a geometry is constructed from attribute field values whose coordinates are not latitude and longitude values for an assumed WGS84 geographic coordinate system, or geometry strings are received that do not include a spatial reference. A well-known text (WKT) value or the name of an attribute field containing the WKID or WKT may also be specified.

Message Separator

A single literal character that indicates the end of an event data record. Unicode values may be used to specify a character delimiter. The character should not be enclosed in quotes. A newline (\n) is a common end-of-record delimiter.

Attribute Separator

A single literal character used to separate one attribute value from another in a message. Unicode values may be used to specify a character delimiter. The character should not be enclosed in quotes. A comma is a common attribute delimiter.

Incoming Data Contains GeoEvent Definition

Specifies whether the first attribute value of each delimited line of text should be used as the name of a GeoEvent Definition. For more information, see the usage notes above.

  • Yes—The first attribute field in each event record is the name of a GeoEvent Definition (existing or new).
  • No—All the event records share a common schema and therefore share one GeoEvent Definition. The first attribute field in each event record is sensor data, not the name of a GeoEvent Definition.

Create Unrecognized Event Definitions

(Conditional)

Specifies whether a new GeoEvent Definition should be created when one with the specified name does not exist. When a delimited text file includes event records from different types of sensors, the first attribute value is used to specify the type of event and this attribute value is used as the GeoEvent Definition name.

  • Yes—A new GeoEvent Definition will be created if an event definition with the specified name does not already exist.
  • No—A new GeoEvent Definition will not be created. Inbound event data that does not have a corresponding GeoEvent Definition cannot be adapted and will not be processed.

The parameter is shown when Incoming Data Contains GeoEvent Definition is set to Yes and is hidden when set to No.

Create GeoEvent Definition

(Conditional)

Specifies whether a new or existing GeoEvent Definition should be used for the inbound event data. A GeoEvent Definition is required for GeoEvent Server to understand the inbound event data attribute fields and data types.

  • Yes—A new GeoEvent Definition will be created based on the schema of the first event record received.
  • No—A new GeoEvent Definition will not be created. Select an existing GeoEvent Definition that matches the schema of the inbound event data.

The parameter is shown when Incoming Data Contains GeoEvent Definition is set to No and is hidden when set to Yes.

GeoEvent Definition Name (New)

(Conditional)

The name assigned to a new GeoEvent Definition. If a GeoEvent Definition with the specified name already exists, the existing GeoEvent Definition will be used. Otherwise, a new GeoEvent Definition will be created based on the schema of the first data record received, and that schema will be used as the expected schema of subsequent data records.

The parameter is shown when Create GeoEvent Definition is set to Yes and is hidden when set to No.

GeoEvent Definition Name (Existing)

(Conditional)

The name of an existing GeoEvent Definition to use when adapting received data to create event data for processing by a GeoEvent Service.

The parameter is shown when Create GeoEvent Definition is set to No and is hidden when set to Yes.

Construct Geometry from Fields

Specifies whether the input connector should construct a point geometry using coordinate values received as attributes. The default is No.

  • Yes—Values from specified event attribute fields will be used to construct a point geometry.
  • No—A point geometry will not be constructed. It is assumed an attribute field contains a value that can be interpreted as a geometry or the event record is nonspatial (does not have a geometry).

X Geometry Field

(Conditional)

The attribute field in the inbound event data containing the x coordinate part (for example, horizontal or longitude) of a point location.

The parameter is shown when Construct Geometry from Fields is set to Yes and is hidden when set to No.

Y Geometry Field

(Conditional)

The attribute field in the inbound event data containing the y coordinate part (for example, vertical or latitude) of a point location.

The parameter is shown when Construct Geometry from Fields is set to Yes and is hidden when set to No.

Z Geometry Field

(Conditional)

The name of the field in the inbound event data containing the z coordinate part (for example, depth or altitude) of a point location. If left blank, the z value will be omitted and a 2D point geometry will be constructed.

The parameter is shown when Construct Geometry from Fields is set to Yes and is hidden when set to No.
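The following Python sketch illustrates the idea behind Construct Geometry from Fields: coordinate values are read from named attribute fields and combined into a point, with z omitted when no z field is given. It is not the adapter's implementation. The function name, the field names lon, lat, and alt, and the sample values are hypothetical. The default of 4326 is the WKID for WGS84.

```python
def construct_point(record, x_field, y_field, z_field=None, wkid=4326):
    """Build a point geometry dictionary from attribute values in a record."""
    point = {"x": float(record[x_field]), "y": float(record[y_field])}
    if z_field:
        # Only a named z field produces a 3D point.
        point["z"] = float(record[z_field])
    point["spatialReference"] = {"wkid": wkid}
    return point


# Hypothetical record whose coordinates arrive as separate text attributes.
record = {"name": "Truck-01", "lon": "-117.19", "lat": "34.05", "alt": "410"}
point_2d = construct_point(record, "lon", "lat")
point_3d = construct_point(record, "lon", "lat", z_field="alt")
```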

Expected Date Format

The pattern used to match expected string representations of date/time values and convert them to Java Date values. The pattern's format follows the Java SimpleDateFormat class convention.

While GeoEvent Server prefers date/time values to be expressed in the ISO 8601 standard, several string representations of date/time values commonly recognized as date values can be converted to Java Date values without specifying an Expected Date Format pattern. These include the following:

  • "2019-12-31T23:59:59"—The ISO 8601 standard format
  • 1577836799000—Java Date (epoch long integer; UTC)
  • "Tue Dec 31 23:59:59 -0000 2019"—A common web services string format
  • "12/31/2019 11:59:59 PM"—Common format used in the United States (12-hour clock)
  • "12/31/2019 23:59:59"—Common format used in the United States (24-hour clock)

If the date/time values received are expressed using a convention other than one of the five shown above, you will have to specify an expected date format pattern so GeoEvent Server knows how the date/time values should be adapted.
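To experiment with the five representations listed above outside GeoEvent Server, the following Python sketch parses each of them. The format strings are Python strptime directives, not Java SimpleDateFormat patterns, so they cannot be pasted into the Expected Date Format parameter. The function name parse_date is hypothetical.

```python
from datetime import datetime, timezone

# Python equivalents of the four string representations listed above.
STRING_FORMATS = [
    "%Y-%m-%dT%H:%M:%S",        # "2019-12-31T23:59:59"
    "%a %b %d %H:%M:%S %z %Y",  # "Tue Dec 31 23:59:59 -0000 2019"
    "%m/%d/%Y %I:%M:%S %p",     # "12/31/2019 11:59:59 PM"
    "%m/%d/%Y %H:%M:%S",        # "12/31/2019 23:59:59"
]


def parse_date(value):
    """Parse an epoch-milliseconds integer or one of the listed string formats."""
    if isinstance(value, int):
        # Epoch long integer in milliseconds, interpreted as UTC.
        return datetime.fromtimestamp(value / 1000, tz=timezone.utc)
    for fmt in STRING_FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date/time value: {value!r}")


from_epoch = parse_date(1577836799000)
from_iso = parse_date("2019-12-31T23:59:59")
from_us_12h = parse_date("12/31/2019 11:59:59 PM")
```

A value in any other convention raises ValueError in this sketch, which corresponds to the case where an explicit pattern has to be supplied.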

Language for Number Formatting

The locale identifier (ID) used for locale-sensitive behavior when formatting numbers from data values. The default is the locale of the machine GeoEvent Server is installed on. For more information, see Java Supported Locales.

Include Subfolders

Specifies whether subfolders under Input Folder Data Store and Input Directory (optional) are searched for files. The default is Yes; however, organizing data with different schemas into different folders and changing this parameter to No, to disable recursive search, allows a simpler configuration.

  • Yes—Recursively search for files whose content will be ingested and adapted to create event data records.
  • No—Only the Input Folder Data Store and Input Directory (optional) subfolders will be searched for files.
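The difference between the two settings can be pictured with a short Python sketch that lists candidate files either recursively or from the top-level folder only. It models the described behavior and is not how the File transport is implemented. The function name find_files and the .csv suffix check are assumptions made for illustration.

```python
from pathlib import Path


def find_files(input_folder, include_subfolders=True, suffix=".csv"):
    """List files under input_folder, optionally descending into subfolders."""
    root = Path(input_folder)
    # rglob walks the whole tree; glob stays in the top-level folder.
    candidates = root.rglob("*") if include_subfolders else root.glob("*")
    return sorted(p for p in candidates if p.is_file() and p.suffix == suffix)
```

For example, with a folder that contains a.csv and a subfolder holding b.csv, find_files(folder) returns both files, while find_files(folder, include_subfolders=False) returns only a.csv.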

Delete Files After Processing

Specifies whether the files in the registered system folder will be deleted after their content has been processed. Note that even if a file's content cannot be adapted, no event records are created, and no real-time event processing occurs, the inbound transport will still delete a file whose contents were successfully read. The default is No.

  • Yes—Files will be deleted from the registered system folder after being processed.
  • No—Files will not be deleted from the registered system folder after being processed.

Files not deleted will be reread, from the beginning of the file, if the input connector's properties are changed and saved or if the input is stopped and restarted, for example, if the ArcGIS GeoEvent Server service is restarted.

