Using Lucene search text queries

The Geoportal uses a sophisticated search engine that provides many search options, ranking options, fast performance, and extensibility. The search engine is based on the open source search engine Apache Lucene. For more information on how to specifically leverage Lucene search syntax for powerful searching in your Geoportal, see the Lucene website.

To make the most of the Geoportal search page, refer to the following sections for a list of features that Lucene provides for search syntax:

Terms

A query is broken up into terms and operators. There are two types of terms: single terms and phrases. A single term is a single word, such as air or quality. A phrase is a group of words surrounded by double quotation marks, such as "air quality". Multiple terms can be combined together with Boolean operators to form a more complex query. The following are examples of search terms:

  • Searching for air results in 35 hits (items that contain the word air).
  • Searching for quality results in 123 hits (items that contain the word quality).
  • Searching for air quality (without quotation marks) results in 148 hits (items that contain the words air or quality or both).
  • Searching for air AND quality results in 10 hits (items that contain both the words air and quality).
  • Searching for "air quality" (with quotation marks) results in 7 hits (items that contain the words air and quality directly after each other).
  • Searching for title:air results in 5 hits (items that contain the word air in the title).
  • Searching for title:quality results in 14 hits (items that contain the word quality in the title).
  • Searching for +title:air +title:quality or title:"air quality" results in 2 hits (items that contain both the words air and quality in the title).
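The hit counts above are consistent with treating each search as a set of matching items. As a quick check, the count for air quality (an implicit OR) should equal the count for air plus the count for quality, minus the items that would otherwise be counted twice (those matching air AND quality). A minimal Python sketch using only the numbers listed above:

```python
# Hit counts taken from the examples above.
hits_air = 35
hits_quality = 123
hits_air_and_quality = 10

# Inclusion-exclusion: items matching either term, each counted once.
hits_air_or_quality = hits_air + hits_quality - hits_air_and_quality
print(hits_air_or_quality)  # 148, the count reported for the unquoted search
```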

Special characters

The Geoportal supports escaping special characters that are part of the query syntax. The following is a list of special characters and their escape codes:

Special character    Escape code
+                    \+
-                    \-
&&                   \&\&
||                   \|\|
!                    \!
(                    \(
)                    \)
{                    \{
}                    \}
[                    \[
]                    \]
^                    \^
"                    \"
~                    \~
*                    \*
?                    \?
:                    \:
\                    \\

For example, to search for items that contain the scale 1:250k, use the query 1\:250k.
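If query text is assembled by a script, the escaping can be automated. The following is a minimal Python sketch, not part of the Geoportal: the helper name escape_query_text is hypothetical, and it escapes every single & and | character, which also covers the && and || pairs listed above.

```python
# Characters from the list above that have meaning in the query syntax.
SPECIAL_CHARS = set('+-&|!(){}[]^"~*?:\\')

def escape_query_text(text: str) -> str:
    """Prefix every special character with a backslash (hypothetical helper)."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in text)

print(escape_query_text("1:250k"))  # 1\:250k
```

Escape only the literal search text, not the operators or field prefixes of the query itself; otherwise a prefix such as title: would be escaped as well.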

Fields

Lucene supports fielded data. When performing a search, you can either specify a field or use the default field. The field names and default field are implementation specific. You can search any field by typing the field name followed by a colon and the term for which you are looking. Targeting a specific field in the query can be more accurate than only searching with terms. Keep in mind that some fields are case sensitive. Remember that certain special characters must be escaped in the query using a backslash (\) character or enclosed within quotation marks whenever they are a part of the search text. The following is a list of searches with fields:

  • title:"The Right Way" AND text:"don't go this way"
  • uuid:"{550E8400-E29B-41D4-A716-446655440000}"
  • uuid:\{550E8400\-E29B\-41D4\-A716\-446655440000\}
  • resource.url:"http://server.arcgisonline.com/ArcGIS/rest/services/ESRI_StreetMap_World_2D/MapServer"
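
Note:

The field applies only to the term that it directly precedes, so the query title:Do it right finds only the word Do in the title field and searches for it and right in the default field. To search for the whole phrase in the title, use title:"Do it right".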
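To make the scope of a field prefix concrete, the following Python sketch splits a query into (field, term) pairs. It is a simplified illustration, not the Lucene parser: it ignores operators, grouping, and escapes, and the default field name "text" is a placeholder because the real default is implementation specific.

```python
import re

def assign_fields(query: str, default_field: str = "text"):
    """Return (field, term) pairs; "text" is a placeholder default field name."""
    pairs = []
    # Each token is an optional field prefix, then a quoted phrase or a bare word.
    pattern = r'(?:(\w[\w.]*):)?(?:"([^"]*)"|(\S+))'
    for field, phrase, word in re.findall(pattern, query):
        pairs.append((field or default_field, phrase or word))
    return pairs

print(assign_fields("title:Do it right"))
# [('title', 'Do'), ('text', 'it'), ('text', 'right')]
print(assign_fields('title:"Do it right"'))
# [('title', 'Do it right')]
```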

Wildcard searches

The Geoportal supports single- and multiple-character wildcard searches within single terms (not within phrase queries).

Caution:

You can't use an asterisk (*) or a question mark (?) as the first character of a search.

To perform a single-character wildcard search, use the question mark. The single-character wildcard search looks for terms that match the term with the single character replaced. For example, to search for text or test, you can use the search te?t.

To perform a multiple-character wildcard search, use the asterisk. Multiple-character wildcard searches look for 0 or more characters. For example, to search for test, tests, or tester, you can use the search test*. You can also use the wildcard searches in the middle of a term, for example, te*t.
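The two wildcards behave like the ? and * wildcards in shell-style pattern matching. As an illustration of the matching rules only (this is not how the search engine evaluates queries), Python's standard fnmatch module can be applied to a made-up list of terms:

```python
from fnmatch import fnmatchcase

# Sample terms invented for this illustration.
terms = ["test", "text", "tests", "tester", "tempest", "toast"]

# ? stands for exactly one character.
single = [t for t in terms if fnmatchcase(t, "te?t")]
# * stands for zero or more characters.
multi = [t for t in terms if fnmatchcase(t, "test*")]
# A wildcard can also appear in the middle of a term.
middle = [t for t in terms if fnmatchcase(t, "te*t")]

print(single)  # ['test', 'text']
print(multi)   # ['test', 'tests', 'tester']
print(middle)  # ['test', 'text', 'tempest']
```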

Fuzzy searches

The Geoportal supports fuzzy searches based on the Levenshtein distance (also called edit distance) algorithm. To perform a fuzzy search, use the tilde (~) character at the end of a single term. For example, to search for a term similar in spelling to air, use the fuzzy search air~. This search will find items containing not only terms such as air and airplane, but also terms such as aid. You can also specify the required similarity as a value between 0 and 1. The closer the value is to 1, the more similar a term must be to match, for example, air~0.8. The default value is 0.5 when a value is not specified.
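Levenshtein distance counts the single-character insertions, deletions, and substitutions needed to turn one term into another. The following Python sketch computes that distance with the standard dynamic-programming approach. It only illustrates the measure; how the search engine converts a distance into the 0 to 1 similarity value is not covered here.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the edit distance between two strings."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

print(levenshtein("air", "aid"))       # 1
print(levenshtein("air", "airplane"))  # 5
```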

Proximity searches

The Geoportal supports finding words that are within a specific distance of each other. To perform a proximity search, use the tilde (~) character at the end of a phrase. For example, to search for air and quality within 10 words of each other in a document, use the search "air quality"~10.
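The idea behind a proximity search can be sketched in Python as a check on word positions. This is a simplification under stated assumptions: words are split on whitespace, the distance is the difference between word positions, and the helper name within_distance is hypothetical. The search engine's own analysis and distance calculation may differ in detail.

```python
def within_distance(text: str, first: str, second: str, max_gap: int) -> bool:
    """True if first and second occur at most max_gap word positions apart."""
    words = text.lower().split()
    first_positions = [i for i, w in enumerate(words) if w == first]
    second_positions = [i for i, w in enumerate(words) if w == second]
    return any(abs(i - j) <= max_gap
               for i in first_positions
               for j in second_positions)

sample = "air pollution reduces the quality of life"
print(within_distance(sample, "air", "quality", 10))  # True
print(within_distance(sample, "air", "quality", 2))   # False
```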

Range searches

The Geoportal supports range queries for envelope and timestamp searches. This allows the user to match documents whose field values are between the lower and upper bounds specified by the range query. Range queries can be inclusive or exclusive of the upper and lower bounds.

Envelope searches

The syntax for an envelope search is the field name (envelope) followed by a colon (:) and either an inclusive or an exclusive range definition. For inclusive ranges, enclose the spatial envelope in square brackets ([ ]); for exclusive ranges, enclose it in braces ({ }). Exclusive range searches select only resources that fall entirely within the specified envelope, while inclusive range searches also select resources that intersect the envelope and extend outside it. Within the brackets or braces, give the lower left corner coordinates, then the keyword TO (case sensitive), and then the upper right corner coordinates. Coordinates are always given in the WGS 1984 geographic coordinate system (EPSG:4326). Wildcards can also be used in place of a single coordinate or an entire pair of corner coordinates, for example, envelope:[*,-70 TO +30,*] or envelope:{-80,-70 TO *}.

The following is a list of envelope search examples:

  • envelope:[-80,-70 TO +30,+70]

    This search returns documents that intersect a spatial envelope with a southwest corner at 80° W, 70° S and a northeast corner at 30° E, 70° N.

  • envelope:{-80,-70 TO +30,+70}

    This search returns documents that fall entirely within a spatial envelope with a southwest corner at 80° W, 70° S and a northeast corner at 30° E, 70° N.
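The difference between the two range types can be sketched as two rectangle tests. The following Python example is an illustration only: the Envelope class and its method names are hypothetical, and the treatment of envelopes that merely touch at an edge is an assumption.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Envelope:
    """Bounding box given as lower left (min) and upper right (max) corners."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    def intersects(self, other: "Envelope") -> bool:
        # Inclusive style: the two boxes share at least one point.
        return (self.min_x <= other.max_x and other.min_x <= self.max_x
                and self.min_y <= other.max_y and other.min_y <= self.max_y)

    def contains(self, other: "Envelope") -> bool:
        # Exclusive style: the other box lies entirely inside this one.
        return (self.min_x <= other.min_x and other.max_x <= self.max_x
                and self.min_y <= other.min_y and other.max_y <= self.max_y)

search = Envelope(-80, -70, 30, 70)       # same corners as the examples above
inside = Envelope(-10, -10, 10, 10)
overlapping = Envelope(20, 60, 50, 80)
outside = Envelope(100, 0, 120, 10)

print(search.intersects(overlapping), search.contains(overlapping))  # True False
print(search.intersects(inside), search.contains(inside))            # True True
print(search.intersects(outside), search.contains(outside))          # False False
```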

Timestamp searches

The syntax for a timestamp search is the field name (dateModified) followed by a colon (:), and then an inclusive range definition.

The following is a list of timestamp search examples:

  • dateModified:[2009-10-11 TO 2009-11-10]

    This search returns resources that have a dateModified value between 2009-10-11 and 2009-11-10, including the specified dates.

  • dateModified:[2006 TO 2010]

    This search returns resources that have a dateModified value between the years 2006 and 2010.

  • dateModified:2009-12

    This search returns resources that have a dateModified value in December 2009 (no brackets required).
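When the dates come from a script, the range text can be generated rather than typed. A minimal Python sketch, assuming full calendar dates and using a hypothetical helper name:

```python
from datetime import date

def date_modified_range(start: date, end: date) -> str:
    """Build an inclusive dateModified range query (hypothetical helper)."""
    return f"dateModified:[{start.isoformat()} TO {end.isoformat()}]"

print(date_modified_range(date(2009, 10, 11), date(2009, 11, 10)))
# dateModified:[2009-10-11 TO 2009-11-10]
```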

Boosting a term

The Geoportal ranks matching documents by relevance, based on the terms found. To boost a term, append a caret (^) and a boost factor (a number) to the term. The higher the boost factor, the more weight the term carries in the ranking, so boosting lets you influence which documents are ranked as most relevant. For example, if you are searching for air quality and you want the term air to carry more weight, use the following search syntax: air^4 quality. Documents that match air are then ranked higher than they would be without the boost. You can also boost phrases, for example, "air quality"^4 "water quality". The default boost factor is 1. The boost factor can be less than 1, but it must be a positive number, for example, air^0.2 quality.
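The effect of a boost on ranking can be illustrated with a toy score. The following Python sketch is not the search engine's scoring formula, which weighs more factors. It simply multiplies each term's occurrence count by its boost factor, and the two sample documents are made up.

```python
def toy_score(document_words, boosted_terms):
    """Sum of boost * occurrences per term (illustration only)."""
    return sum(boost * document_words.count(term)
               for term, boost in boosted_terms.items())

doc_a = "air air quality".split()
doc_b = "air quality quality".split()

plain = {"air": 1.0, "quality": 1.0}    # air quality
boosted = {"air": 4.0, "quality": 1.0}  # air^4 quality

print(toy_score(doc_a, plain), toy_score(doc_b, plain))      # 3.0 3.0
print(toy_score(doc_a, boosted), toy_score(doc_b, boosted))  # 9.0 6.0
```

Without a boost the two documents tie; with air^4, the document that mentions air more often moves ahead.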

Boolean operators

Boolean operators allow terms to be combined through logic operators. The Geoportal supports the following Boolean operators:

  • The OR operator is the default conjunction operator. This means that if there is no Boolean operator between two terms, the OR operator is used. The OR operator links two terms and finds a matching document if either of the terms exists in a document. This is equivalent to a union using sets. The || operator can also be used in place of the word OR.
  • The AND operator matches documents where both terms exist anywhere in the text of a single document. This is equivalent to an intersection using sets. The && operator can also be used in place of the word AND.
  • The + operator requires that the term after the + exist somewhere in a field of a single document.
  • The NOT operator excludes documents that contain the term after the word NOT. This is equivalent to a difference using sets. The exclamation point (!) can also be used in place of the word NOT.
    Note:

    The NOT operator can't be used with a single term.

Note:

Boolean operators are case sensitive.
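Because the operators correspond to set operations, their effect can be sketched with Python sets. The document IDs below are made up for the illustration.

```python
# IDs of documents containing each term (sample data).
air = {1, 2, 3}
quality = {3, 4}

either = air | quality    # air OR quality   (union)
both = air & quality      # air AND quality  (intersection)
air_only = air - quality  # air NOT quality  (difference)

print(sorted(either))    # [1, 2, 3, 4]
print(sorted(both))      # [3]
print(sorted(air_only))  # [1, 2]
```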

Grouping

The Geoportal supports using parentheses to group clauses to form subqueries. This can be useful when you want to control the Boolean logic for a query. For example, (air OR water) AND quality will find documents containing the words air and quality or the words water and quality.
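Using the same set analogy, the following Python sketch shows why the placement of parentheses matters. The document IDs are made up, and the second expression is shown only for contrast.

```python
# IDs of documents containing each term (sample data).
air = {1, 2}
water = {3, 4}
quality = {2, 3, 5}

grouped = (air | water) & quality      # (air OR water) AND quality
regrouped = air | (water & quality)    # air OR (water AND quality)

print(sorted(grouped))    # [2, 3]
print(sorted(regrouped))  # [1, 2, 3]
```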

Field grouping

The Geoportal supports using parentheses to group multiple clauses to a single field. For example, title:(air OR water) finds items that contain the words air or water in the title.
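A field group is shorthand for repeating the field name on every term. The following Python sketch builds both forms from a list of terms; the helper names are hypothetical, and the terms are assumed to need no escaping.

```python
def field_group(field, terms, operator="OR"):
    """Build the grouped form, for example title:(air OR water)."""
    joined = f" {operator} ".join(terms)
    return f"{field}:({joined})"

def field_expanded(field, terms, operator="OR"):
    """Build the equivalent form with the field repeated on each term."""
    return f" {operator} ".join(f"{field}:{term}" for term in terms)

print(field_group("title", ["air", "water"]))     # title:(air OR water)
print(field_expanded("title", ["air", "water"]))  # title:air OR title:water
```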