Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

ElasticSearch Search query is not case sensitive

I am trying to search query and it working fine for exact search but if user enter lowercase or uppercase it does not work as ElasticSearch is case insensitive.

example

{
    "query" : {
        "bool" : { 
            "should" : {
                "match_all" : {} 
            },
            "filter" : {
                "term" : { 
                    "city" : "pune"
                }
            }
        }
    }
}

it works fine when city is exactly "pune", if we change text to "PUNE" it does not work.

like image 323
Rohit Avatar asked Oct 10 '18 18:10

Rohit


People also ask

Is Elasticsearch search case sensitive?

Elasticsearch ECS Data Typesit is case sensitive, requires use of regex (which worth mentioning is not the full PCRE spec) aggregation max character length.

Does case matter in Elasticsearch?

ElasticSearch is case insensitive. "Elasticsearch" is not case-sensitive.

How do you write a term query as case insensitive?

The terms query does not currently support the case insensitive query option. To have case insensitive terms matching you would need to index with a normalizer.

What is the difference between match and term query in Elasticsearch?

To better search text fields, the match query also analyzes your provided search term before performing a search. This means the match query can search text fields for analyzed tokens rather than an exact term. The term query does not analyze the search term. The term query only searches for the exact term you provide.


1 Answers

ElasticSearch is case insensitive.

"Elasticsearch" is not case-sensitive. A JSON string property will be mapped as a text datatype by default (with a keyword datatype sub or multi field, which I'll explain shortly).

A text datatype has the notion of analysis associated with it; At index time, the string input is fed through an analysis chain, and the resulting terms are stored in an inverted index data structure for fast full-text search. With a text datatype where you haven't specified an analyzer, the default analyzer will be used, which is the Standard Analyzer. One of the components of the Standard Analyzer is the Lowercase token filter, which lowercases tokens (terms).

When it comes to querying Elasticsearch through the search API, there are a lot of different types of query to use, to fit pretty much any use case. One family of queries such as match, multi_match queries, are full-text queries. These types of queries perform analysis on the query input at search time, with the resulting terms compared to the terms stored in the inverted index. The analyzer used by default will be the Standard Analyzer as well.

Another family of queries such as term, terms, prefix queries, are term-level queries. These types of queries do not analyze the query input, so the query input as-is will be compared to the terms stored in the inverted index.

In your example, your term query on the "city" field does not find any matches when capitalized because it's searching against a text field whose input underwent analysis at index time. With the default mapping, this is where the keyword sub field could help. A keyword datatype does not undergo analysis (well, it has a type of analysis with normalizers), so can be used for exact matching, as well as sorting and aggregations. To use it, you would just need to target the "city.keyword" field. An alternative approach could also be to change the analyzer used by the "city" field to one that does not use the Lowercase token filter; taking this approach would require you to reindex all documents in the index.

like image 53
Russ Cam Avatar answered Oct 18 '22 03:10

Russ Cam