
Match string with minus character in elasticsearch

So in DB I have this entry:

Mark-Whalberg

When searching with the term

Mark-Whalberg

I get no match.

Why? Is minus a special character, as I understand it? Does it symbolize "exclude"?

The query is this:

{"query_string": {"query": 'Mark-Whalberg', "default_operator": "AND"}}

Searching everything else, like:

Mark
Whalberg
hlb
Mark Whalberg

returns a match.

Is this stored as two different pieces? How can I get a match when including the minus sign in the search term?

--------------EDIT--------------

This is the current query:

var fields = [
    "field1",
    "field2",
];

{"query_string":{"query": '*Mark-Whalberg*',"default_operator": "AND","fields": fields}};
oderfla asked May 18 '17 09:05


2 Answers

You have an analyzer configuration issue.

Let me explain. When you defined your index in Elasticsearch, you didn't specify an analyzer for the field, so the Standard Analyzer is applied.

According to the documentation:

Standard Analyzer

The standard analyzer is the default analyzer which is used if none is specified. It provides grammar based tokenization (based on the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29) and works well for most languages.

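Also, to answer your question:

Why? Is minus a special character, as I understand it? Does it symbolize "exclude"?

For the Standard Analyzer, yes it is. It doesn't mean "exclude", but it is treated as a word boundary: the text is split at the minus sign and the character itself is dropped during analysis, so Mark-Whalberg is indexed as the two terms mark and whalberg.

From the documentation: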

Why doesn’t the term query match my document?

[...] There are many ways to analyze text: the default standard analyzer drops most punctuation, breaks up text into individual words, and lower cases them. For instance, the standard analyzer would turn the string “Quick Brown Fox!” into the terms [quick, brown, fox]. [...]

Example:

If you have the following text:

"The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."

Then the Standard Analyzer will produce:

[ the, 2, quick, brown, foxes, jumped, over, the, lazy, dog's, bone ]
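To see why the hyphenated name stops matching as a single unit, here is a rough JavaScript approximation of what the Standard Analyzer does to the text. It is only a sketch: the real analyzer follows the Unicode Text Segmentation rules, while the hypothetical `approxStandardAnalyze` helper below just lowercases ASCII text and splits on anything that is not a letter, digit, or apostrophe.

```javascript
// Rough approximation of the Standard Analyzer for plain ASCII text.
// Assumption: this is NOT the real UAX #29 tokenizer, only an illustration.
function approxStandardAnalyze(text) {
  return text
    .toLowerCase()          // mimic the lowercase token filter
    .split(/[^a-z0-9']+/)   // break on punctuation and whitespace, including "-"
    .filter(Boolean);       // drop empty strings left at the edges
}

// The hyphen acts as a word boundary, so two tokens come back.
console.log(approxStandardAnalyze("Mark-Whalberg")); // [ 'mark', 'whalberg' ]
```

To check what Elasticsearch itself produces for a given string, the `_analyze` API accepts a body such as {"analyzer": "standard", "text": "Mark-Whalberg"}.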

To get a match, you have 2 solutions:

  • You can use a match query, which analyzes the search text the same way as the field.
  • You can ask Elasticsearch not to analyze the field when you create your index.
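The second option can be sketched as follows, assuming Elasticsearch 5.x or later, where a non-analyzed string field is declared with the `keyword` type (2.x used "type": "string" together with "index": "not_analyzed"). The field name `field1` comes from the question; the helper name `exactTermQuery` is made up for illustration.

```javascript
// Mapping fragment: store field1 verbatim, without running an analyzer.
// Assumption: Elasticsearch 5.x+; the surrounding index layout is omitted.
const properties = {
  field1: { type: "keyword" }
};

// Build a term query, which compares the value against the stored term as-is.
function exactTermQuery(field, value) {
  return { query: { term: { [field]: value } } };
}

console.log(JSON.stringify(exactTermQuery("field1", "Mark-Whalberg")));
// {"query":{"term":{"field1":"Mark-Whalberg"}}}
```

With a keyword field the match is exact and case-sensitive, so partial searches such as Mark or hlb would no longer hit this field. If both behaviours are needed, the value has to be indexed in more than one way.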

I hope this will help you.

Mickael answered Sep 28 '22 09:09


I was stuck on the same question, and the answer from @Mickael was perfect for understanding what is going on (I really recommend reading the linked documentation).

I solved this by setting an operator on the query:

GET http://localhost:9200/creative/_search

{
  "query": {
    "match": {
      "keyword_id": {
        "query": "fake-keyword-uuid-3",
        "operator": "AND"
      }
    }
  }
}
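Why the operator matters can be illustrated with a small simulation. The sketch below assumes `keyword_id` is an analyzed text field, so that both the stored value and the search text are split at the hyphens. The `tokenize` and `matches` helpers are simplified stand-ins, not Elasticsearch code.

```javascript
// Simplified stand-in for the analyzer: lowercase, split on non-alphanumerics.
function tokenize(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Mimic a match query: "OR" needs any query token, "AND" needs all of them.
function matches(docText, queryText, operator) {
  const docTokens = new Set(tokenize(docText));
  const queryTokens = tokenize(queryText);
  return operator === "AND"
    ? queryTokens.every((t) => docTokens.has(t))
    : queryTokens.some((t) => docTokens.has(t));
}

// Two hypothetical stored values that differ only in the last token.
const docs = ["fake-keyword-uuid-3", "fake-keyword-uuid-4"];
for (const op of ["OR", "AND"]) {
  const hits = docs.filter((d) => matches(d, "fake-keyword-uuid-3", op));
  console.log(op, hits);
}
```

With OR, both values are hits because they share the tokens fake, keyword, and uuid. With AND, only the value containing every token remains. Note that AND does not check token order or adjacency; a match_phrase query does.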

To better understand the algorithm this query uses, try adding "explain": true and analyse the results:

GET http://localhost:9200/creative/_search

{  
  "explain": true,
  "query": // ...
}
Abe answered Sep 28 '22 07:09