In my DB I have this entry:
Mark-Whalberg
When I search with the term
Mark-Whalberg
I get no match.
Why? Is the minus sign a special character, as I understand it? Does it mean "exclude"?
The query is this:
{"query_string": {"query": 'Mark-Whalberg', "default_operator": "AND"}}
Searching for anything else, such as:
Mark
Whalberg
hlb
Mark Whalberg
returns a match.
Is the value stored as two separate terms? How can I get a match when the search term includes the minus sign?
--------------EDIT--------------
This is the current query:
var fields = [
  "field1",
  "field2"
];
// Assigned to a variable so the object literal is a valid statement
var query = {"query_string": {"query": "*Mark-Whalberg*", "default_operator": "AND", "fields": fields}};
Let me explain. When you defined your index in Elasticsearch, you didn't specify an analyzer for the field, so the Standard Analyzer is applied.
According to the documentation:
Standard Analyzer
The standard analyzer is the default analyzer which is used if none is specified. It provides grammar based tokenization (based on the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29) and works well for most languages.
Also, to answer your question:
Why? Is the minus sign a special character, as I understand it? Does it mean "exclude"?
For the Standard Analyzer, yes, it is special, but here it doesn't mean "exclude": the hyphen is treated as a word boundary and dropped during analysis, so Mark-Whalberg is indexed as the two terms mark and whalberg.
From the documentation:
Why doesn’t the term query match my document?
[...] There are many ways to analyze text: the default standard analyzer drops most punctuation, breaks up text into individual words, and lower cases them. For instance, the standard analyzer would turn the string “Quick Brown Fox!” into the terms [quick, brown, fox]. [...]
Example:
If you have the following text:
"The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
Then the Standard Analyzer will produce:
[ the, 2, quick, brown, foxes, jumped, over, the, lazy, dog's, bone ]
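To make the effect on the original search term concrete, here is a rough sketch in JavaScript. It is not the real Unicode Text Segmentation algorithm, only a simplified approximation (the function name `approximateStandardAnalyzer` is made up for this example): it keeps runs of letters and digits, allows an apostrophe inside a word, and lowercases the result.

```javascript
// Simplified stand-in for the standard analyzer. This is NOT the real
// UAX #29 implementation, just an approximation for illustration.
function approximateStandardAnalyzer(text) {
  // Keep runs of letters/digits, allowing apostrophes inside a word ("dog's").
  const tokens = text.match(/[\p{L}\p{N}]+(?:'[\p{L}\p{N}]+)*/gu) || [];
  // The standard analyzer also lowercases every token.
  return tokens.map((token) => token.toLowerCase());
}

console.log(approximateStandardAnalyzer("Mark-Whalberg"));
// [ 'mark', 'whalberg' ]
```

To see what your cluster really produces, you can send the same text to the _analyze API with "analyzer": "standard" and compare the returned tokens.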
If you don't want to use the analyzer you have 2 solutions :
I hope this will help you.
I was stuck on the same question, and the answer from @Mickael was perfect for understanding what is going on (I really recommend reading the linked documentation).
I solved this by adding an operator to the query:
GET http://localhost:9200/creative/_search
{
  "query": {
    "match": {
      "keyword_id": {
        "query": "fake-keyword-uuid-3",
        "operator": "AND"
      }
    }
  }
}
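Why the operator matters can be sketched with a small simulation. This is only an illustration, assuming keyword_id is a text field analyzed with the standard analyzer; `tokenize` and `matchesQuery` are hypothetical helpers, not Elasticsearch APIs. The query text is split into terms the same way as the indexed value, and the operator decides whether any term or every term has to be present.

```javascript
// Hypothetical helpers for illustration only; Elasticsearch does this internally.
function tokenize(text) {
  // Rough approximation of the standard analyzer: split on non-alphanumerics, lowercase.
  const tokens = text.match(/[\p{L}\p{N}]+/gu) || [];
  return tokens.map((token) => token.toLowerCase());
}

function matchesQuery(fieldValue, queryText, operator) {
  const indexedTerms = new Set(tokenize(fieldValue));
  const queryTerms = tokenize(queryText);
  if (queryTerms.length === 0) return false;
  // "AND": every query term must be indexed; otherwise ("OR"): one is enough.
  return operator === "AND"
    ? queryTerms.every((term) => indexedTerms.has(term))
    : queryTerms.some((term) => indexedTerms.has(term));
}

const searchText = "fake-keyword-uuid-3";
console.log(matchesQuery("fake-keyword-uuid-7", searchText, "OR"));  // true
console.log(matchesQuery("fake-keyword-uuid-7", searchText, "AND")); // false
console.log(matchesQuery("fake-keyword-uuid-3", searchText, "AND")); // true
```

With the default OR, a document that shares only the terms fake, keyword, and uuid still matches; with AND, the term 3 is required as well.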
To better understand how this query is scored, try adding "explain": true and analyse the results:
GET http://localhost:9200/creative/_search
{
  "explain": true,
  "query": // ...
}
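If you build the request body in JavaScript, a small helper can switch the flag on without touching the query itself. This is just a sketch; `withExplain` is a made-up name, and the query shown is the match query from above.

```javascript
// Hypothetical helper: returns a copy of a search body with "explain" enabled.
function withExplain(searchBody) {
  return { ...searchBody, explain: true };
}

const body = withExplain({
  query: {
    match: {
      keyword_id: { query: "fake-keyword-uuid-3", operator: "AND" },
    },
  },
});

// Print the JSON body to send to the _search endpoint.
console.log(JSON.stringify(body, null, 2));
```

Each hit in the response should then include an _explanation section describing how its score was computed.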