I have the following Elasticsearch query, which I expected to return only the documents whose email field equals [email protected]:
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "email": "[email protected]"
          }
        }
      ]
    }
  }
}
The mapping for the user type that is being searched is the following:
{
  "users": {
    "mappings": {
      "user": {
        "properties": {
          "email": {
            "type": "string"
          },
          "name": {
            "type": "string",
            "fields": {
              "raw": {
                "type": "string",
                "index": "not_analyzed"
              }
            }
          },
          "nickname": {
            "type": "string"
          }
        }
      }
    }
  }
}
The following is a sample of the results returned from Elasticsearch:
[
  {
    "_index": "users",
    "_type": "user",
    "_id": "54b19c417dcc4fe40d728e2c",
    "_score": 0.23983537,
    "_source": {
      "email": "[email protected]",
      "name": "John Smith",
      "nickname": "jsmith"
    }
  },
  {
    "_index": "users",
    "_type": "user",
    "_id": "9c417dcc4fe40d728e2c54b1",
    "_score": 0.23983537,
    "_source": {
      "email": "[email protected]",
      "name": "Walter White",
      "nickname": "wwhite"
    }
  },
  {
    "_index": "users",
    "_type": "user",
    "_id": "4fe40d728e2c54b19c417dcc",
    "_score": 0.23983537,
    "_source": {
      "email": "[email protected]",
      "name": "Jimmy Fallon",
      "nickname": "jfallon"
    }
  }
]
From the above query, I expected a document to match only if its email property is exactly '[email protected]'.
How does the Elasticsearch DSL query need to change so that it returns only exact matches on email?
The match query analyzes any provided text before performing a search, so it matches against the analyzed tokens of a text field rather than against an exact term. Its optional analyzer parameter (a string) sets the analyzer used to convert the query text into tokens; it defaults to the index-time analyzer mapped for the field.
The email field was tokenized, which is the reason for this behavior. When you indexed the document, the analyzer split the address into separate tokens:
"[email protected]" => [ "myemail" , "gmail.com" ]
So a search for either myemail or gmail.com will match that document. The analyzer is also applied to the search query, so when you search for [email protected], the query text is broken into:
"[email protected]" => [ "john" , "gmail.com" ]
Because the "gmail.com" token is common to both the search terms and the indexed terms, you get a match.
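The token overlap described above can be imitated with a small simulation. This is only a rough sketch: `simple_analyze` is a hypothetical stand-in that splits on "@" and lowercases, not the real Elasticsearch analyzer, and the addresses are made up for illustration. For contrast, it also shows what happens when each address is compared as one unanalyzed term.

```python
def simple_analyze(text):
    """Hypothetical stand-in for the analyzer: split on '@' and lowercase.
    This is NOT the real Elasticsearch analyzer, just an imitation of the
    tokenization described in the answer."""
    return [tok.lower() for tok in text.split("@") if tok]


def analyzed_match(query_text, field_value):
    """Imitates a match query with the default OR operator: a document hits
    when at least one query token is also among its indexed tokens."""
    query_tokens = set(simple_analyze(query_text))
    indexed_tokens = set(simple_analyze(field_value))
    return bool(query_tokens & indexed_tokens)


def exact_match(query_text, field_value):
    """Imitates an unanalyzed field: the whole string is the only term."""
    return query_text == field_value


# Made-up addresses, for illustration only
indexed = ["john@example.com", "walter@example.com", "jimmy@example.com"]
query = "john@example.com"

analyzed_hits = [v for v in indexed if analyzed_match(query, v)]
exact_hits = [v for v in indexed if exact_match(query, v)]

print(analyzed_hits)  # every address shares the domain token with the query
print(exact_hits)     # only the identical address
```

In the analyzed case every address is returned because all of them share the domain token with the query, which mirrors the results in the question.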
To override this behavior, declare the email field as not_analyzed. Tokenization then does not happen, and the entire string is indexed as a single term.
With "not_analyzed":
"[email protected]" => [ "[email protected]" ]
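So change the mapping to the following and you should be good. Note that the index setting of an existing field cannot be changed in place, so you will need to recreate the index with this mapping and reindex your documents: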
{
  "users": {
    "mappings": {
      "user": {
        "properties": {
          "email": {
            "type": "string",
            "index": "not_analyzed"
          },
          "name": {
            "type": "string",
            "fields": {
              "raw": {
                "type": "string",
                "index": "not_analyzed"
              }
            }
          },
          "nickname": {
            "type": "string"
          }
        }
      }
    }
  }
}
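Once the email field is not_analyzed, one way to ask for an exact value is a term query, which does not analyze its input. The sketch below only builds and prints such a request body. The helper name `exact_email_query` and the address are hypothetical, and it assumes the index has been recreated with the mapping above.

```python
import json


def exact_email_query(address):
    """Build a request body using a term query on the email field.
    Hypothetical helper; assumes 'email' is mapped as not_analyzed."""
    return {"query": {"term": {"email": address}}}


# Made-up address, for illustration only
body = exact_email_query("john@example.com")
print(json.dumps(body, indent=2))
```

Keep in mind that a term lookup on an unanalyzed field is case-sensitive, so the address has to be sent exactly as it was indexed.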
I have described the problem more precisely and another approach to solve it here.