Why is ElasticSearch match query returning all results?

I have the following ElasticSearch query, which I expected to return only the documents whose email field equals [email protected]:

"query": {
  "bool": {
    "must": [
      {
        "match": {
          "email": "[email protected]"
        }
      }
    ]
  }
}

The mapping for the user type that is being searched is the following:

    {
      "users": {
        "mappings": {
          "user": {
            "properties": {
              "email": {
                "type": "string"
              },
              "name": {
                "type": "string",
                "fields": {
                  "raw": {
                    "type": "string",
                    "index": "not_analyzed"
                  }
                }
              },
              "nickname": {
                "type": "string"
              }
            }
          }
        }
      }
    }

The following is a sample of results returned from ElasticSearch

 [{
    "_index": "users",
    "_type": "user",
    "_id": "54b19c417dcc4fe40d728e2c",
    "_score": 0.23983537,
    "_source": {
       "email": "[email protected]",
       "name": "John Smith",
       "nickname": "jsmith"
    }
 },
 {
    "_index": "users",
    "_type": "user",
    "_id": "9c417dcc4fe40d728e2c54b1",
    "_score": 0.23983537,
    "_source": {
       "email": "[email protected]",
       "name": "Walter White",
       "nickname": "wwhite"
    }
 },
 {
    "_index": "users",
    "_type": "user",
    "_id": "4fe40d728e2c54b19c417dcc",
    "_score": 0.23983537,
    "_source": {
       "email": "[email protected]",
       "name": "Jimmy Fallon",
       "nickname": "jfallon"
    }
 }]

Given the query above, I expected a document to be returned only if its email property exactly matched '[email protected]'.

How does the ElasticSearch DSL query need to change in order to return only exact matches on email?

asked Jan 12 '15 04:01 by TheJediCowboy



1 Answer

The email field was tokenized, which is the reason for this anomaly. When you indexed the document, the analyzer split the address into tokens:

"[email protected]" => [ "myemail" , "gmail.com" ]

This way, a search for myemail OR gmail.com matches the document. When you search for [email protected], the analyzer is also applied to the search query, so the query gets broken into:

"[email protected]" => [ "john" , "gmail.com" ]

Here the "gmail.com" token is common to the search term and the indexed term, so you get a match.

To override this behavior, declare the email field as not_analyzed. Tokenization then won't happen, and the entire string is indexed as a single term.
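The token overlap described above can be sketched in a few lines of Python. This is only a rough stand-in for the default analyzer (lowercase, then split on "@" and whitespace), not the real Lucene tokenizer, and the two addresses are hypothetical examples chosen to produce the tokens shown in this answer. The function names `analyze` and `match_any` are made up for the illustration.

```python
import re


def analyze(text):
    """Rough approximation of the default analyzer: lowercase, split on '@' and whitespace."""
    return [tok for tok in re.split(r"[@\s]+", text.lower()) if tok]


def match_any(query_text, indexed_text):
    """Mimic a match query with the default OR operator: one shared token is enough."""
    return bool(set(analyze(query_text)) & set(analyze(indexed_text)))


# Hypothetical addresses, chosen to yield the tokens shown in the answer.
indexed_email = "myemail@gmail.com"
search_email = "john@gmail.com"

print(analyze(indexed_email))                  # tokens stored in the index
print(analyze(search_email))                   # tokens produced from the query
print(match_any(search_email, indexed_email))  # the shared "gmail.com" token gives a hit
```

Because every address at the same domain shares the domain token, the match query hits all of them, which is consistent with the identical scores in the question's results.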

With "not_analyzed"

"[email protected]" => [ "[email protected]" ]

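So change the mapping to the following and you should be good. The mapping of an existing field cannot be changed in place, so you will need to recreate the index with this mapping and reindex the documents: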

{
  "users": {
    "mappings": {
      "user": {
        "properties": {
          "email": {
            "type": "string",
            "index": "not_analyzed"
          },
          "name": {
            "type": "string",
            "fields": {
              "raw": {
                "type": "string",
                "index": "not_analyzed"
              }
            }
          },
          "nickname": {
            "type": "string"
          }
        }
      }
    }
  }
}
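With email stored as a single un-analyzed term, an exact lookup can also be expressed as a term query, which does not analyze the search text. The sketch below only builds the request body and mimics the exact comparison locally; `exact_email_query` and `term_hits` are hypothetical helper names, the addresses are made-up examples, and sending the body to a cluster is left out.

```python
import json


def exact_email_query(email):
    """Build a search body that looks up one exact term in the email field."""
    return {"query": {"term": {"email": email}}}


def term_hits(email, indexed_terms):
    """Mimic a term lookup on a not_analyzed field: plain string equality."""
    return [term for term in indexed_terms if term == email]


# Hypothetical values; a not_analyzed field keeps each address as one term.
indexed_terms = ["myemail@gmail.com", "john@gmail.com"]

body = exact_email_query("john@gmail.com")
print(json.dumps(body, indent=2))
print(term_hits("john@gmail.com", indexed_terms))
```

The original match query should also behave as an exact match once the field is not_analyzed, since the query text is then no longer split into tokens; the term query just makes that intent explicit.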

I have described the problem more precisely and another approach to solve it here.

answered Nov 17 '22 00:11 by Vineeth Mohan