
Elasticsearch match_phrase query -> output not predictable

Sample doc

{
  "id": 5,
  "title": "Quick Brown fox jumps over the lazy dog",
  "genre": [
    "fiction"
  ]
}

Mapping

{
  "movies" : {
    "mappings" : {
      "properties" : {
        "genre" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "id" : {
          "type" : "long"
        },
        "title" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }
}

Query 1: returns the sample document shown above

{
 "query": {
   "match_phrase": {
     "title": {
       "query": "fox quick over", "slop": 3
     }
   }
 } 
}

Query 2: returns no results

{
 "query": {
   "match_phrase": {
     "title": {
       "query": "over fox quick", "slop": 3
     }
   }
 } 
}

I expected Query 2 to return the document rather than Query 1.

Sahil Gupta asked Mar 29 '20 06:03


2 Answers

Slop

The number of positions that terms have to be moved in order to make the query and the document match.

Swapping the order of two adjacent terms costs two moves.

The tables below show how the terms move for each query.

Query 1:

          Pos 1       Pos 2       Pos 3       Pos 4       Pos 5       Pos 6       Pos 7       Pos 8
---------------------------------------------------------------------------------------------------
Doc:      quick       brown       fox         jumps       over        the         lazy        dog
---------------------------------------------------------------------------------------------------
Query:                            fox         quick       over
Slop 1:                           fox|quick               over
Slop 2:               quick       fox                     over
Slop 3:   quick                   fox                     over

Total steps: 3

Query 2:

          Pos 1       Pos 2       Pos 3       Pos 4       Pos 5       Pos 6       Pos 7       Pos 8
---------------------------------------------------------------------------------------------------
Doc:      quick       brown       fox         jumps       over        the         lazy        dog
---------------------------------------------------------------------------------------------------
Query:                over        fox         quick
Slop 1:               over        fox|quick
Slop 2:               quick|over  fox
Slop 3:   quick       over        fox
Slop 4:   quick                   over|fox
Slop 5:   quick                   fox         over
Slop 6:   quick                   fox                     over

Total steps: 6
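
The step counts for both queries can be cross-checked with a short calculation. This is only a sketch under two assumptions: every query term occurs exactly once in the document, and lowercasing plus splitting on whitespace is a close enough stand-in for the analyzer on this title. `min_slop` is a hypothetical helper, not part of any Elasticsearch client. For each term it subtracts the position in the query from the position in the document and then takes the spread of those offsets, which is how sloppy phrase matching is commonly described.

```python
def min_slop(doc: str, query: str) -> int:
    """Smallest slop at which `query` should match `doc` as a phrase.

    Assumption: each query term appears exactly once in the document.
    """
    doc_tokens = doc.lower().split()
    query_tokens = query.lower().split()
    # 0-based position of every token in the document.
    doc_pos = {token: i for i, token in enumerate(doc_tokens)}
    # How far each term sits from where the query expects it.
    offsets = [doc_pos[token] - q_pos for q_pos, token in enumerate(query_tokens)]
    # The spread of the offsets is the number of moves needed.
    return max(offsets) - min(offsets)


title = "Quick Brown fox jumps over the lazy dog"
print(min_slop(title, "fox quick over"))       # 3
print(min_slop(title, "over fox quick"))       # 6
print(min_slop("quick brown", "brown quick"))  # 2, swapping two adjacent terms
```

Documents with repeated terms need the full matching logic, so treat the result as a quick estimate rather than a guarantee.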

jaspreet chahal answered Oct 17 '22 21:10


I reproduced the issue with the mapping you provided and tracked it down with the help of the Explain API and this article on slop in match_phrase queries.

Your second query returns the document only when the slop is at least 6, as my search result confirmed.

Search query

{
  "query": {
    "match_phrase": {
      "title": {
        "query": "over fox quick",
        "slop": 6
      }
    }
  }
}

Similarly, your first query needs a slop of at least 3 to return the document.

The slop value is the maximum number of positions the query terms may be moved for the phrase to still match.

Example: your document contains Quick Brown fox jumps over the lazy dog, which is indexed as these tokens:

Quick
Brown
fox
jumps
over
the
lazy 
dog

If you search for fox quick over as a phrase, the terms must line up in that order, so the tokens listed above have to be rearranged.

The minimum number of moves required is 3: fox and over need no change, as they are already in the right relative positions, while quick has to move 3 positions to reach its correct place.

The same method shows why your second query needs a slop of 6.

Amit answered Oct 17 '22 20:10