Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Elasticsearch match phrase prefix not matching all terms

I am having an issue where when I use the match_phrase_prefix query in Elasticsearch, it is not returning all the results I would expect it to, particularly when the query is one word followed by one letter.

Take this index mapping (this is a contrived example to protect sensitive data):

http://localhost:9200/test/drinks/_mapping

returns:

{
  "test": {
    "mappings": {
      "drinks": {
        "properties": {
          "name": {
            "type": "text"
          }
        }
      }
    }
  }
}

And amongst millions of other records are these:

{
    "_index": "test",
    "_type": "drinks",
    "_id": "2",
    "_score": 1,
    "_source": {
        "name": "Johnnie Walker Black Label"
    }
},
{
    "_index": "test",
    "_type": "drinks",
    "_id": "1",
    "_score": 1,
    "_source": {
        "name": "Johnnie Walker Blue Label"
    }
}

The following query, which is one word followed by two letters:

POST http://localhost:9200/test/drinks/_search
{
    "query": {
        "match_phrase_prefix" : {
            "name" : "Walker Bl"
        }
    }
}

returns this:

{
    "took": 1,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 2,
        "max_score": 0.5753642,
        "hits": [
            {
                "_index": "test",
                "_type": "drinks",
                "_id": "2",
                "_score": 0.5753642,
                "_source": {
                    "name": "Johnnie Walker Black Label"
                }
           },
           {
               "_index": "test",
               "_type": "drinks",
               "_id": "1",
               "_score": 0.5753642,
               "_source": {
                   "name": "Johnnie Walker Blue Label"
                }
            }
        ]
    }
}

Whereas this query with one word and one letter:

POST http://localhost:9200/test/drinks/_search
{
    "query": {
        "match_phrase_prefix" : {
            "name" : "Walker B"
        }
    }
}

returns no results. What could be happening here?

like image 388
Paul T Davies Avatar asked Nov 08 '17 14:11

Paul T Davies


People also ask

What is the difference between match and match phrase in Elasticsearch?

Match phrase query is similar to the match query but is used to query text phrases. Phrase matching is necessary when the ordering of the words is important. Only the documents that contain the words in the same order as the search input are matched.

What is phrase prefix in Elasticsearch?

Match phrase prefix queryedit. Returns documents that contain the words of a provided text, in the same order as provided. The last term of the provided text is treated as a prefix, matching any words that begin with that term.

What is match phrase in Elasticsearch?

Match phrase queryedit A phrase query matches terms up to a configurable slop (which defaults to 0) in any order. Transposed terms have a slop of 2. The analyzer can be set to control which analyzer will perform the analysis process on the text.

What is prefix query?

Prefix query is used to find all the documents with a given prefix. Like term query, phrase query is a low-level query and doesn't take the field mapping into consideration (refer to the Analyzed versus non-analyzed section for more details).


1 Answers

I will assume that you are working with Elasticsearch 5.0 and above. I think it might have to be because of the max_expansions default value.

As seen in the documentation here, the max_expansions parameters is used to control how many prefixes the last term will be expanded with. The default value is 50 and it might explain why you find "black" and "blue" with the two first letters B and L, but not with the B only.

The documentation is pretty clear about it:

The match_phrase_prefix query is a poor-man’s autocomplete. It is very easy to use, which let’s you get started quickly with search-as-you-type but it’s results, which usually are good enough, can sometimes be confusing.

Consider the query string quick brown f. This query works by creating a phrase query out of quick and brown (i.e. the term quick must exist and must be followed by the term brown). Then it looks at the sorted term dictionary to find the first 50 terms that begin with f, and adds these terms to the phrase query.

The problem is that the first 50 terms may not include the term fox so the phase quick brown fox will not be found. This usually isn’t a problem as the user will continue to type more letters until the word they are looking for appears

I wouldn't be able to tell you if it's ok to increase this parameter above 50 if you are looking for good performances since I never tried myself.

like image 146
Rlarroque Avatar answered Oct 14 '22 15:10

Rlarroque