How to do exact phrase matching in Elasticsearch?

I am trying to implement an exact-match search in Elasticsearch, but I am not getting the required results. Here is the code showing the issue I am facing and the things I have tried.

doc1 = {"sentence": "Today is a sunny day."}
doc2 = {"sentence": " Today is a sunny day but tomorrow it might rain"}
doc3 = {"sentence": "I know I am awesome"}
doc4 = {"sentence": "The taste of your dish is awesome"}
doc5 = {"sentence": "The taste of banana shake is good"}

# Indexing the above docs

es.index(index="english",doc_type="sentences",id=1,body=doc1)

es.index(index="english",doc_type="sentences",id=2,body=doc2)

es.index(index="english",doc_type="sentences",id=3,body=doc3)

es.index(index="english",doc_type="sentences",id=4,body=doc4)

es.index(index="english",doc_type="sentences",id=5,body=doc5)

query 1

res = es.search(index="english", body={
    "from": 0,
    "size": 5,
    "query": {
        "match_phrase": {
            "sentence": {"query": "Today is a sunny day"}
        }
    },
    "explain": False
})

query 2

res = es.search(index="english", body={
    "from": 0,
    "size": 5,
    "query": {
        "bool": {
            "must": {
                "match_phrase": {
                    "sentence": {"query": "Today is a sunny day"}
                }
            },
            "filter": {
                "term": {"sentence.word_count": 5}
            }
        }
    }
})

So when I run query 1, I get doc2 as the top result, while I want doc1 to be the top result.

When I try to use a filter for the same purpose (to restrict the length of the matched sentence to the length of the query), as in query 2, I get no results.

I would be really grateful for any help in solving this. I want an exact match for the given query, not a match that merely contains the query.

Thanks

Gaurav Chawla asked Oct 16 '22 13:10


2 Answers

My gut tells me that your index has 5 primary shards and that you don't have enough documents for the scores to be meaningful. Each shard computes its scoring statistics from its own documents only, so with five documents spread over five shards those statistics are not representative. If you create an index with a single primary shard, your first query will return the document you expect. You can read more about why this happens in the following article: https://www.elastic.co/blog/practical-bm25-part-1-how-shards-affect-relevance-scoring-in-elasticsearch
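
To make the shard effect concrete, here is a rough sketch of the BM25 formula in plain Python. It treats the whole phrase as a single pseudo-term and uses exact token counts, which is a simplification (Lucene sums the IDF of the individual terms for a phrase and stores field lengths in a lossy encoding). The "one document per shard" layout is a hypothetical worst case chosen for illustration, not something taken from your cluster.

```python
import math

K1, B = 1.2, 0.75  # default BM25 parameters in Elasticsearch


def bm25(freq, doc_len, avg_len, n_docs, doc_freq):
    """Score one pseudo-term with the BM25 formula (simplified sketch)."""
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    tf_norm = freq * (K1 + 1) / (freq + K1 * (1 - B + B * doc_len / avg_len))
    return idf * tf_norm


# Token counts of the five sentences from the question (doc1..doc5).
lengths = [5, 10, 5, 7, 7]
avg_all = sum(lengths) / len(lengths)

# One shard holding all five docs: the phrase occurs in 2 of 5 docs.
single_doc1 = bm25(1, lengths[0], avg_all, n_docs=5, doc_freq=2)
single_doc2 = bm25(1, lengths[1], avg_all, n_docs=5, doc_freq=2)

# Assumed worst case: doc1 and doc2 each sit alone on their own shard, so each
# shard sees n_docs=1, doc_freq=1 and an average length equal to that one doc.
alone_doc1 = bm25(1, lengths[0], lengths[0], n_docs=1, doc_freq=1)
alone_doc2 = bm25(1, lengths[1], lengths[1], n_docs=1, doc_freq=1)

print("single shard:     ", single_doc1, single_doc2)
print("one doc per shard:", alone_doc1, alone_doc2)
```

In the single-shard case the shorter doc1 scores higher than doc2. When each document is alone on a shard, the length normalization cancels out and both get the same score, so the order between them is no longer decided by relevance.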

One way to achieve what you want is to use the keyword type with a normalizer that lowercases the data, which makes it easy to search for exact matches in a case-insensitive way.

Create your index like this:

PUT english
{
  "settings": {
    "analysis": {
      "normalizer": {
        "lc_normalizer": {
          "type": "custom",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "sentences": {
      "properties": {
        "sentence": {
          "type": "text",
          "fields": {
            "exact": {
              "type": "keyword",
              "normalizer": "lc_normalizer"
            }
          }
        }
      }
    }
  }
}

Then you can index your documents as usual.

PUT english/sentences/1
{"sentence": "Today is a sunny day"}
PUT english/sentences/2
{"sentence": "Today is a sunny day but tomorrow it might rain"}
...

Finally, you can search for an exact phrase match. The query below will only return doc1:

POST english/_search
{
  "query": {
    "match": {
      "sentence.exact": "today is a sunny day"
    }
  }
}
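
As a rough mental model of what the normalized keyword field does, here is a plain-Python sketch that runs without a cluster. The helper names `lc_normalize` and `exact_matches` are made up for this illustration, and Python's `str.lower()` only approximates the `lowercase` filter. The point is that the whole string is compared after lowercasing, so punctuation and surrounding whitespace still count.

```python
def lc_normalize(value):
    """Mimic a normalizer whose only filter is `lowercase` (approximation)."""
    return value.lower()


def exact_matches(query, docs):
    """Return ids of docs whose normalized text equals the normalized query."""
    wanted = lc_normalize(query)
    return [doc_id for doc_id, text in docs.items() if lc_normalize(text) == wanted]


# Documents as indexed in this answer.
docs = {
    1: "Today is a sunny day",
    2: "Today is a sunny day but tomorrow it might rain",
}
print(exact_matches("today is a sunny day", docs))  # [1]

# Documents as written in the question: doc1 ends with a period and doc2
# starts with a space, and a lowercase-only normalizer keeps both.
docs_from_question = {
    1: "Today is a sunny day.",
    2: " Today is a sunny day but tomorrow it might rain",
}
print(exact_matches("Today is a sunny day", docs_from_question))  # []
```

So if the indexed sentences keep the trailing period from the question, the query text needs it too, or the punctuation has to be stripped before indexing.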
Val answered Oct 21 '22 06:10


Try using a bool query

    PUT test_index/doc/1
    {"sentence": "Today is a sunny day"}

    PUT test_index/doc/2
    {"sentence": "Today is a sunny day but tomorrow it might rain"}

    # terms query on the keyword field for the exact match, multi_match (type phrase) for other matches
    GET test_index/_search
    {
      "query": {
        "bool": {
          "should": [
            {
              "terms": {
                "sentence.keyword": [
                  "Today is a sunny day"
                ]
              }
            },
            {  
              "multi_match":{  
                "query":"Today is a sunny day",
                "type":"phrase",
                "fields":[  
                    "sentence"
                ]
              }
            }
          ]
        }
      }
    }

Another option is to use multi_match for both clauses: the keyword match comes first with a boost of 5, and the other matches get no boost:

PUT test_index/doc/1
{"sentence": "Today is a sunny day"}

PUT test_index/doc/2
{"sentence": "Today is a sunny day but tomorrow it might rain"}


GET test_index/_search
{  
  "query":{  
    "bool":{  
      "should":[  
        {  
          "multi_match":{  
            "query":"Today is a sunny day",
            "type":"phrase",
            "fields":[  
              "sentence.keyword"
            ],
            "boost":5
          }
        },
        {  
          "multi_match":{  
            "query":"Today is a sunny day",
            "type":"phrase",
            "fields":[  
                "sentence"
            ]
          }
        }
      ]
    }
  }
}
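
If you are using the Python client from the question, the same idea can be sketched as a small helper that builds the request body. This is a slight variation on the queries above, not a copy of them: it uses a term query with a boost for the exact clause and match_phrase for the rest. The function name `exact_first_query` is made up for this example, and it assumes the `sentence.keyword` sub-field exists in your mapping.

```python
import json


def exact_first_query(text, boost=5.0):
    """Build a bool/should body that ranks exact keyword matches first."""
    return {
        "query": {
            "bool": {
                "should": [
                    # Exact match on the untokenized keyword sub-field, boosted.
                    {"term": {"sentence.keyword": {"value": text, "boost": boost}}},
                    # Ordinary phrase match on the analyzed text field, no boost.
                    {"match_phrase": {"sentence": text}},
                ]
            }
        }
    }


body = exact_first_query("Today is a sunny day")
print(json.dumps(body, indent=2))

# With a client object like the `es` in the question, this could be sent as:
# res = es.search(index="test_index", body=body)
```

Note that the term query is not analyzed, so the text must match the indexed value exactly, including case.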
Polynomial Proton answered Oct 21 '22 08:10