I am trying to implement an exact-match search in Elasticsearch, but I am not getting the required results. Here is the code that shows the issue I am facing and the things I have tried.
doc1 = {"sentence": "Today is a sunny day."}
doc2 = {"sentence": " Today is a sunny day but tomorrow it might rain"}
doc3 = {"sentence": "I know I am awesome"}
doc4 = {"sentence": "The taste of your dish is awesome"}
doc5 = {"sentence": "The taste of banana shake is good"}
# Indexing the above docs
es.index(index="english",doc_type="sentences",id=1,body=doc1)
es.index(index="english",doc_type="sentences",id=2,body=doc2)
es.index(index="english",doc_type="sentences",id=3,body=doc3)
es.index(index="english",doc_type="sentences",id=4,body=doc4)
es.index(index="english",doc_type="sentences",id=5,body=doc5)
Query 1:
res = es.search(index="english", body={
    "from": 0,
    "size": 5,
    "query": {
        "match_phrase": {
            "sentence": {"query": "Today is a sunny day"}
        }
    },
    "explain": False
})
Query 2:
res = es.search(index="english", body={
    "from": 0,
    "size": 5,
    "query": {
        "bool": {
            "must": {
                "match_phrase": {
                    "sentence": {"query": "Today is a sunny day"}
                }
            },
            "filter": {
                "term": {"sentence.word_count": 5}
            }
        }
    }
})
When I run query 1, I get doc2 as the top result, but I want doc1 to be the top result.
When I add a filter (to restrict matches to the length of the query), as in query 2, I get no results.
I would be really grateful for any help with this. I want an exact match for the given query, not a match that merely contains the query.
Thanks
My gut tells me that your index has 5 primary shards and that you don't have enough documents for the scores to be reliable. If you create an index with a single primary shard, your first query will return the document you expect. You can read more about why this happens in the following article: https://www.elastic.co/blog/practical-bm25-part-1-how-shards-affect-relevance-scoring-in-elasticsearch
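To make the shard effect concrete, here is a rough sketch of the inverse document frequency (IDF) term that Lucene's BM25 similarity uses, evaluated with per-shard statistics. The document counts are made-up illustrations, not values taken from the index in the question.

```python
import math

def bm25_idf(doc_count, doc_freq):
    """IDF term of Lucene's BM25: ln(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))

# Hypothetical distribution: doc1 sits alone on its shard, while doc2
# shares a shard with one other document that lacks the query term.
idf_shard_with_doc1 = bm25_idf(doc_count=1, doc_freq=1)
idf_shard_with_doc2 = bm25_idf(doc_count=2, doc_freq=1)

print(idf_shard_with_doc1, idf_shard_with_doc2)
```

Since each shard scores with its own statistics by default, the same term can weigh more on one shard than on another. With a single primary shard, or with search_type="dfs_query_then_fetch" on the search request, the statistics are shared and the scores become comparable.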
One way to achieve what you want is to use the keyword type together with a normalizer that lowercases the data, so it is easier to search for exact matches in a case-insensitive way.
Create your index like this:
PUT english
{
  "settings": {
    "analysis": {
      "normalizer": {
        "lc_normalizer": {
          "type": "custom",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "sentences": {
      "properties": {
        "sentence": {
          "type": "text",
          "fields": {
            "exact": {
              "type": "keyword",
              "normalizer": "lc_normalizer"
            }
          }
        }
      }
    }
  }
}
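Since the question uses the Python client, the console request above could be expressed roughly as follows. This is a sketch, not a tested setup: the helper names build_index_body and create_index are made up for illustration, `es` is assumed to be an elasticsearch.Elasticsearch instance, and the mapping type ("sentences") only applies to older Elasticsearch versions that still support types.

```python
def build_index_body(doc_type="sentences"):
    """Return settings and mappings equivalent to the console request above."""
    return {
        "settings": {
            "analysis": {
                "normalizer": {
                    "lc_normalizer": {"type": "custom", "filter": ["lowercase"]}
                }
            }
        },
        "mappings": {
            doc_type: {
                "properties": {
                    "sentence": {
                        "type": "text",
                        "fields": {
                            "exact": {
                                "type": "keyword",
                                "normalizer": "lc_normalizer",
                            }
                        },
                    }
                }
            }
        },
    }

def create_index(es, index="english"):
    # Assumption: `es` is a connected elasticsearch.Elasticsearch client.
    return es.indices.create(index=index, body=build_index_body())
```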
Then you can index your documents as usual.
PUT english/sentences/1
{"sentence": "Today is a sunny day"}
PUT english/sentences/2
{"sentence": "Today is a sunny day but tomorrow it might rain"}
...
Finally, you can search for an exact match on the whole sentence. The query below will only return doc1:
POST english/_search
{
  "query": {
    "match": {
      "sentence.exact": "today is a sunny day"
    }
  }
}
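To illustrate why only doc1 comes back, here is a small pure-Python stand-in for what a keyword field with a lowercase normalizer does: the whole string is kept as a single value, lowercased, and compared as a whole. The functions normalize and exact_matches are hypothetical helpers, not part of Elasticsearch.

```python
def normalize(value):
    # Stand-in for the lowercase normalizer: the whole string stays one token.
    return value.lower()

docs = {
    1: "Today is a sunny day",
    2: "Today is a sunny day but tomorrow it might rain",
}

def exact_matches(query, documents):
    """Return ids whose normalized value equals the normalized query."""
    wanted = normalize(query)
    return [doc_id for doc_id, text in documents.items()
            if normalize(text) == wanted]

print(exact_matches("today is a sunny day", docs))  # prints [1]
```

Note that doc1 in the question ends with a period ("Today is a sunny day."). A keyword comparison treats that as a different value, so the query string would need the period too, unless the punctuation is removed before indexing.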
Try using a bool query:
PUT test_index/doc/1
{"sentence": "Today is a sunny day"}
PUT test_index/doc/2
{"sentence": "Today is a sunny day but tomorrow it might rain"}
# terms query on the keyword sub-field for the exact match, plus a multi_match phrase query for the other matches
GET test_index/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "terms": {
            "sentence.keyword": [
              "Today is a sunny day"
            ]
          }
        },
        {
          "multi_match": {
            "query": "Today is a sunny day",
            "type": "phrase",
            "fields": [
              "sentence"
            ]
          }
        }
      ]
    }
  }
}
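With the Python client from the question, the same request body could be built by a small helper like the one below. This is only a sketch: exact_first_query and search_exact_first are made-up names, and "sentence.keyword" assumes the index uses the default dynamic mapping, which adds a keyword sub-field to string fields.

```python
def exact_first_query(phrase, field="sentence", keyword_field="sentence.keyword"):
    """Build a bool/should body: exact keyword match plus a phrase match."""
    return {
        "query": {
            "bool": {
                "should": [
                    # Matches only when the whole stored value equals the phrase.
                    {"terms": {keyword_field: [phrase]}},
                    # Matches any document that contains the phrase.
                    {"multi_match": {
                        "query": phrase,
                        "type": "phrase",
                        "fields": [field],
                    }},
                ]
            }
        }
    }

def search_exact_first(es, phrase, index="test_index"):
    # Assumption: `es` is a connected elasticsearch.Elasticsearch client.
    return es.search(index=index, body=exact_first_query(phrase))
```

A document that satisfies both should clauses gets the sum of both scores, which is what pushes the exact match above documents that only contain the phrase.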
Another option is to use multi_match for both clauses: the keyword match first with a boost of 5, and the other matches with no boost:
PUT test_index/doc/1
{"sentence": "Today is a sunny day"}
PUT test_index/doc/2
{"sentence": "Today is a sunny day but tomorrow it might rain"}
GET test_index/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "multi_match": {
            "query": "Today is a sunny day",
            "type": "phrase",
            "fields": [
              "sentence.keyword"
            ],
            "boost": 5
          }
        },
        {
          "multi_match": {
            "query": "Today is a sunny day",
            "type": "phrase",
            "fields": [
              "sentence"
            ]
          }
        }
      ]
    }
  }
}
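To check from Python that the exact match really ends up on top, one could inspect the first hit of the search response. The sketch below uses a hand-written response in the shape that es.search returns; the scores are made up, and top_hit_is_exact is a hypothetical helper.

```python
def top_hit_is_exact(response, phrase, field="sentence"):
    """True if the highest-scoring hit's field equals the phrase exactly."""
    hits = response["hits"]["hits"]
    return bool(hits) and hits[0]["_source"][field] == phrase

# Hand-written sample response; the scores are invented for illustration.
sample = {"hits": {"hits": [
    {"_id": "1", "_score": 2.0,
     "_source": {"sentence": "Today is a sunny day"}},
    {"_id": "2", "_score": 1.0,
     "_source": {"sentence": "Today is a sunny day but tomorrow it might rain"}},
]}}

print(top_hit_is_exact(sample, "Today is a sunny day"))  # prints True
```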