I'm running a multi_match (with most_fields and "fuzziness": "AUTO") query for "Rob", but I get a result with "Ron" before "Rob".
If I remove the fuzziness, it returns only Rob, not Ron. However, I do want fuzziness; I just expect exact matches to be more relevant and to be shown first, and that's not happening. Looking at the 'explain' output shows that the IDF of 'Ron' is a bit higher.
Back to my question: is it possible to configure some 'boost' or 'score' for the fuzzy part of the query?
In Elasticsearch, a fuzzy query matches terms that are not exact matches of the indexed terms. You can use fuzziness in a match query to find the intended word despite a typo. With "fuzziness": "AUTO" (the default for the fuzzy query), a 6-character term is allowed an edit distance of 2.
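The "AUTO" rule is easy to express in code. The sketch below assumes the documented defaults AUTO:3,6 (0 to 2 characters must match exactly, 3 to 5 allow one edit, longer terms allow two); the function name `auto_max_edits` is hypothetical. It also shows why "Ron" shows up at all: "rob" has 3 characters, so one edit is allowed.

```python
def auto_max_edits(term: str, low: int = 3, high: int = 6) -> int:
    """Maximum edit distance allowed by "AUTO" fuzziness (hypothetical helper).

    Mirrors the AUTO:[low],[high] rule: terms shorter than `low` must match
    exactly, terms shorter than `high` may differ by one edit, longer ones by two.
    """
    length = len(term)
    if length < low:
        return 0
    if length < high:
        return 1
    return 2


for word in ["ro", "rob", "vacuum"]:
    print(word, auto_max_edits(word))
```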
To find similar terms, the fuzzy query creates a set of all possible variations, or expansions, of the search term within a specified edit distance. The query then returns exact matches for each expansion.
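A conceptual sketch of that expansion step follows. It is only an illustration: Lucene walks the term dictionary with a Levenshtein automaton rather than enumerating every variation, and the `index_terms` set here is a hypothetical term dictionary.

```python
import string


def edits1(term: str, alphabet: str = string.ascii_lowercase) -> set[str]:
    """All strings one edit away: deletion, adjacent transposition, substitution, insertion."""
    splits = [(term[:i], term[i:]) for i in range(len(term) + 1)]
    deletes = {left + right[1:] for left, right in splits if right}
    transposes = {left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1}
    # Substitutions include the identity, so the original term is kept too.
    replaces = {left + c + right[1:] for left, right in splits if right for c in alphabet}
    inserts = {left + c + right for left, right in splits for c in alphabet}
    return deletes | transposes | replaces | inserts


def fuzzy_expand(term: str, indexed_terms: set[str]) -> set[str]:
    """Keep only the expansions that actually exist in the term dictionary."""
    return edits1(term) & indexed_terms


index_terms = {"rob", "ron", "robert", "bob", "rub"}  # hypothetical term dictionary
print(sorted(fuzzy_expand("rob", index_terms)))
```

Each surviving expansion ("rob", "ron", "bob", "rub") is then matched exactly, which is how a document containing only "Ron" ends up in the results for "rob".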
Many search engines enable users to specifically request a fuzzy search in the search query by using a tilde (~) at the end of the word or term they want to search with fuzziness.
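Elasticsearch supports this through the query_string query, where `rob~1` asks for terms within one edit of "rob" (a bare `~` uses the default edit distance of 2). The snippet below just builds such a request body as a Python dict; the field names are taken from the question.

```python
import json

# query_string syntax: the trailing ~1 makes "rob" a fuzzy term with max edit distance 1.
query = {
    "query": {
        "query_string": {
            "query": "rob~1",
            "fields": ["username", "firstName", "lastName"],
        }
    }
}

print(json.dumps(query, indent=2))
```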
The score represents how relevant a given document is for a specific query. The default scoring algorithm used by Elasticsearch is BM25. Three main factors determine a document's score: term frequency (TF), where the more times a search term appears in the searched field of a document, the more relevant that document is; inverse document frequency (IDF), where the more documents contain a term, the less that term contributes; and field length, where a match in a shorter field counts for more.
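The sketch below is a simplified BM25 score for a single term, assuming the usual defaults k1 = 1.2 and b = 0.75. Constant factors differ slightly between Lucene versions, but they do not change how two terms rank against each other. The document counts in the usage lines are hypothetical.

```python
import math


def bm25_idf(doc_count: int, doc_freq: int) -> float:
    """BM25 IDF: terms found in fewer documents get a higher weight."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_tf(freq: float, field_len: float, avg_field_len: float,
            k1: float = 1.2, b: float = 0.75) -> float:
    """Saturating term-frequency component, normalized by field length."""
    return freq / (freq + k1 * (1 - b + b * field_len / avg_field_len))


def bm25_term_score(freq: float, field_len: float, avg_field_len: float,
                    doc_count: int, doc_freq: int, boost: float = 1.0) -> float:
    """Score contribution of one matching term in one field."""
    return boost * bm25_idf(doc_count, doc_freq) * bm25_tf(freq, field_len, avg_field_len)


# Hypothetical index: 1,000 docs, "rob" in 50 of them, "ron" in only 5,
# each matched once in a one-token firstName field.
rob = bm25_term_score(1, 1, 1, doc_count=1000, doc_freq=50)
ron = bm25_term_score(1, 1, 1, doc_count=1000, doc_freq=5)
print(f"rob={rob:.3f} ron={ron:.3f}")
```

With these counts the rarer "ron" gets the higher score even though "rob" is the exact match, which is the same IDF effect seen in the question's 'explain' output.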
Please note that Found is now known as Elastic Cloud. Elasticsearch's fuzzy query is a powerful tool for many situations: username searches, misspellings, and other unusual problems can often be solved with this unconventional query.
POST /fuzzy_products/product/_search
{
  "query": {
    "match": {
      "name": {
        "query": "Vacuummm",
        "fuzziness": 2,
        "prefix_length": 1
      }
    }
  }
}
The metric used by fuzzy queries to determine a match is the Damerau-Levenshtein distance formula.
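For illustration, here is the optimal string alignment variant of Damerau-Levenshtein, where an adjacent transposition counts as one edit. I am assuming this restricted variant is close enough to what Lucene uses to explain the matching; `within_fuzziness` is a hypothetical helper that mimics the `fuzziness` and `prefix_length` parameters of the query above on already-lowercased terms.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (restricted Damerau-Levenshtein)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent transposition
    return d[-1][-1]


def within_fuzziness(query: str, term: str, max_edits: int, prefix_length: int = 0) -> bool:
    """Hypothetical check: the first `prefix_length` chars must match, then edits <= max_edits."""
    if query[:prefix_length] != term[:prefix_length]:
        return False
    return osa_distance(query, term) <= max_edits


print(osa_distance("rob", "ron"))          # 1
print(osa_distance("ab", "ba"))            # 1
print(osa_distance("vacuummm", "vacuum"))  # 2
```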
Because Elasticsearch is very flexible, it can be fine-tuned to return the most relevant results for your specific use case(s). One relatively straightforward way to fine-tune results is to add clauses to the queries sent to Elasticsearch.
OK, I ended up with the following, based on what is suggested here: https://medium.com/@oysterpail/fuzzy-queries-ae47b66b325c#.a4uxw5z0b
Their solution uses a bool query with 'should'. I can't do that, because I need this part of the query to be 'must' (I use the 'should' part for relevancy), and a bool query with 'must' is effectively AND. However, 'must' + 'or' did the trick:
{
  "query": {
    "bool": {
      "must": {
        "or": [
          {
            "multi_match": {
              "query": "rob",
              "fields": [
                "username",
                "firstName",
                "lastName"
              ],
              "type": "most_fields",
              "fuzziness": "AUTO"
            }
          },
          {
            "multi_match": {
              "query": "rob",
              "fields": [
                "username",
                "firstName",
                "lastName"
              ],
              "type": "most_fields"
            }
          }
        ]
      }
    }
  }
}
This way, results that come only from the fuzzy part match just the first clause, whereas exact matches match both clauses, so they score higher and show up first.
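The 'or' query used above is specific to old Elasticsearch versions and has since been removed. A sketch of the same idea for newer versions, under that assumption, nests a bool 'should' with `minimum_should_match: 1` inside 'must', so the outer 'should' stays free for relevancy clauses. It also answers the original 'boost' question by putting an explicit `boost` on the exact clause; the value 2.0 and the function name `name_query` are arbitrary, hypothetical choices.

```python
import json

FIELDS = ["username", "firstName", "lastName"]


def name_query(text: str, exact_boost: float = 2.0) -> dict:
    """Require a fuzzy or exact match; exact matches also earn a boosted extra score."""
    fuzzy_clause = {
        "multi_match": {
            "query": text,
            "fields": FIELDS,
            "type": "most_fields",
            "fuzziness": "AUTO",
        }
    }
    exact_clause = {
        "multi_match": {
            "query": text,
            "fields": FIELDS,
            "type": "most_fields",
            "boost": exact_boost,
        }
    }
    return {
        "query": {
            "bool": {
                # At least one of the two clauses must match (the old 'or').
                # Relevancy-only clauses can go in a sibling "should" list.
                "must": [
                    {
                        "bool": {
                            "should": [fuzzy_clause, exact_clause],
                            "minimum_should_match": 1,
                        }
                    }
                ]
            }
        }
    }


print(json.dumps(name_query("rob"), indent=2))
```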
Quite an old question, but I'll answer to help others who come across it now. The reason you get 'Ron' before 'Rob' is the TF/IDF part of the scoring. In your dataset 'Rob' appears in more documents than 'Ron', so its inverse document frequency is lower and the algorithm gives matches on 'Rob' a lower score.
If you only want to search for names, you can use a different scoring algorithm (similarity). In your case the 'boolean' similarity, which scores a matching term by its query boost alone and ignores TF and IDF, should work.
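A minimal sketch of such a mapping, assuming a recent, typeless-mapping Elasticsearch version and the field names from the question. The similarity typically has to be set when the index (or field) is created, so an existing index would need reindexing.

```python
import json

NAME_FIELDS = ["username", "firstName", "lastName"]

# With the built-in "boolean" similarity a matching term scores only its query
# boost, so TF and IDF (why 'Ron' outranked 'Rob') no longer affect the order.
index_body = {
    "mappings": {
        "properties": {
            field: {"type": "text", "similarity": "boolean"} for field in NAME_FIELDS
        }
    }
}

print(json.dumps(index_body, indent=2))
```

Combined with a boosted exact-match clause, ordering should then depend only on which clauses a document matches.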