ElasticSearch: Is it possible to give a lower score for fuzziness?

Tags:

elasticsearch

I'm running a multi_match query (with most_fields and "fuzziness": "AUTO") for "Rob", but I get a result with "Ron" before "Rob".

If I remove the fuzziness, it returns only Rob, not Ron. However, I do want to use fuzziness; I just expect all exact matches to be more relevant and to be shown first. That's not happening. The 'explain' output shows that the IDF of 'Ron' is a bit higher.

Back to my question: is it possible to configure some 'boost' or 'score' for the fuzziness element?


asked Feb 09 '16 19:02

David

2 Answers

OK, I ended up with the following, based on what is suggested here: https://medium.com/@oysterpail/fuzzy-queries-ae47b66b325c#.a4uxw5z0b

Their solution uses a bool query with should clauses. I can't use it as is, because I need this part of the query to be a must (I use the should part for relevancy), and multiple clauses under must are combined with AND. However, must + or did the trick:

{
   "query":{
      "bool":{
         "must":{
            "or":[
               {
                  "multi_match":{
                     "query":"rob",
                     "fields":[
                        "username",
                        "firstName",
                        "lastName"
                     ],
                     "type":"most_fields",
                     "fuzziness":"AUTO"
                  }
               },
               {
                  "multi_match":{
                     "query":"rob",
                     "fields":[
                        "username",
                        "firstName",
                        "lastName"
                     ],
                     "type":"most_fields"
                  }
               }
            ]
         }
      }
   }
}

This way, results that come only from the fuzziness match just the first part of the query, whereas exact-match results match both parts and therefore show up first.
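
For readers on newer versions: the or query was deprecated in Elasticsearch 2.0 and removed in 5.0, so the request above is rejected there. A rough equivalent, offered as a sketch rather than a tested replacement, nests a bool query with should clauses inside the must. The field names are copied from the query above; the boost of 2 on the exact clause is an arbitrary value added for illustration and is not part of the original answer.

```json
{
  "query": {
    "bool": {
      "must": {
        "bool": {
          "should": [
            {
              "multi_match": {
                "query": "rob",
                "fields": ["username", "firstName", "lastName"],
                "type": "most_fields",
                "fuzziness": "AUTO"
              }
            },
            {
              "multi_match": {
                "query": "rob",
                "fields": ["username", "firstName", "lastName"],
                "type": "most_fields",
                "boost": 2
              }
            }
          ],
          "minimum_should_match": 1
        }
      }
    }
  }
}
```

An exact match satisfies both should clauses and accumulates both scores, while a fuzzy-only match satisfies just the first. With minimum_should_match set to 1, the inner bool still behaves as a required OR.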

answered Oct 04 '22 15:10

David

This is quite an old question, but I'll answer to help others who find it now. The reason you are getting 'Ron' before 'Rob' is the TF/IDF scoring algorithm. In your dataset the word 'Rob' occurs more often than 'Ron', so its IDF is lower and the algorithm gives 'Rob' a lower score.

If you only want to search for names, you can use a different scoring algorithm (similarity). In your case the 'boolean' similarity should work.
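
As a sketch of what that could look like, the similarity is set per field in the mapping. The body below is a create-index request for a hypothetical index; the field names are assumed from the question's query, and the typeless mapping layout assumes Elasticsearch 7 or later.

```json
{
  "mappings": {
    "properties": {
      "username":  { "type": "text", "similarity": "boolean" },
      "firstName": { "type": "text", "similarity": "boolean" },
      "lastName":  { "type": "text", "similarity": "boolean" }
    }
  }
}
```

With the boolean similarity, a matching term contributes a score equal to its query boost, so how common 'Rob' or 'Ron' is in the index no longer affects the ranking. Applying this to fields that already exist will most likely mean creating a new index and reindexing.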

answered Oct 04 '22 15:10

Sourav Patra
