Need to find a way in ElasticSearch to boost the relevance of a document based on a particular value of a field. Specifically, there is a special field in all my documents where the higher the field value is, the more relevant the doc that contains it should be, regardless of the search.
Consider the following document structure:
{
  "_all" : {"enabled" : "true"},
  "properties" : {
    "_id": {"type" : "string", "store" : "yes", "index" : "not_analyzed"},
    "first_name": {"type" : "string", "store" : "yes", "index" : "yes"},
    "last_name": {"type" : "string", "store" : "yes", "index" : "yes"},
    "boosting_field": {"type" : "integer", "store" : "yes", "index" : "yes"}
  }
}
I'd like documents with a higher boosting_field value to be inherently more relevant than those with a lower boosting_field value. This is just a starting point -- the matching between the query and the other fields will also be taken into account in determining the final relevance score of each doc in the search. But, all else being equal, the higher the boosting field, the more relevant the document.
Anyone have an idea on how to do this?
Thanks a lot!
Elasticsearch uses search relevance to score documents of a dataset. It returns an ordered list of data sorted by a relevance score. We can customize the score by adding and modifying variables that will shift the scale between precision and recall.
Minimum Should Match is another search technique that allows you to conduct a more controlled search on related or co-occurring topics by specifying the number of search terms or phrases in the query that should occur within the records returned.
You can boost either at index time or at query time. I usually prefer query time boosting even though it makes queries a little bit slower; otherwise I'd need to reindex every time I want to change my boosting factors, which usually need fine-tuning and need to be pretty flexible.
There are different ways to apply query time boosting using the elasticsearch query DSL:
The first three queries are useful if you want to give a specific boost to the documents which match specific queries or filters. For example, if you want to boost only the documents published during the last month. You could use this approach with your boosting_field but you'd need to manually define some boosting_field intervals and give them a different boost, which isn't that great.
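To illustrate why manually defined intervals are a coarse tool, here is a minimal Python sketch of the idea. It is not Elasticsearch code: the `INTERVALS` table, its bounds, its boost values, and the `interval_boost` helper are all hypothetical and only model the "one boost per interval" behaviour described above.

```python
# Hypothetical intervals: (lower bound inclusive, upper bound exclusive, boost).
# The bounds and boosts are made up for illustration.
INTERVALS = [
    (0, 1000, 1.0),
    (1000, 10000, 1.5),
    (10000, 100001, 2.0),
]


def interval_boost(value, intervals=INTERVALS):
    """Return the boost of the first interval that contains `value`."""
    for lower, upper, boost in intervals:
        if lower <= value < upper:
            return boost
    return 1.0  # neutral boost when no interval matches
```

With these made-up intervals, a boosting_field of 1000 and one of 9999 get exactly the same boost, while 999 and 1000 are treated differently. That step-like behaviour is the drawback mentioned above.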
The best solution would be to use a Custom Score Query, which lets you run a query and customize its score with a script. It's quite powerful: the script can directly modify the score itself. First of all, I'd scale the boosting_field values to a range from 0 to 1, for example, so that your final score doesn't become a big number. To do that, you need to estimate roughly the minimum and maximum values that the field can contain; let's say a minimum of 0 and a maximum of 100000. Once the boosting_field value is scaled to a number between 0 and 1, you can add the result to the actual score like this:
{
  "query" : {
    "custom_score" : {
      "query" : { "match_all" : {} },
      "script" : "_score + (1 * doc.boosting_field.doubleValue / 100000)"
    }
  }
}
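The arithmetic in that script can be sketched in plain Python to see what the scaling does. This is only an illustration and does not talk to Elasticsearch: the names `MAX_BOOST_VALUE`, `scale_boost`, `additive_score` and `build_custom_score_query` are hypothetical, and the clamping step is an added assumption (the script itself does not clamp).

```python
# Hypothetical upper bound for boosting_field, taken from the example above.
MAX_BOOST_VALUE = 100000


def scale_boost(value, max_value=MAX_BOOST_VALUE):
    """Map a raw boosting_field value onto the range [0, 1]."""
    # Clamping is an assumption added here: it keeps out-of-range values
    # from pushing the result outside [0, 1].
    clamped = min(max(value, 0), max_value)
    return clamped / max_value


def additive_score(base_score, boosting_value):
    """Same arithmetic as the script: _score + (1 * boosting_field / 100000)."""
    return base_score + 1 * scale_boost(boosting_value)


def build_custom_score_query(inner_query, max_value=MAX_BOOST_VALUE):
    """Build the request body shown above as a Python dict."""
    script = "_score + (1 * doc.boosting_field.doubleValue / %d)" % max_value
    return {"query": {"custom_score": {"query": inner_query, "script": script}}}
```

For instance, a document with a base score of 1.0 and a boosting_field of 50000 ends up at 1.5, and the added part can never exceed 1, so the text match still dominates whenever base scores differ by more than that.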
You can also consider using the boosting_field as a boost factor (_score * rather than _score +), but then you'd need to scale it to an interval with a minimum value of 1 (just add a +1).
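A small Python sketch of the multiplicative variant, under the same assumptions as before (a field range of 0 to 100000, clamped). The function name `multiplicative_score` is hypothetical, and the code only models the arithmetic a script such as `_score * (1 + doc.boosting_field.doubleValue / 100000)` would perform.

```python
MAX_BOOST_VALUE = 100000  # hypothetical maximum, from the example above


def multiplicative_score(base_score, boosting_value, max_value=MAX_BOOST_VALUE):
    """Multiply the base score by a factor between 1 and 2."""
    # Scale to [0, 1]; the clamping is an assumption added for safety.
    scaled = min(max(boosting_value, 0), max_value) / max_value
    # Shift to [1, 2] so the lowest field value leaves the score unchanged
    # instead of multiplying it by zero.
    factor = 1 + scaled
    return base_score * factor
```

The +1 matters: without it, a document whose boosting_field is 0 would get a final score of 0 no matter how well it matched the query.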
You can even tune the result to change its importance by adding a weight to the value that you use to influence the score. You will need this even more if you combine multiple boosting factors and want to give each of them a different weight.
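As a rough sketch of how weights could combine several boosting factors, here is the additive version in Python. Everything in it is hypothetical (the `weighted_score` name, the tuple layout, and the example weights); it only shows the arithmetic, not an Elasticsearch script.

```python
def weighted_score(base_score, factors):
    """Add several scaled boosting factors to the base score.

    `factors` is a list of (value, max_value, weight) tuples, one per
    boosting field. The layout is an assumption made for this sketch.
    """
    total = base_score
    for value, max_value, weight in factors:
        # Scale each field to [0, 1] before weighting it, so the weight
        # alone decides how much that field can add to the score.
        scaled = min(max(value, 0), max_value) / max_value
        total += weight * scaled
    return total
```

For example, `weighted_score(1.0, [(50000, 100000, 2.0), (5, 10, 0.5)])` gives 2.25: the first field contributes up to 2 points and the second at most 0.5, so the first one clearly matters more.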