Elasticsearch - How to normalize score when combining regular query and function_score?

Ideally, what I am trying to achieve is to assign weights to queries so that query1 constitutes 30% of the final score and query2 constitutes the other 70%; to achieve the maximum score, a document has to have the highest possible score on both query1 and query2. My study of the documentation did not yield any hints as to how to achieve this, so let's try to solve a simpler problem.

Consider a query of the following form:

{
  "query": {
    "bool": {
      "should": [
        {
          "function_score": {
            "query": {"match_all": {}},
            "script_score": {
              "script": "<some_script>"
            }
          }
        },
        {
          "match": {
            "message": "this is a test"
          }
        }
      ]
    }
  }
}

The script can return an arbitrary number (for example, it can return something like 12392002).

How do I make sure that the result from the script will not dominate the overall score?

Is there any way to normalize it? For example, instead of the raw script score, return its ratio to max_script_score (the score achieved by the highest-scoring document)?

JohnnyM asked Aug 18 '14 11:08



1 Answer

I have recently been working on a similar problem. I couldn't find any formal documentation on this issue, but when I investigated the results with the explain API, it seemed that "queryNorm" is not applied to the score coming directly from the "functions" field. This means that you cannot directly normalize the script value.

However, I think I found a somewhat tricky workaround. If you combine the functions field with a query as you do (the match_all query) and give that query a boost, normalization is applied to that query. The product of the two scores, one from the normalized query and one from the script, then gives an overall normalized score. To illustrate, the query would look like this:

{
  "query": {
    "bool": {
      "should": [
        {
          "function_score": {
            "query": {"match_all": {"boost": 1}},
            "functions": [
              {
                "script_score": {
                  "script": "<some_script>"
                }
              }
            ],
            "score_mode": "sum",
            "boost_mode": "multiply"
          }
        },
        {
          "match": {
            "message": "this is a test"
          }
        }
      ]
    }
  }
}

This answer is not a complete solution to your problem, but I think you can experiment with this query to obtain the required result. My suggestion is to use the explain API, try to understand what it returns, examine the parameters affecting the final score, and adjust the script and boost values until you get a satisfactory result.
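To make the target concrete, here is a minimal client-side sketch of the weighting the question describes: each score is divided by the maximum for its query, then the two are mixed 30%/70%. This is not something Elasticsearch does for you in the query above. The function names and the sample scores are hypothetical, and it assumes you have already fetched the per-document script and text scores separately.

```python
def normalize_by_max(scores):
    """Divide every score by the largest one so values fall in [0, 1]."""
    top = max(scores.values(), default=0.0)
    if top <= 0:
        # Nothing to scale against; treat every document as scoring zero.
        return {doc_id: 0.0 for doc_id in scores}
    return {doc_id: score / top for doc_id, score in scores.items()}


def combine(script_scores, text_scores, w_script=0.3, w_text=0.7):
    """Weighted sum of max-normalized scores (assumed weights: 30% / 70%)."""
    norm_script = normalize_by_max(script_scores)
    norm_text = normalize_by_max(text_scores)
    doc_ids = set(norm_script) | set(norm_text)
    return {
        doc_id: w_script * norm_script.get(doc_id, 0.0)
        + w_text * norm_text.get(doc_id, 0.0)
        for doc_id in doc_ids
    }


# Hypothetical scores: the raw script values are huge, the text scores small.
script_scores = {"a": 12392002.0, "b": 6196001.0}
text_scores = {"a": 0.5, "b": 2.0}

combined = combine(script_scores, text_scores)
ranking = sorted(combined, key=combined.get, reverse=True)
```

After normalization the large script value no longer dominates: document "b" ranks first because it wins on the text query, which carries 70% of the weight. Only a document that tops both queries reaches a combined score of 1.0.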

By the way, a "rescore query" may help a lot in obtaining that 30%/70% ratio in the final score: see the official documentation.
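A rough sketch of what such a rescore request body could look like, built as a Python dict so it can be serialized to JSON. It uses the rescore parameters `window_size`, `query_weight` and `rescore_query_weight`. The helper name `build_rescore_body`, the window size of 50 and the choice of which query is rescored are my assumptions, and `<some_script>` remains the placeholder from the question.

```python
import json


def build_rescore_body(script, text, window_size=50,
                       w_script=0.3, w_text=0.7):
    """Build a search body: match query first, script score applied as rescore.

    Hypothetical helper. The weights scale the raw scores of each query;
    they do not normalize them.
    """
    return {
        "query": {"match": {"message": text}},
        "rescore": {
            # Only the top `window_size` hits per shard are rescored.
            "window_size": window_size,
            "query": {
                "rescore_query": {
                    "function_score": {
                        "query": {"match_all": {}},
                        "script_score": {"script": script},
                    }
                },
                "query_weight": w_text,
                "rescore_query_weight": w_script,
            },
        },
    }


body = build_rescore_body("<some_script>", "this is a test")
payload = json.dumps(body, indent=2)
```

Keep in mind that the two weights multiply the raw scores, so the final score is only a true 30%/70% split if the script and the match query already produce values on a comparable scale. The script would still need to return a bounded value for the ratio to hold.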

Heval answered Oct 14 '22 16:10
