
How can I boost the field-length norm in an Elasticsearch function score?

I know that Elasticsearch takes into account the length of a field when computing the score of the documents retrieved by a query: the shorter the field, the higher the weight (see The field-length norm).

I like this behaviour: when I search for "iphone", I am much more interested in "iphone 6" than in "Crappy accessories for: iphone 5 iphone 5s iphone 6".
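To put rough numbers on this, here is a minimal Python sketch of the classic Lucene TF/IDF field-length norm, 1/√(number of terms), which was the Elasticsearch default at the time, applied to the two titles above. It assumes a crude tokenizer (`tokenize`, a hypothetical helper that lowercases and splits on non-alphanumeric characters) as a stand-in for the real analyzer, and it ignores the precision Lucene loses when it stores norms.

```python
import math
import re


def tokenize(text: str) -> list[str]:
    # Crude stand-in for an analyzer: lowercase, split on non-alphanumerics.
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def field_length_norm(text: str) -> float:
    # Classic TF/IDF field-length norm: 1 / sqrt(number of terms in the field).
    return 1.0 / math.sqrt(len(tokenize(text)))


short_title = "iphone 6"
long_title = "Crappy accessories for: iphone 5 iphone 5s iphone 6"

for title in (short_title, long_title):
    n = len(tokenize(title))
    print(f"{n} terms -> norm {field_length_norm(title):.3f}  {title!r}")
```

With this tokenizer the short title has 2 terms and the long one 9, so, all else being equal, the short title's norm (about 0.707) is a bit more than twice the long title's (about 0.333).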

Now I would like to boost this effect; let's say I want to double its importance.

I know that the score can be modified with a function_score query, and I guess I can achieve what I want via script_score.

I tried to add another field-length norm to the score like this:

    {
      "query": {
        "function_score": {
          "boost_mode": "replace",
          "query": {...},
          "script_score": {
            "script": "_score + norm(doc)"
          }
        }
      }
    }

But I failed badly, getting this error: [No parser for element [function_score]]

EDIT:

My first error was that I hadn't wrapped function_score in a "query" element; I have fixed the code above. Now I get this error:

    GroovyScriptExecutionException[MissingMethodException
    [No signature of method: Script5.norm() is applicable for argument types:
    (org.elasticsearch.search.lookup.DocLookup) values:
    [<org.elasticsearch.search.lookup.DocLookup@2c935f6f>]
    Possible solutions: notify(), wait(), run(), run(), dump(), any()]]

EDIT: I posted a first answer myself, but I'm hoping for a better one.

asked Aug 17 '15 at 21:08 by Mario Trucco




2 Answers

It looks like you could achieve that using a field of type token_count together with a field_value_factor function in a function_score query.

So, something like this in the field mapping:

"name": { 
  "type": "string",
  "fields": {
    "length": { 
      "type":     "token_count",
      "analyzer": "standard"
    }
  }
}

This will use the number of tokens in the field. If you want to use the number of characters, you can change the analyzer from standard to a custom one that tokenizes each character.

Then in the query:

"function_score": {
  ...,
  "field_value_factor": {
    "field": "name.length",
    "modifier": "reciprocal"
  }
}
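A rough Python simulation of what this does to the final score: the token_count sub-field stores the number of tokens in `name.length`, the `reciprocal` modifier turns that into 1/length, and, since `boost_mode` defaults to `multiply`, the query score is multiplied by that value. The base query scores below are made-up placeholders, and `token_count` here is only a rough stand-in for what the `standard` analyzer would count.

```python
import re


def token_count(text: str) -> int:
    # Rough stand-in for a token_count sub-field using the standard analyzer.
    return len([t for t in re.split(r"[^0-9a-z]+", text.lower()) if t])


def field_value_factor_reciprocal(value: float, factor: float = 1.0) -> float:
    # field_value_factor with "modifier": "reciprocal" computes 1 / (factor * value).
    return 1.0 / (factor * value)


def function_score_multiply(base_score: float, function_value: float) -> float:
    # Default boost_mode "multiply": query score times the function's value.
    return base_score * function_value


# Hypothetical query scores, for illustration only.
docs = {
    "iphone 6": 1.0,
    "Crappy accessories for: iphone 5 iphone 5s iphone 6": 1.2,
}

for name, base_score in docs.items():
    length = token_count(name)
    final = function_score_multiply(base_score, field_value_factor_reciprocal(length))
    print(f"{final:.3f}  (base {base_score}, name.length = {length})  {name!r}")
```

Even though the long title starts with the higher (made-up) score, it ends up at about 0.133 against 0.5 for "iphone 6". Note that 1/length falls off faster with length than the built-in 1/√length norm does.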
answered Oct 08 '22 at 04:10 by robinst



I have something that kind of works. With the following, I subtract the length of the field I'm interested in from the score:

{
  "query": {
    "function_score": {
      "boost_mode": "replace",
      "query": {...},
      "script_score": {
        "script": "_score - doc['<field_name>'].value.length()"
      }
    }
  }
}

However, I cannot control the weight of the length I am subtracting relative to the original score. That's why I am not accepting my own answer yet; I'll wait a while for better ones. Ideally, I'd like to access the field-length norm from within script_score, or to get an equivalent result.
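The weighting problem shows up in a small Python simulation with made-up scores: subtracting a raw character count mixes two quantities on unrelated scales, so the length swamps the query score. One possible alternative, not proposed in this thread and offered only as an assumption, is to multiply by a power of the classic 1/√(number of terms) norm, where a hypothetical `weight` exponent controls how much length matters. Since the base score already contains one norm factor, `weight = 1` is one rough reading of "doubling" its importance.

```python
import math

# Made-up base scores; in practice these would come from the query.
docs = [
    ("iphone 6", 2.0),
    ("Crappy accessories for: iphone 5 iphone 5s iphone 6", 2.5),
]


def subtract_length(score: float, text: str) -> float:
    # The script above: _score minus the field's length in characters.
    return score - len(text)


def norm_weighted(score: float, num_terms: int, weight: float) -> float:
    # Hypothetical tunable alternative: score * (1 / sqrt(num_terms)) ** weight.
    # weight = 0 ignores length; each +1 multiplies in another classic norm factor.
    return score * (1.0 / math.sqrt(num_terms)) ** weight


for text, score in docs:
    num_terms = len(text.split())  # crude whitespace token count
    weighted = [round(norm_weighted(score, num_terms, w), 3) for w in (0, 1, 2)]
    print(f"subtract: {subtract_length(score, text):7.1f}  "
          f"weighted w=0,1,2: {weighted}  {text!r}")
```

With subtraction, the 8 versus 51 characters dominate and the query scores barely matter. With the exponent, `weight = 0` lets the long title win on its higher base score, while `weight = 1` or `2` puts "iphone 6" first by a growing margin.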

answered Oct 08 '22 at 03:10 by Mario Trucco
