I know that Elasticsearch takes into account the length of a field when computing the score of the documents retrieved by a query. The shorter the field, the higher the weight (see The field-length norm).
I like this behaviour: when I search for iphone, I am much more interested in iphone 6 than in Crappy accessories for: iphone 5 iphone 5s iphone 6.
Now I would like to boost this effect; let's say I want to double its importance.
I know that one can modify the score using function_score, and I guess that I can achieve what I want via script_score.
I tried to add another field-length norm to the score like this:
{
  "query": {
    "function_score": {
      "boost_mode": "replace",
      "query": {...},
      "script_score": {
        "script": "_score + norm(doc)"
      }
    }
  }
}
But I failed badly, getting this error: [No parser for element [function_score]]
EDIT:
My first error was that I hadn't wrapped the function_score in a "query" clause. I have now edited the code above. My new error says:
GroovyScriptExecutionException[MissingMethodException
[No signature of method: Script5.norm() is applicable for argument types:
(org.elasticsearch.search.lookup.DocLookup) values:
[<org.elasticsearch.search.lookup.DocLookup@2c935f6f>]
Possible solutions: notify(), wait(), run(), run(), dump(), any()]]
EDIT: I provided a first answer below, but I'm hoping for a better one.
It looks like you could achieve that using a field of type token_count together with a field_value_factor function score.
So, something like this in the field mapping:
"name": {
"type": "string",
"fields": {
"length": {
"type": "token_count",
"analyzer": "standard"
}
}
}
This will use the number of tokens in the field. If you want to use the number of characters instead, you can change the analyzer from standard to a custom one that tokenizes each character.
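For example, one way to get per-character tokens (a sketch, untested; the analyzer and tokenizer names char_counter and char_tokenizer are made up) is an ngram tokenizer with min_gram and max_gram both set to 1, so it emits one token per character and the token_count sub-field effectively stores the character count:

```json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "char_counter": {
          "type": "custom",
          "tokenizer": "char_tokenizer"
        }
      },
      "tokenizer": {
        "char_tokenizer": {
          "type": "ngram",
          "min_gram": 1,
          "max_gram": 1
        }
      }
    }
  }
}
```

You would then reference "analyzer": "char_counter" in the token_count sub-field instead of "standard".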
Then in the query:
"function_score": {
...,
"field_value_factor": {
"field": "name.length",
"modifier": "reciprocal"
}
}
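If you want to tune how strongly the length influences the score, field_value_factor also accepts a factor parameter (applied to the field value before the modifier), and boost_mode controls how the result combines with the query's _score. For example (a sketch; the factor value 2 is arbitrary, chosen here to match the "double its importance" goal):

```json
"function_score": {
  ...,
  "field_value_factor": {
    "field": "name.length",
    "factor": 2,
    "modifier": "reciprocal"
  },
  "boost_mode": "multiply"
}
```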
I have something that kind of works. With the following, I subtract the length of a field of my interest from the score.
{
  "query": {
    "function_score": {
      "boost_mode": "replace",
      "query": {...},
      "script_score": {
        "script": "_score - doc['<field_name>'].value.length()"
      }
    }
  }
}
However, I cannot control the relative weight of the number I am subtracting compared to the old score. That's why I am not accepting my own answer: I'll wait for better ones for a while. Ideally, I'd love to have a way to access the field-length norm function within script_score, or to get an equivalent result.
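One way to at least control the relative weight within this approach (a sketch, untested; the parameter name weight is made up) is to pass a script parameter, which the script can reference as a variable:

```json
{
  "query": {
    "function_score": {
      "boost_mode": "replace",
      "query": {...},
      "script_score": {
        "params": {
          "weight": 2
        },
        "script": "_score - weight * doc['<field_name>'].value.length()"
      }
    }
  }
}
```

Tuning weight then scales how much the field length counts against the original score.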