I have a set of words extracted from text by NLP algorithms, with an associated score for each word in every document.
For example:
document 1: { "vocab": [ {"wtag":"James Bond", "rscore": 2.14 },
{"wtag":"world", "rscore": 0.86 },
....,
{"wtag":"somemore", "rscore": 3.15 }
]
}
document 2: { "vocab": [ {"wtag":"hiii", "rscore": 1.34 },
{"wtag":"world", "rscore": 0.94 },
....,
{"wtag":"somemore", "rscore": 3.23 }
]
}
I want the rscores of the matched wtags in each document to affect the _score that ES gives that document (perhaps multiplied by or added to the _score), so that they influence the final _score, and in turn the order, of the resulting documents. Is there any way to achieve this?
You can achieve this simply by removing the boost_mode parameter: the default boost_mode is multiply, which multiplies the _score by whatever value the field_value_factor function returns.
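To make the arithmetic concrete, here is a small Python sketch (not Elasticsearch code) of what field_value_factor and boost_mode compute for a single hit. It assumes the hit's query _score is already known and that the field value is something like an rscore. The helper names are hypothetical; the modifier and boost_mode names follow the function_score documentation.

```python
import math

# Modifier names as documented for field_value_factor.
# Note: "log" is base 10, "ln" is the natural logarithm.
MODIFIERS = {
    "none": lambda x: x,
    "log": math.log10,
    "log1p": lambda x: math.log10(x + 1),
    "log2p": lambda x: math.log10(x + 2),
    "ln": math.log,
    "ln1p": lambda x: math.log(x + 1),
    "ln2p": lambda x: math.log(x + 2),
    "square": lambda x: x * x,
    "sqrt": math.sqrt,
    "reciprocal": lambda x: 1.0 / x,
}


def field_value_factor(value, factor=1.0, modifier="none"):
    """Hypothetical helper: modifier(factor * value) for one document."""
    return MODIFIERS[modifier](factor * value)


# How boost_mode combines the query _score (q) with the function result (f).
BOOST_MODES = {
    "multiply": lambda q, f: q * f,  # the default
    "replace": lambda q, f: f,
    "sum": lambda q, f: q + f,
    "avg": lambda q, f: (q + f) / 2,
    "max": max,
    "min": min,
}


def combine(query_score, function_value, boost_mode="multiply"):
    """Hypothetical helper: final _score of one hit."""
    return BOOST_MODES[boost_mode](query_score, function_value)


# A hit with _score 1.5 and rscore 2.14, using the default boost_mode:
print(combine(1.5, field_value_factor(2.14)))
```

Leaving boost_mode out therefore means multiply; setting it to sum would add the rscore instead, which is the other option the question mentions.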
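The default scoring algorithm used by Elasticsearch is BM25. Three main factors determine a document's score: term frequency (TF), since the more often a search term appears in the searched field of a document, the more relevant that document is; inverse document frequency (IDF), since terms that are rare across the index carry more weight than common ones; and field length, since a match in a short field counts for more than the same match in a long one.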
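For reference, here is a Python sketch of the textbook BM25 per-term formula, showing how those three factors interact. It is a simplification rather than Lucene's exact code: Lucene stores field lengths approximately, and recent versions drop the constant (k1 + 1) factor from the numerator, which does not change the ranking. k1 = 1.2 and b = 0.75 are Elasticsearch's default BM25 parameters; the function names and the tiny corpus are illustrative only.

```python
import math


def idf(doc_count, docs_with_term):
    # Rare terms get a larger weight; always positive in this form.
    return math.log(1 + (doc_count - docs_with_term + 0.5) / (docs_with_term + 0.5))


def term_score(freq, field_len, avg_field_len, doc_count, docs_with_term, k1=1.2, b=0.75):
    # Field-length normalization: longer-than-average fields are penalized.
    norm = 1 - b + b * field_len / avg_field_len
    # Term frequency saturates: each extra occurrence adds less.
    tf = freq * (k1 + 1) / (freq + k1 * norm)
    return idf(doc_count, docs_with_term) * tf


def bm25(query_terms, doc, corpus):
    """Score one tokenized document against a query over a tokenized corpus."""
    n_docs = len(corpus)
    avg_len = sum(len(d) for d in corpus) / n_docs
    score = 0.0
    for term in query_terms:
        freq = doc.count(term)
        if freq == 0:
            continue  # absent terms contribute nothing
        df = sum(1 for d in corpus if term in d)
        score += term_score(freq, len(doc), avg_len, n_docs, df)
    return score


corpus = [["james", "bond", "world"], ["hiii", "world"], ["world", "somemore"]]
for doc in corpus:
    print(doc, round(bm25(["bond", "world"], doc, corpus), 3))
```

Here "bond" outweighs "world" in the first document because it occurs in only one of the three documents, while "world" occurs in all of them.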
The basic mechanics are as follows: the Elasticsearch score is normalized to the range 0..1 (score / max(score)), then we add our own ranking score (also normalized to 0..1) and divide by 2.
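A minimal Python sketch of that blending step. It assumes the blending runs in application code over the hits a search has already returned, and that the ranking score is normalized the same way (divided by its maximum over those hits); the text above only says it is normalized to 0..1. The function name and inputs are hypothetical.

```python
def blend(es_scores, rank_scores):
    """Average of max-normalized ES scores and max-normalized ranking scores.

    Both arguments map document id -> raw score and must share the same keys.
    Returns (document id, blended score) pairs sorted best first.
    """
    es_max = max(es_scores.values())
    rank_max = max(rank_scores.values())
    blended = {}
    for doc_id, es in es_scores.items():
        es_norm = es / es_max if es_max > 0 else 0.0
        rank_norm = rank_scores[doc_id] / rank_max if rank_max > 0 else 0.0
        blended[doc_id] = (es_norm + rank_norm) / 2
    return sorted(blended.items(), key=lambda item: item[1], reverse=True)


# Doc "1" wins on the ES score, doc "2" on the ranking score.
print(blend({"1": 4.0, "2": 2.0}, {"1": 1.0, "2": 4.0}))
# [('2', 0.75), ('1', 0.625)]
```

Because each half is divided by its own maximum, both signals get equal weight, and the blended score always stays within 0..1.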
Elasticsearch uses relevance scoring to rank the documents in a dataset: it returns a list of hits sorted by relevance score. We can customize that score by adding and adjusting the factors that feed into it, which shifts the balance between precision and recall.
Another way of approaching this would be to use nested documents.
First, set up the mapping to make vocab a nested document, meaning that each wtag/rscore pair is indexed internally as a separate document:
curl -XPUT "http://localhost:9200/myindex/" -d'
{
  "settings": {"number_of_shards": 1},
  "mappings": {
    "mytype": {
      "properties": {
        "vocab": {
          "type": "nested",
          "properties": {
            "wtag": {
              "type": "string"
            },
            "rscore": {
              "type": "float"
            }
          }
        }
      }
    }
  }
}'
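To illustrate why nested matters here, this Python sketch (not Elasticsearch code) mimics how the two kinds of mapping index document 1. A plain object field is flattened into one array per sub-field, so the link between a wtag and its rscore is lost. A nested field keeps each pair together as its own hidden document. The function names are hypothetical.

```python
doc1 = {"vocab": [
    {"wtag": "James Bond", "rscore": 2.14},
    {"wtag": "world", "rscore": 0.86},
    {"wtag": "somemore", "rscore": 3.15},
]}


def index_as_object(doc):
    """Plain object mapping: arrays of inner objects are flattened per field."""
    return {
        "vocab.wtag": [entry["wtag"] for entry in doc["vocab"]],
        "vocab.rscore": [entry["rscore"] for entry in doc["vocab"]],
    }


def index_as_nested(doc):
    """Nested mapping: one hidden document per array element."""
    return [
        {"vocab.wtag": entry["wtag"], "vocab.rscore": entry["rscore"]}
        for entry in doc["vocab"]
    ]


print(index_as_object(doc1))
# With nested docs, the rscore of the entry that matched "world" is recoverable:
print([d["vocab.rscore"] for d in index_as_nested(doc1) if d["vocab.wtag"] == "world"])
# [0.86]
```

In the flattened form, nothing ties 0.86 to "world", which is why a query cannot pick up the rscore of the specific wtag it matched without the nested mapping.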
Then index your docs:
curl -XPUT "http://localhost:9200/myindex/mytype/1" -d'
{
  "vocab": [
    {
      "wtag": "James Bond",
      "rscore": 2.14
    },
    {
      "wtag": "world",
      "rscore": 0.86
    },
    {
      "wtag": "somemore",
      "rscore": 3.15
    }
  ]
}'
curl -XPUT "http://localhost:9200/myindex/mytype/2" -d'
{
  "vocab": [
    {
      "wtag": "hiii",
      "rscore": 1.34
    },
    {
      "wtag": "world",
      "rscore": 0.94
    },
    {
      "wtag": "somemore",
      "rscore": 3.23
    }
  ]
}'
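And run a nested query that matches the nested documents and, for each matching nested document, multiplies its match score by its rscore (the default boost_mode of function_score is multiply); score_mode sum then adds these values up into the parent document's score: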
curl -XGET "http://localhost:9200/myindex/mytype/_search" -d'
{
  "query": {
    "nested": {
      "path": "vocab",
      "score_mode": "sum",
      "query": {
        "function_score": {
          "query": {
            "match": {
              "vocab.wtag": "james bond world"
            }
          },
          "script_score": {
            "script": "doc[\"vocab.rscore\"].value"
          }
        }
      }
    }
  }
}'
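To see which document this query should rank first, here is a rough Python simulation of the scoring, using the entries shown in the question (the "...." entries are left out). The match score is a hypothetical stand-in (the number of query words found in the wtag) rather than real BM25, so only the shape of the calculation carries over: each matching nested document scores match_score × rscore, and score_mode sum adds those up for the parent.

```python
def toy_match_score(query, wtag):
    """Hypothetical stand-in for BM25: count of query words present in wtag."""
    wtag_words = set(wtag.lower().split())
    return sum(1 for word in query.lower().split() if word in wtag_words)


def nested_sum_score(query, vocab):
    """Sum over matching nested docs of match_score * rscore; None if none match."""
    total = 0.0
    matched = False
    for entry in vocab:
        match = toy_match_score(query, entry["wtag"])
        if match > 0:  # non-matching nested docs do not contribute
            matched = True
            total += match * entry["rscore"]  # boost_mode multiply
    return total if matched else None  # None: the parent is not a hit


docs = {
    "1": [{"wtag": "James Bond", "rscore": 2.14},
          {"wtag": "world", "rscore": 0.86},
          {"wtag": "somemore", "rscore": 3.15}],
    "2": [{"wtag": "hiii", "rscore": 1.34},
          {"wtag": "world", "rscore": 0.94},
          {"wtag": "somemore", "rscore": 3.23}],
}
query = "james bond world"
scores = {doc_id: nested_sum_score(query, vocab) for doc_id, vocab in docs.items()}
print(sorted(scores, key=scores.get, reverse=True))
# ['1', '2']
```

If you want each matching nested document to contribute exactly its rscore, ignoring the match score, adding "boost_mode": "replace" to the function_score should do that.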