Elasticsearch: Influence scoring with custom score field in document

I have a set of words extracted from text by NLP algorithms, with an associated score for each word in every document.

For example:

document 1: {  "vocab": [ {"wtag":"James Bond", "rscore": 2.14 }, 
                          {"wtag":"world", "rscore": 0.86 }, 
                          ...., 
                          {"wtag":"somemore", "rscore": 3.15 }
                        ] 
            }

document 2: {  "vocab": [ {"wtag":"hiii", "rscore": 1.34 }, 
                          {"wtag":"world", "rscore": 0.94 },
                          ...., 
                          {"wtag":"somemore", "rscore": 3.23 } 
                        ] 
            }

I want the rscore of each matched wtag in a document to affect the _score Elasticsearch gives that document, for example by multiplying it with or adding it to the _score, so that it influences the final _score (and therefore the order) of the results. Is there any way to achieve this?
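
One way to pin down the two options (the symbols below are introduced for illustration and do not come from the question): for a query $q$ and a document $d$, let $s(d,q)$ be the relevance score Elasticsearch computes, $M(d,q)$ the set of vocab entries whose wtag matches $q$, and $r_v$ the rscore of entry $v$. Multiplying and adding then read:

$$
\mathrm{score}_{\times}(d) = s(d,q) \cdot \sum_{v \in M(d,q)} r_v,
\qquad
\mathrm{score}_{+}(d) = s(d,q) + \sum_{v \in M(d,q)} r_v
$$

For instance, if a query matched only the wtag "world", the rscore term would be 0.86 for document 1 and 0.94 for document 2, so with equal $s(d,q)$ document 2 would rank higher under either combination.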

asked Jan 29 '14 18:01 by Haywire




1 Answer

One way to approach this is to use nested documents:

First, set up the mapping so that vocab is a nested type. Each wtag/rscore object is then indexed internally as a separate (hidden) document, so a query can match a wtag and use the rscore stored alongside it:

curl -XPUT "http://localhost:9200/myindex/" -d'
{
  "settings": {"number_of_shards": 1}, 
  "mappings": {
    "mytype": {
      "properties": {
        "vocab": {
          "type": "nested",
          "properties": {
            "wtag": {
              "type": "string"
            },
            "rscore": {
              "type": "float"
            }
          }
        }
      }
    }
  }
}'

Then index your docs:

curl -XPUT "http://localhost:9200/myindex/mytype/1" -d'
{
  "vocab": [
    {
      "wtag": "James Bond",
      "rscore": 2.14
    },
    {
      "wtag": "world",
      "rscore": 0.86
    },
    {
      "wtag": "somemore",
      "rscore": 3.15
    }
  ]
}'

curl -XPUT "http://localhost:9200/myindex/mytype/2" -d'
{
  "vocab": [
    {
      "wtag": "hiii",
      "rscore": 1.34
    },
    {
      "wtag": "world",
      "rscore": 0.94
    },
    {
      "wtag": "somemore",
      "rscore": 3.23
    }
  ]
}'

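Finally, run a nested query. For each nested vocab entry whose wtag matches, function_score multiplies the text-match score by that entry's rscore (multiply is function_score's default boost_mode), and "score_mode": "sum" adds these per-entry values up to give the parent document's _score: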

curl -XGET "http://localhost:9200/myindex/mytype/_search" -d'
{
  "query": {
    "nested": {
      "path": "vocab",
      "score_mode": "sum",
      "query": {
        "function_score": {
          "query": {
            "match": {
              "vocab.wtag": "james bond world"
            }
          },
          "script_score": {
            "script": "doc[\"vocab.rscore\"].value"
          }
        }
      }
    }
  }
}'
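
The question also asks about adding rscore rather than multiplying by it. function_score's boost_mode parameter controls how the script result is combined with the match score, and "sum" adds them. The following is a minimal sketch of that variant, assuming the same myindex/mytype names and the same Elasticsearch-1.x-era request style used in this answer; it refers to the field by its full nested path, vocab.rscore. ES_URL is a hypothetical environment variable introduced only for this sketch: when it is unset, the script just prints the request body instead of sending it.

```shell
#!/bin/sh
# Same nested query as above, but with "boost_mode": "sum", so each matching
# nested entry contributes (text-match score + rscore) instead of their product.
# ES_URL is a hypothetical variable (e.g. http://localhost:9200) used only here.

BODY=$(cat <<'EOF'
{
  "query": {
    "nested": {
      "path": "vocab",
      "score_mode": "sum",
      "query": {
        "function_score": {
          "query": {
            "match": {
              "vocab.wtag": "james bond world"
            }
          },
          "script_score": {
            "script": "doc['vocab.rscore'].value"
          },
          "boost_mode": "sum"
        }
      }
    }
  }
}
EOF
)

if [ -n "${ES_URL:-}" ]; then
  # Send the search request to the cluster the caller pointed us at.
  curl -s -XGET "$ES_URL/myindex/mytype/_search" -d "$BODY"
else
  # No cluster configured: print the body so it can be inspected or reused.
  printf '%s\n' "$BODY"
fi
```

Setting "boost_mode" to "replace" instead would discard the match score entirely, so each document's _score would simply be the sum of the rscore values of its matching entries (about 3.0 for document 1 and 0.94 for document 2 with the query above).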

answered Sep 29 '22 04:09 by DrTech