Elasticsearch scoring on multiple indexes

Question

I have one index per quarter of a year ("index-2015.1", "index-2015.2", ...).

Each index holds around 30 million documents.

Each document has a text field ('title').

My results are sorted by (1) _score, then (2) created date.

The problem is:

When I search for some text in the 'title' field across all indexes ("index-201*"), the first results always come from a single index.

Let's say I search for 'title=home', and I have 10k documents in "index-2015.1" with title=home and 10k documents in "index-2015.2" with title=home. Then the first results are all documents from "index-2015.1" (none from "index-2015.2", and not mixed), even though "index-2015.2" contains documents with a later created date than those in "index-2015.1".

Is there a reason for this?

Slomo · Accepted Answer

The reason is probably that scores are specific to the index. So if you really have multiple indices, the score of a matching document will be calculated (slightly) differently for each index.

Simply put, among other things, the score of a matching document depends on the query terms and how often they occur in the index. The score is calculated relative to the index (actually, by default, relative to each separate shard). Elasticsearch does apply some normalizations, but I don't know the details of those.

I'm not really able to explain it well, but here's the article about scoring. You will want to read at least the part about TF/IDF, which I think explains why you get different scores.

https://www.elastic.co/guide/en/elasticsearch/guide/current/scoring-theory.html
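To make the effect concrete, here is a minimal Python sketch of why identical matches can score differently per index. It assumes the classic Lucene TF/IDF inverse document frequency, idf = 1 + ln(numDocs / (docFreq + 1)), and uses only that term as a stand-in for the full score. The document counts, dates and variable names are made up for illustration; real scores also involve term frequency and field-length norms, and are computed per shard.

```python
import math

def classic_idf(num_docs: int, doc_freq: int) -> float:
    """Inverse document frequency as used by Lucene's classic TF/IDF similarity."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))

# Hypothetical statistics for the term "home" in the 'title' field.
# Both indexes have 10k matches, but slightly different total sizes.
index_stats = {
    "index-2015.1": {"num_docs": 31_000_000, "doc_freq": 10_000},
    "index-2015.2": {"num_docs": 29_000_000, "doc_freq": 10_000},
}

idf_by_index = {
    name: classic_idf(s["num_docs"], s["doc_freq"])
    for name, s in index_stats.items()
}

# Hypothetical hits as (index, created date); ISO dates sort correctly as strings.
hits = [
    ("index-2015.1", "2015-02-10"),
    ("index-2015.2", "2015-05-20"),
    ("index-2015.1", "2015-03-01"),
    ("index-2015.2", "2015-06-15"),
]

# Sort by score (here: the per-index IDF) descending, then created date descending.
ranked = sorted(hits, key=lambda h: (idf_by_index[h[0]], h[1]), reverse=True)

for index_name, created in ranked:
    print(index_name, created, round(idf_by_index[index_name], 3))
```

Because every hit from one index shares the same slightly higher score, the created date only breaks ties within that index, so all of its documents come first.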

EDIT:

So, after testing it a bit on my machine, it seems possible to use a different search_type to get scores that suit your case.

POST /index1,index2/_search?search_type=dfs_query_then_fetch
{
    "query" : {
       "match": {
          "title": "home"
       }
    }
}

The important part is search_type=dfs_query_then_fetch. If you are programming in Java or something similar, there should be a way to specify it in the request. For details about the search types, refer to the documentation.
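As an illustration of setting the search type from code, here is a small Python sketch that builds the same request with the standard library only. The helper name build_search_request and the host http://localhost:9200 are assumptions for the example, not part of any Elasticsearch client; an official client library would normally expose its own way to pass search_type.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

def build_search_request(base_url, indices, field, text,
                         search_type="dfs_query_then_fetch"):
    """Build (but do not send) a multi-index _search request.

    Hypothetical helper: the search type goes into the query string,
    the match query goes into the JSON body.
    """
    path = ",".join(indices)
    query_string = urlencode({"search_type": search_type})
    body = {"query": {"match": {field: text}}}
    return Request(
        url=f"{base_url}/{path}/_search?{query_string}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Assumed local node address; adjust for your cluster.
request = build_search_request(
    "http://localhost:9200", ["index1", "index2"], "title", "home"
)
print(request.get_method(), request.full_url)
```

Sending it would be a call to urllib.request.urlopen(request) against a running cluster, which is left out here so the sketch runs without one.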

Basically, this search type first collects the term frequencies from all affected shards (and indexes), so the score should be computed against the same statistics everywhere.
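A rough Python sketch of that idea, under the same simplifying assumption that the score is just the classic Lucene IDF, idf = 1 + ln(numDocs / (docFreq + 1)). The statistics below are invented, and the real distributed-frequency phase works on shard-level term statistics rather than this simplified dictionary.

```python
import math

def classic_idf(num_docs: int, doc_freq: int) -> float:
    """Inverse document frequency as used by Lucene's classic TF/IDF similarity."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))

# Hypothetical statistics for the term "home", one entry per index.
stats = [
    {"index": "index-2015.1", "num_docs": 31_000_000, "doc_freq": 10_000},
    {"index": "index-2015.2", "num_docs": 29_000_000, "doc_freq": 10_000},
]

# Default behaviour: each index scores against its own statistics.
local_idf = {s["index"]: classic_idf(s["num_docs"], s["doc_freq"]) for s in stats}

# dfs-style behaviour: sum the statistics first, then score everything with them.
global_idf = classic_idf(
    sum(s["num_docs"] for s in stats),
    sum(s["doc_freq"] for s in stats),
)

print("local :", {k: round(v, 3) for k, v in local_idf.items()})
print("global:", round(global_idf, 3))
```

With a single shared value, equally good matches from both indexes tie on score, and the secondary sort on created date can then interleave them.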

Eyal Ch · Answer

Following the suggestions from Andrei Stefan and Slomo, index boosting solved my problem:

    body = {
        "indices_boost": {
            "index-2015.4": 1.4,
            "index-2015.3": 1.3,
            "index-2015.2": 1.2,
            "index-2015.1": 1.1
        }
    }
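For intuition, here is a small Python sketch of what such boosts do to the ordering, assuming the boost is applied as a multiplier on each hit's score. The hits, raw scores and dates are invented for the example.

```python
# Boost factors from the request body above: newer quarters get a higher boost.
indices_boost = {
    "index-2015.4": 1.4,
    "index-2015.3": 1.3,
    "index-2015.2": 1.2,
    "index-2015.1": 1.1,
}

# Hypothetical hits with their unboosted scores.
hits = [
    {"index": "index-2015.1", "score": 2.0, "created": "2015-03-01"},
    {"index": "index-2015.2", "score": 1.9, "created": "2015-06-15"},
    {"index": "index-2015.4", "score": 1.5, "created": "2015-12-01"},
]

def boosted_score(hit):
    """Raw score times the boost of the hit's index (1.0 if no boost is set)."""
    return hit["score"] * indices_boost.get(hit["index"], 1.0)

# Same ordering as in the question: score descending, then created date descending.
ranked = sorted(hits, key=lambda h: (boosted_score(h), h["created"]), reverse=True)

for hit in ranked:
    print(hit["index"], round(boosted_score(hit), 2), hit["created"])
```

The boost tilts results toward newer indexes but does not override a clearly better match from an older one, so how large the factors should be depends on how much recency matters to you.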

EDIT:

Using search_type=dfs_query_then_fetch (as Slomo described) solves the problem in a better way (depending on your business model...).

Tags:

elasticsearch
