
Paging in Elasticsearch when results have equal scores

Is it possible to implement reliable paging of elasticsearch search results if multiple documents have equal scores?

I'm experimenting with custom scoring in elasticsearch. Many of the scoring expressions I try yield result sets where many documents have equal scores. They seem to come in the same order each time I try, but can it be guaranteed?

AFAIU it can't, especially if there is more than one shard in the cluster. Documents with equal scores for a given Elasticsearch query are returned in a random, non-deterministic order that can change between invocations of the same query, even if the underlying data does not change. Paging is therefore unreliable unless one of the following holds:

  1. I use function_score to guarantee that the score is unique for each document (e.g. by using a unique number field).
  2. I use sort and guarantee that the sorting defines a total order (e.g. by using a unique field as fallback if everything else is equal).
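A minimal sketch of option 2, written as a Python dict holding the search request body. It sorts by _score first and then by a unique field to break ties. The field name product_id and the example query are hypothetical; substitute any field that is unique per document.

```python
def search_body(query, tiebreaker_field="product_id"):
    """Build a search request body whose sort defines a total order.

    Sorting by _score alone leaves equally scored hits in an unspecified
    order; the secondary sort on a unique field (hypothetical "product_id")
    breaks those ties.
    """
    return {
        "query": query,
        "sort": [
            {"_score": {"order": "desc"}},
            {tiebreaker_field: {"order": "asc"}},
        ],
    }


# Hypothetical query: all products in one category.
body = search_body({"match": {"category": "shoes"}})
print(body["sort"])
# [{'_score': {'order': 'desc'}}, {'product_id': {'order': 'asc'}}]
```

The tiebreaker only yields a total order if the field really is unique and present on every document.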

Can anyone confirm (and maybe point at some reference)?

Does this change if I know that there is only one primary shard and no replicas (see another, similar question: Inconsistent ordering of results across primary /replica for documents with equivalent score)? For example, if I guarantee that there is exactly one shard AND the data does not change between two invocations of the same query, will that query return results in the same order?

What are other alternatives (if any)?

Antoni Myłka asked Nov 24 '14 12:11


People also ask

How does pagination work in Elasticsearch?

The simplest method of pagination uses the from and size parameters available in Elasticsearch's search API. By default, from is 0 and size is 10, meaning if you don't specify otherwise, Elasticsearch will return only the first ten results from your index.
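A minimal sketch of from/size paging with zero-based page numbers. The helper name page_body is hypothetical. Note that successive pages only fit together reliably if the query's sort defines a total order, which is exactly the issue raised in the question.

```python
def page_body(query, page, page_size=10):
    """Request body for a zero-based page using from/size."""
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size > 0")
    return {"query": query, "from": page * page_size, "size": page_size}


print(page_body({"match_all": {}}, page=2))
# {'query': {'match_all': {}}, 'from': 20, 'size': 10}
```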

How do I get more than 10000 results Elasticsearch?

By default, you cannot use from and size to page through more than 10,000 hits. This limit is a safeguard set by the index.max_result_window index setting. If you need to page through more than 10,000 hits, use the search_after parameter instead.
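The search_after idea can be sketched with an in-memory simulation rather than a real cluster (DOCS, fake_search and scroll_all are hypothetical stand-ins, not the Elasticsearch client API). Each page asks for hits that sort strictly after the sort values of the previous page's last hit. This only works because the sort (score descending, then id ascending) is a total order; with score alone, "strictly after" is ambiguous among ties.

```python
from typing import Iterator

# Hypothetical in-memory "index": many documents share the same score.
DOCS = [
    {"id": i, "score": s}
    for i, s in enumerate([1.0, 2.0, 2.0, 2.0, 1.0, 3.0, 2.0, 1.0])
]


def sort_key(doc):
    # Mirrors a sort of [{"_score": "desc"}, {"id": "asc"}].
    return (-doc["score"], doc["id"])


def fake_search(size, search_after=None):
    """Simulate one search_after request against DOCS.

    search_after holds the sort values [score, id] of the last hit of the
    previous page; only hits that sort strictly after it are returned.
    """
    hits = sorted(DOCS, key=sort_key)
    if search_after is not None:
        last = (-search_after[0], search_after[1])
        hits = [d for d in hits if sort_key(d) > last]
    return [{"_id": d["id"], "sort": [d["score"], d["id"]]} for d in hits[:size]]


def scroll_all(size=3) -> Iterator[int]:
    """Yield every document id, page by page."""
    search_after = None
    while True:
        hits = fake_search(size, search_after)
        if not hits:
            return
        for hit in hits:
            yield hit["_id"]
        search_after = hits[-1]["sort"]


print(list(scroll_all()))
# [5, 1, 2, 3, 6, 0, 4, 7]
```

Whatever the page size, every document is visited exactly once and in the same order.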

What is Elasticsearch relevance score?

Elasticsearch uses search relevance to score documents of a dataset. It returns an ordered list of data sorted by a relevance score. We can customize the score by adding and modifying variables that will shift the scale between precision and recall.


1 Answer

I ended up using an additional sort in cases where equal scores are likely, for example when searching by product category. The additional sort field could be an id, a creation date or similar. The setup is 2 servers, 3 shards and 1 replica.
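To illustrate why the extra sort helps with several shards, here is a toy model (not Elasticsearch's actual merge logic): hits from two hypothetical shards all have the same score, and they may arrive in either order. Ordering by score alone keeps whatever arrival order there was, while adding a unique secondary key (here a made-up doc id) gives the same result either way.

```python
# Hypothetical per-shard hits as (score, doc_id); every score is tied.
SHARD_A = [(1.5, "a1"), (1.5, "a2")]
SHARD_B = [(1.5, "b1"), (1.5, "b2")]


def merge_by_score_only(*shards):
    # Python's sort is stable, so tied hits keep their arrival order.
    hits = [h for shard in shards for h in shard]
    return [doc_id for _, doc_id in sorted(hits, key=lambda h: -h[0])]


def merge_with_tiebreaker(*shards):
    # A secondary key on the unique doc_id makes the order total.
    hits = [h for shard in shards for h in shard]
    return [doc_id for _, doc_id in sorted(hits, key=lambda h: (-h[0], h[1]))]


print(merge_by_score_only(SHARD_A, SHARD_B))    # ['a1', 'a2', 'b1', 'b2']
print(merge_by_score_only(SHARD_B, SHARD_A))    # ['b1', 'b2', 'a1', 'a2']
print(merge_with_tiebreaker(SHARD_B, SHARD_A))  # ['a1', 'a2', 'b1', 'b2']
```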

Petteri H answered Oct 04 '22 18:10