What is the best way to do pagination using Elasticsearch? I am currently working on an API that uses Elasticsearch in the backend with Python. My index does not have much data, so by default we do the pagination in the frontend using JavaScript (and so far we have not had any problems).
I want to know, for bigger indexes, what is the best way to handle pagination: `search_after`?
The default way of paginating over search results in Elasticsearch is using the `from`/`size` parameters. This will, however, only work for the top 10k search results. If you need to go beyond that, the way to go is `search_after`.
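To make the difference concrete, here is a minimal sketch of the two request bodies (the index name `my-index`, the `timestamp` field, and the tiebreaker `id` field are hypothetical, and the sort values are illustrative):

```python
# from/size: page 3 with 20 hits per page (pages counted from zero here).
page, size = 2, 20
from_size_body = {
    "from": page * size,  # offset into the result set; capped by the 10k window
    "size": size,
    "query": {"match_all": {}},
    # A unique tiebreaker field keeps the sort order deterministic.
    "sort": [{"timestamp": "desc"}, {"id": "asc"}],
}

# search_after: instead of an offset, pass the sort values of the last hit
# of the previous page; this works past the 10k window.
last_hit_sort_values = ["2023-01-01T00:00:00", "doc-41"]  # from the previous response
search_after_body = {
    "size": size,
    "query": {"match_all": {}},
    "sort": [{"timestamp": "desc"}, {"id": "asc"}],
    "search_after": last_hit_sort_values,
}
```

Both bodies would be sent to the `_search` endpoint; with `search_after` you repeat the request, each time feeding in the sort values of the last hit you received.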
If you need to dump the entire index and it contains more than 10k documents, use the scroll API.
All of these queries let you retrieve portions of the search results, but they have major differences.
`from`/`size` is the cheapest and fastest; it is what Google would use to go to the second, third, etc. search results pages if it used Elasticsearch.
The scroll API is expensive, because it creates a kind of snapshot of the index the moment you issue the first query, to make sure that by the end of the scroll you have exactly the data that was present in the index at the start. A scroll request costs resources, and running many of them in parallel can kill your performance, so proceed with caution.
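The snapshot behavior can be illustrated with a toy model that needs no cluster: opening the "scroll" pins a point-in-time copy, so a write that arrives mid-scroll is never seen by it (this only mimics the semantics, it is not the Elasticsearch API):

```python
# Toy model of scroll semantics: the first request pins a snapshot of the
# index, so later writes are invisible to the ongoing scroll.
def open_scroll(index, size):
    snapshot = list(index)  # point-in-time copy, like a scroll context
    pos = 0

    def next_batch():
        nonlocal pos
        batch = snapshot[pos:pos + size]
        pos += size
        return batch

    return next_batch

docs = [{"id": i} for i in range(5)]
scroll = open_scroll(docs, size=2)
docs.append({"id": 99})          # write arriving after the scroll started
batches = []
while (batch := scroll()):
    batches.append(batch)
# The scroll returns exactly the 5 original docs; id 99 never appears.
```

In the real API you would issue the first search with a `scroll=2m` keep-alive and then keep calling the scroll endpoint with the returned `scroll_id`; keeping that context alive is what costs the cluster resources.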
`search_after` instead is a half-way point between the two: it is not a solution to jump freely to a random page but rather to scroll many queries in parallel. It is very similar to the scroll API, but unlike it, the `search_after` parameter is stateless: it is always resolved against the latest version of the searcher. For this reason the sort order may change during a walk, depending on the updates and deletes of your index.
So it allows you to paginate past 10k, at the cost of some possible inconsistency.
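That inconsistency is easy to see in a toy model of `search_after`: the only cursor is the sort value of the last hit, and each page is re-resolved against the current data, so a document indexed mid-walk can show up on a later page (again, this mimics the semantics rather than calling the real API):

```python
# Toy model of search_after: a stateless cursor that is re-resolved
# against the live data on every page.
def search_page(data, size, after=None):
    hits = sorted(data, key=lambda d: d["id"])
    if after is not None:
        hits = [h for h in hits if h["id"] > after]  # resume past the cursor
    return hits[:size]

data = [{"id": i} for i in (10, 20, 30, 40, 50, 60)]
page1 = search_page(data, size=3)        # ids 10, 20, 30
cursor = page1[-1]["id"]                 # sort value of the last hit

data.append({"id": 35})                  # document indexed mid-walk
page2 = search_page(data, size=3, after=cursor)
# page2 now contains the new id 35: the walk saw data that did not exist
# when it started, which a scroll's snapshot would have hidden.
```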
`index.max_result_window` is set to 10k as a hard limit to avoid out-of-memory situations:

`index.max_result_window`
The maximum value of `from` + `size` for searches to this index. Defaults to 10000. Search requests take heap memory and time proportional to `from` + `size`, and this setting limits that memory.
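The limit is a dynamic index setting, so it can be raised if you truly need deeper `from`/`size` pages; a sketch of the settings body (hypothetical index name, and note that raising it trades heap memory for depth, so `search_after` is usually the better answer):

```python
# Body for PUT /my-index/_settings to raise the window (hypothetical index).
# Every deep page still costs heap proportional to from + size, so prefer
# search_after over cranking this up.
settings_body = {"index": {"max_result_window": 50000}}
```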
Sliced scroll is just a faster way of doing a normal scroll: it allows you to download the collection of documents in parallel. A slice is just a subset of the documents in the scroll query output.
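A sliced scroll is expressed by adding a `slice` object to each of the parallel scroll requests; a minimal sketch of the request bodies (two parallel workers, hypothetical setup):

```python
# One scroll request body per worker: each takes one slice of the index,
# identified by its id out of max total slices.
n_slices = 2
slice_bodies = [
    {"slice": {"id": i, "max": n_slices}, "query": {"match_all": {}}}
    for i in range(n_slices)
]
```

Each worker runs its own independent scroll with its body, and together they cover the whole result set exactly once.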