I'm new to ES and confused by its documentation of scroll. From the docs "Scrolling is not intended for real time user requests, but rather for processing large amounts of data, e.g. in order to reindex the contents of of one index into a new index with a different configuration".
And yet...further down on the same page it says not to use from() and size() to do pagination because it "is very inefficient". And on the Java API page describing Search it shows an example of paging via Scroll.
So, assuming I want to present sorted search results, a page at a time, which approach is recommended: from/size or Scrolling?
from/size
is very inefficient when you want to do deep pagination or if you want to request lots of results by page.
The reason is that results are sorted first on each shard, and all those results are then gathered, merged and sorted by the request coordinator node. This become more and more costly as the pages grow either in size or in rank. You will find a very good example documented here.
You could limit the size of your users' queries (e.g. to something like ~1000 results), and you will be fine using from/size
.
If it's not an option, you can still use scroll, but you will lose some features like aggregations and keeping the search context alive has a cost.
You can use search_after
. The basic process flow will be like this:
search_after
field in the body to tell Elasticsearch to only return documents after the specified document (date).This way, your results remain robust against any updates or document deletions and stay accurate. You also avoid the scrolling costs (as you've likely already read) and the from/size method's linear time operation cost for each query starting from your initial document result.
See the docs for more info and implementation details.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With