
Which is better, scroll or search_after, for simulating random pagination in Elasticsearch?

I want to jump to an arbitrary page of results in Elasticsearch. There are three ways to paginate in Elasticsearch:

  • from/size - I can't use this because of the maximum depth limit of 10000.
  • scroll API - I can use this, but it costs memory because the search context has to be kept alive.
  • search_after - I can also use this, and it is less expensive than scroll because it is stateless.

I know that either way, Elasticsearch will read the data sequentially. For example, to get the 99th page, Elasticsearch has to read through the 98 pages before it.

One thing I can do is reduce the data returned for the pages that come before the target: for the first 98 pages I would request minimal data, and for the 99th page I would request the complete documents.
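A minimal sketch of that idea with search_after, under some assumptions: the sort fields `created_at` and `doc_id` are hypothetical (search_after needs a deterministic sort with a unique tiebreaker), and `search` is a hypothetical callable that sends a request body to the search endpoint and returns the parsed JSON response. Skipped pages are requested with `"_source": false` so only the sort values come back; only the target page asks for the full documents.

```python
from typing import Any, Callable, Dict, List, Optional

# Hypothetical sort fields: a primary sort key plus a unique tiebreaker.
SORT = [{"created_at": "asc"}, {"doc_id": "asc"}]


def build_body(page_size: int,
               search_after: Optional[List[Any]] = None,
               full: bool = False) -> Dict[str, Any]:
    """Build a search request body for one page.

    full=False sets "_source": false, so hits carry sort values but no documents.
    """
    body: Dict[str, Any] = {"size": page_size, "sort": SORT, "_source": full}
    if search_after is not None:
        body["search_after"] = search_after
    return body


def fetch_page(search: Callable[[Dict[str, Any]], Dict[str, Any]],
               target_page: int,
               page_size: int) -> List[Dict[str, Any]]:
    """Walk forward to target_page (1-based) and return its hits.

    Returns an empty list if the results run out before the target page.
    """
    cursor: Optional[List[Any]] = None
    for page in range(1, target_page + 1):
        is_target = page == target_page
        resp = search(build_body(page_size, cursor, full=is_target))
        hits = resp["hits"]["hits"]
        if is_target:
            return hits
        if not hits:
            return []
        # The sort values of the last hit become the cursor for the next page.
        cursor = hits[-1]["sort"]
    return []
```

This still issues one request per skipped page; it only shrinks what each of those requests has to return.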

My main question is: if I don't have memory concerns, which approach would be faster for sequentially reading the first 98 pages, search_after or scroll?

If I use scroll, I will clear the scroll context after every use.

TechnocratSid asked Jun 22 '18 07:06



1 Answer

If you don't have memory concerns, then the simplest option is to increase the index setting index.max_result_window from 10000 to the number you require.

See https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules.html#dynamic-index-settings
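As a rough sketch of what that involves: `index.max_result_window` is a dynamic setting, so it can be changed through the index settings endpoint, and `from + size` of any request must stay within it. The helper names and the index name `my-index` below are hypothetical; the functions only build the request pieces, and sending them is left to whatever HTTP or Elasticsearch client is in use.

```python
from typing import Any, Dict, Tuple


def required_window(page: int, page_size: int) -> int:
    """Smallest max_result_window that allows fetching `page` (1-based)."""
    return page * page_size


def settings_request(index: str, window: int) -> Tuple[str, str, Dict[str, Any]]:
    """HTTP method, path, and JSON body for raising index.max_result_window."""
    return "PUT", f"/{index}/_settings", {"index": {"max_result_window": window}}


def page_request(page: int, page_size: int) -> Dict[str, int]:
    """from/size search body fragment for a 1-based page number."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return {"from": (page - 1) * page_size, "size": page_size}


# Example: page 101 with 100 results per page needs from + size = 10100.
method, path, body = settings_request("my-index", required_window(101, 100))
```

With the window raised, the target page can be requested directly with `page_request(101, 100)`, without walking the earlier pages.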

Adam T answered Oct 15 '22 00:10