
What does Elasticsearch automatic slicing do?

What does Elasticsearch automatic slicing do? I find the documentation very laconic about this feature. I tried searching for other explanations of it, but to no avail. Nor have I managed to find out what a slice is in Elasticsearch.

Jindřich Mynarz asked Apr 04 '17 15:04




1 Answer

Automatic slicing is a way to parallelize the work done by a few endpoints, namely reindex, update by query, and delete by query.
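For reference, automatic slicing is requested through the slices URL parameter on those endpoints (slices=auto, or an explicit integer). The sketch below only builds request paths and sends nothing; the helper name `sliced_request_path` and the index name `my-index` are hypothetical.

```python
from typing import Optional, Union
from urllib.parse import urlencode

# The three endpoints named above; all of them are called with POST.
SLICEABLE_ENDPOINTS = ("_reindex", "_update_by_query", "_delete_by_query")


def sliced_request_path(endpoint: str, index: Optional[str] = None,
                        slices: Union[str, int] = "auto") -> str:
    """Return the request path carrying the slices URL parameter (nothing is sent)."""
    if endpoint not in SLICEABLE_ENDPOINTS:
        raise ValueError(f"unsupported endpoint: {endpoint}")
    query = urlencode({"slices": slices})
    if endpoint == "_reindex":
        # _reindex takes its source and dest indices in the request body, not the path.
        return f"/_reindex?{query}"
    if index is None:
        raise ValueError(f"{endpoint} is called on an index")
    return f"/{index}/{endpoint}?{query}"


print(sliced_request_path("_update_by_query", "my-index"))
# /my-index/_update_by_query?slices=auto
```

Passing an integer instead of "auto" (e.g. slices=5) is how you would pin the slice count yourself.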

All three APIs work the same way: they run a scroll query over the index they read from. For large result sets, scroll queries are more efficient than normal paged queries, and they can be made faster still by slicing them.

Put simply, if a query is expected to return a large number of hits, you could run a normal query and page through the results with from/size, but deep paging makes that inefficient. To get around this, ES lets you use scroll queries, which return results in batches of N hits. Scroll queries can in turn be sliced, i.e. the scroll is split into multiple slices that your client application can consume independently.

Say you have a query that returns 1,000,000 hits and you want to scroll over that result set in batches of 50,000 hits. With a normal (unsliced) scroll, your client application has to make 1,000,000 / 50,000 = 20 synchronous calls, one after another (the initial search plus 19 scroll calls), to retrieve every batch of 50K hits.
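The arithmetic can be checked with a toy, in-memory stand-in for a scroll cursor. No Elasticsearch client is involved; `fake_scroll` and `count_sequential_calls` are hypothetical helpers that count one round trip per page, starting with the initial search, and ignore any final call that returns an empty page.

```python
from typing import Iterator


def fake_scroll(total_hits: int, page_size: int) -> Iterator[range]:
    """Hypothetical stand-in for a scroll cursor: yields one page per call."""
    for start in range(0, total_hits, page_size):
        yield range(start, min(start + page_size, total_hits))


def count_sequential_calls(total_hits: int, page_size: int) -> int:
    """Count the round trips a single thread makes, one after another."""
    calls = 0
    consumed = 0
    for page in fake_scroll(total_hits, page_size):
        calls += 1              # the initial search, then one scroll call per page
        consumed += len(page)
    assert consumed == total_hits  # every hit is seen exactly once
    return calls


print(count_sequential_calls(1_000_000, 50_000))  # 20
```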

Slicing lets you parallelize those calls. If your client application is multi-threaded, you can split the scroll into, say, 5 slices. Each slice is an independent scroll over roughly a fifth of the hits (~200K), so with a page size of 10K per slice, 5 threads each consume ~10K hits per call instead of a single thread consuming 50K hits per call. This lets you use the full computing power of your client application to consume those hits.
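Here is a small simulation of that idea, assuming nothing about the real client: the documents are split into disjoint slices and each slice is paged through by its own thread. The slice assignment (a CRC32 hash of the id modulo the slice count) is a toy stand-in; Elasticsearch uses its own internal scheme, and in a real cluster each slice only reads its own part of the data rather than filtering the full list as done here. All function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List
import zlib


def slice_of(doc_id: str, max_slices: int) -> int:
    # Toy assignment (hypothetical): hash the id, then take it modulo the slice count.
    return zlib.crc32(doc_id.encode("utf-8")) % max_slices


def consume_slice(slice_id: int, max_slices: int, doc_ids: List[str],
                  page_size: int) -> List[str]:
    """Simulate one worker scrolling through its own slice, page by page."""
    mine = [d for d in doc_ids if slice_of(d, max_slices) == slice_id]
    seen: List[str] = []
    for start in range(0, len(mine), page_size):
        page = mine[start:start + page_size]  # stands in for one scroll call
        seen.extend(page)
    return seen


def parallel_sliced_scroll(doc_ids: List[str], max_slices: int,
                           page_size: int) -> List[List[str]]:
    """Run one worker thread per slice and collect what each one consumed."""
    with ThreadPoolExecutor(max_workers=max_slices) as pool:
        futures = [
            pool.submit(consume_slice, i, max_slices, doc_ids, page_size)
            for i in range(max_slices)
        ]
        return [f.result() for f in futures]


if __name__ == "__main__":
    ids = [f"doc-{n}" for n in range(10_000)]
    per_slice = parallel_sliced_scroll(ids, max_slices=5, page_size=1_000)
    print([len(s) for s in per_slice])  # roughly 2,000 ids per slice
```

The key property is that the slices are disjoint and together cover every hit, so no coordination between threads is needed.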

Ideally, the number of slices equals the number of shards in the source index; setting more slices than shards generally adds overhead without improving performance. That is one reason to prefer automatic slicing (slices=auto) over manual slicing: ES picks the number for you, using one slice per shard up to a certain limit.
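For contrast, manual slicing means sending one scroll request per slice, each carrying a slice object (id, max) in its body, and choosing max yourself. The sketch below only builds those search bodies, assuming the shard count is already known (e.g. from the index settings); `manual_slice_bodies` is a hypothetical helper and nothing is sent to a cluster. As far as I know, Elasticsearch rejects a sliced scroll with max below 2, so the single-shard case falls back to a plain scroll.

```python
from typing import Any, Dict, List


def manual_slice_bodies(num_shards: int, query: Dict[str, Any],
                        page_size: int) -> List[Dict[str, Any]]:
    """Build one sliced-scroll search body per slice (hypothetical helper).

    Each body would be sent as its own search request with a scroll keep-alive,
    e.g. ?scroll=1m, and then scrolled independently.
    """
    if num_shards < 2:
        # A sliced scroll needs max >= 2; with a single shard, just scroll normally.
        return [{"query": query, "size": page_size}]
    max_slices = num_shards  # guideline: as many slices as shards
    return [
        {"slice": {"id": i, "max": max_slices}, "query": query, "size": page_size}
        for i in range(max_slices)
    ]


bodies = manual_slice_bodies(num_shards=3, query={"match_all": {}}, page_size=1_000)
print(bodies[0])
```

With slices=auto on reindex, update by query, or delete by query, ES performs an equivalent split internally, so you never build these bodies yourself.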

Val answered Sep 23 '22 06:09
