Elasticsearch Scroll

Tags:

scroll

pagination

elasticsearch

I am a little confused by Elasticsearch's scroll functionality. In Elasticsearch, is it possible to call the search API each time the user scrolls through the result set? From the documentation:

"search_type" => "scan",    // use search_type=scan
"scroll" => "30s",          // how long between scroll requests. should be small!
"size" => 50,               // how many results *per shard* you want back

Does that mean it will perform a search every 30 seconds and return all the sets of results until there are no more records?

For example, my ES returns 500 records in total. I am getting the data from ES as two sets of 250 records each. Is there any way I can display the first set of 250 records first and then, when the user scrolls, the second set of 250 records? Please suggest.

394

asked Oct 06 '17 10:10

Spring

2 Answers

What you are looking for is pagination.

You can achieve this by querying for a fixed size and setting the from parameter. Since you want to display results in batches of 250, set size = 250 and increment the value of from by 250 with each consecutive query.

GET /_search?size=250                     ---- return first 250 results
GET /_search?size=250&from=250            ---- next 250 results 
GET /_search?size=250&from=500            ---- next 250 results
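
As a rough illustration of how these offsets are derived, the sketch below computes the from and size values for a given page. The helper name page_params and the zero-based page index are assumptions made for this example; they are not part of Elasticsearch.

```python
def page_params(page_index, page_size=250):
    """Return from/size query parameters for a zero-based page index."""
    if page_index < 0 or page_size <= 0:
        raise ValueError("page_index must be >= 0 and page_size must be > 0")
    return {"from": page_index * page_size, "size": page_size}


# Offsets for the first three pages of 250 results each
for page_index in range(3):
    print(page_params(page_index))
```

Each time the user scrolls to the end of the current batch, the front end would request the next page index. Keep in mind that, by default, Elasticsearch rejects requests where from + size exceeds 10,000 (the index.max_result_window setting), which is not a concern for 500 records.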

In contrast, Scan & scroll lets you retrieve a large set of results with a single search and is intended for operations such as re-indexing data into a new index. Using it to display search results in real time is not recommended.

To explain Scan & scroll briefly: the scan request scans the index for the query you provide and returns a scroll_id. This scroll_id is then passed to the next scroll request to return the next batch of results.

Consider the following example:

    from elasticsearch import Elasticsearch

    # Assumes a node reachable at the client's default address
    es = Elasticsearch()

    # Initialize the scroll.
    # Note: search_type='scan' and doc_type target older Elasticsearch
    # releases (the scan search type was removed in 5.0).
    page = es.search(
      index='yourIndex',
      doc_type='yourType',
      scroll='2m',
      search_type='scan',
      size=1000,
      body={
        # Your query's body
      }
    )
    sid = page['_scroll_id']
    scroll_size = page['hits']['total']

    # Start scrolling
    while scroll_size > 0:
      print("Scrolling...")
      page = es.scroll(scroll_id=sid, scroll='2m')
      # Update the scroll ID
      sid = page['_scroll_id']
      # Get the number of results returned by the last scroll request
      scroll_size = len(page['hits']['hits'])
      print("scroll size: " + str(scroll_size))
      # Do something with the obtained page

In the above example, the following events happen:

The scroll is initialized. With search_type = 'scan', this initial request returns the scroll_id and the total number of hits; the first batch of documents arrives with the first scroll request.
For each subsequent scroll request, the updated scroll_id (received from the previous scroll request) is sent and the next batch of results is returned.
The scroll time is the time for which the search context is kept alive. If the next scroll request is not sent within that timeframe, the search context is lost and no further results are returned. This is why scroll should not be used for real-time display of results from indexes with a huge number of docs.
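
The loop described above can also be sketched without a running cluster. In the sketch below, FakeScrollClient is a hypothetical in-memory stand-in that only mimics the shape of the search and scroll responses (a _scroll_id plus hits.hits); it is not the Elasticsearch client, and a real client would take its place. As in the example code, documents are read only from the scroll responses.

```python
class FakeScrollClient:
    """Hypothetical in-memory stand-in mimicking scroll response shapes."""

    def __init__(self, docs):
        self._docs = list(docs)
        self._size = 0
        self._pos = 0

    def search(self, size, scroll):
        # Like a scan request: no documents yet, only an ID and the total.
        self._size = size
        self._pos = 0
        return {"_scroll_id": "ctx-0",
                "hits": {"total": len(self._docs), "hits": []}}

    def scroll(self, scroll_id, scroll):
        # Return the next slice of documents and a fresh scroll ID.
        batch = self._docs[self._pos:self._pos + self._size]
        self._pos += len(batch)
        return {"_scroll_id": "ctx-%d" % self._pos, "hits": {"hits": batch}}


def iter_batches(client, size, keep_alive="2m"):
    """Yield batches until a scroll request comes back empty."""
    page = client.search(size=size, scroll=keep_alive)
    sid = page["_scroll_id"]
    while True:
        page = client.scroll(scroll_id=sid, scroll=keep_alive)
        sid = page["_scroll_id"]  # always use the most recent ID
        batch = page["hits"]["hits"]
        if not batch:
            break
        yield batch


client = FakeScrollClient(range(500))
batch_sizes = [len(batch) for batch in iter_batches(client, size=200)]
print(batch_sizes)
```

Wrapping the loop in a generator keeps the scroll bookkeeping in one place, so the caller only deals with batches.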

175

answered Oct 11 '22 08:10

Mayur Buragohain

You have misunderstood the purpose of the scroll parameter. It does not mean that Elasticsearch will fetch the next page of data after 30 seconds. When you make the first scroll request, you specify how long the scroll context should be kept open. Here, the scroll parameter tells Elasticsearch to close the scroll context after 30 seconds.

After the first scroll request you get back a scroll_id parameter in the response. To get the next page of the scroll response, you pass that value in the next request. If you do not send the next scroll request within 30 seconds, the scroll context is closed and you can no longer get further pages for that scroll.
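
To make the timing concrete, here is a toy model of the keep-alive behaviour described above. ScrollContext is a hypothetical class written only for this illustration, with time passed in explicitly as seconds; it is not how Elasticsearch is implemented. It shows that nothing is fetched on a timer: pages are returned only when the client asks, and each request has to arrive within the keep-alive window that follows the previous one.

```python
class ScrollContext:
    """Toy model of a search context with a keep-alive window (seconds)."""

    def __init__(self, keep_alive, opened_at=0):
        self.keep_alive = keep_alive
        self.expires_at = opened_at + keep_alive

    def next_page(self, now):
        """Simulate a scroll request arriving at time `now`."""
        if now > self.expires_at:
            raise RuntimeError("scroll context expired")
        # A request that arrives in time extends the window again.
        self.expires_at = now + self.keep_alive
        return "page requested at t=%ds" % now


ctx = ScrollContext(keep_alive=30)
# Each of these requests arrives within 30s of the previous one.
results = [ctx.next_page(now=20), ctx.next_page(now=45)]

try:
    ctx.next_page(now=120)  # 75s after the previous request
    expired = False
except RuntimeError:
    expired = True

print(results, expired)
```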

answered Oct 11 '22 08:10

Ruben Vardanyan
