My friend stored 65000 documents on the Elastic Search cloud and I would like to retrieve all of them (using python). However, when I am running my current script, there is an error noticing that :
RequestError(400, 'search_phase_execution_exception', 'Result window is too large, from + size must be less than or equal to: [10000] but was [30000]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting.')
My script
es = Elasticsearch(cloud_id=cloud_id, http_auth=(username, password))
docs = es.search(body={"query": {"match_all": {}}, '_source': ["_id"], 'size': 65000})
What would be the easiest way to retrieve all those document and not limit it to 10000 docs? thanks
The limit has been set so that the result set does not overwhelm your nodes. Results will occupy memory in the elastic node. So bigger the result set, bigger the memory footprint and impact on the nodes.
Depending on what you want to do with the retrieved documents,
try to use the scroll api (as suggested in your error message) if its a batch job. Be mindful of the lifetime of scroll context in that case.
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-body.html#request-body-search-scroll
or, use the Search After
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-body.html#request-body-search-search-after
You should use the scroll API and get the results in different calls. The scroll API will return to you the results 10000 by 10000 as maximum (that will be available to consult during the amount of time you indicate in the call) and you will be able then to paginate the results and obtain them thanks to a scroll_id.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With