How do I retrieve more than 10000 results/events in Elasticsearch

Example query:

GET hostname:port/myIndex/_search
{
    "size": 10000,
    "query": {
        "term": { "field": "myField" }
    }
}

I have been using the size option knowing that:

index.max_result_window = 100000

But if my query matches 650,000 documents, for example, or even more, how can I retrieve all of the results in one GET?

I have been reading about the SCROLL, FROM-TO, and PAGINATION APIs, but none of them ever delivers more than 10K.

This is the example from Elasticsearch Forum, that I have been using:

GET /_search?scroll=1m 

Can anybody provide an example where you can retrieve all the documents for a GET search query?

Franco asked Jan 14 '17 22:01



2 Answers

Scroll is the way to go if you want to retrieve a high number of documents, high in the sense that it's way over the 10000 default limit, which can be raised.

The first request needs to specify the query you want to run, plus the scroll parameter, which sets how long the search context is kept alive before it times out (1 minute in the example below):

POST /index/type/_search?scroll=1m
{
    "size": 1000,
    "query": {
        "match" : {
            "title" : "elasticsearch"
        }
    }
}

In the response to that first call, you get a _scroll_id that you need to use to make the second call:

POST /_search/scroll
{
    "scroll" : "1m",
    "scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAD4WYm9laVYtZndUQlNsdDcwakFMNjU1QQ=="
}

In each subsequent response, you'll get a new _scroll_id that you need to use for the next call, until you've retrieved the number of documents you need or a response comes back with no hits.

So in pseudo code it looks somewhat like this:

# first request
response = request('POST /index/type/_search?scroll=1m')
docs = [ response.hits ]
scroll_id = response._scroll_id

# subsequent requests: stop once a page comes back empty
while (true) {
   response = request('POST /_search/scroll', scroll_id)
   if (response.hits is empty) break
   docs.push(response.hits)
   scroll_id = response._scroll_id
}
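
The loop above can be sketched in Python as follows. This is a minimal sketch rather than a definitive client: `send` is a hypothetical callable (not part of any Elasticsearch library) that performs one HTTP request and returns the parsed JSON response as a dict, and `scroll_all` is just an illustrative name. It assumes the response shape described above (`_scroll_id` plus the hits under `hits.hits`) and stops when a page comes back empty.

```python
from typing import Any, Callable, Dict, Iterator

# Hypothetical transport: send(method, path, body) -> parsed JSON response.
Send = Callable[[str, str, Dict[str, Any]], Dict[str, Any]]


def scroll_all(send: Send, index: str, query: Dict[str, Any],
               page_size: int = 1000,
               keep_alive: str = "1m") -> Iterator[Dict[str, Any]]:
    """Yield every hit for `query`, one scroll page at a time."""
    # First request: open the scroll context and fetch the first page.
    response = send("POST", f"/{index}/_search?scroll={keep_alive}",
                    {"size": page_size, "query": query})
    while True:
        hits = response["hits"]["hits"]
        if not hits:
            break  # an empty page means everything has been returned
        yield from hits
        # Follow-up request: always pass the most recent _scroll_id.
        response = send("POST", "/_search/scroll",
                        {"scroll": keep_alive,
                         "scroll_id": response["_scroll_id"]})
```

Because it is a generator, the 650,000 documents from the question would be streamed page by page instead of being held in memory all at once; `send` can wrap whatever HTTP client you already use.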

UPDATE:

Please refer to the following answer which is more accurate regarding the best solution for deep pagination: Elastic Search - Scroll behavior

Val answered Sep 20 '22 07:09


Note that from + size can not be more than the index.max_result_window index setting which defaults to 10,000.

https://www.elastic.co/guide/en/elasticsearch/reference/6.8/search-request-from-size.html
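
The arithmetic behind that limit can be sketched as below. `fits_in_window` is a hypothetical helper, not an Elasticsearch API; it only mirrors the rule that from + size must not exceed index.max_result_window, using the 10,000 default quoted above.

```python
def fits_in_window(from_: int, size: int,
                   max_result_window: int = 10_000) -> bool:
    """Return True if a from/size page stays within the result window."""
    return from_ + size <= max_result_window


# The last page of 100 results that plain from/size paging can reach.
print(fits_in_window(9_900, 100))  # True: 9,900 + 100 == 10,000
print(fits_in_window(9_901, 100))  # False: 9,901 + 100 > 10,000
```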

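So you have two approaches here:

1. Add "track_total_hits": true to your query. This makes the response report the exact total number of matches (Elasticsearch 7.x otherwise caps the reported count at 10,000); it does not return more documents.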

GET index/_search
{
    "size": 1,
    "track_total_hits": true
}

2. Use the Scroll API. You then can't page with from and size in the ordinary way; instead, you request each next page with the scroll ID returned by the previous call.

https://www.elastic.co/guide/en/elasticsearch/reference/6.8/search-request-scroll.html

For example:

POST /twitter/_search?scroll=1m
{
    "size": 100,
    "query": {
        "match" : {
            "title" : "elasticsearch"
        }
    }
}
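
For the first approach, the count comes back in the hits.total part of the response. Here is a small sketch for reading it, assuming the response has already been parsed into a dict; `total_hits` is a hypothetical helper name. Elasticsearch 7.x returns an object with value and relation fields there, while 6.x returns a plain integer, so the sketch accepts both shapes.

```python
from typing import Any, Dict


def total_hits(response: Dict[str, Any]) -> int:
    """Extract the total hit count from a parsed _search response."""
    total = response["hits"]["total"]
    # 7.x shape: {"value": 650000, "relation": "eq"}; 6.x shape: 650000
    if isinstance(total, dict):
        return total["value"]
    return total
```

This only tells you how many documents match; fetching them all still requires the second approach.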
Eran Peled answered Sep 23 '22 07:09