I'm trying to scroll through my ES index and grab all the documents, but I keep missing the first set of documents returned by the initial scroll. For example, if my scroll size is 10 and my query matches 100 documents in total, after scrolling I end up with only 90. Any suggestions on what I'm missing?
Here's what I've currently tried:
```php
$json = '{"query":{"bool":{"must":[{"match_all":{}}]}}}';
$params = [
    "scroll" => "1m",
    "size"   => 50,
    "index"  => "myindex",
    "type"   => "mytype",
    "body"   => $json
];

$results = $client->search($params);
$scroll_size = $results['hits']['total']; // returns total docs that match query
$s_id = $results['_scroll_id'];
print " total results: " . $scroll_size;

// scroll
$count = 0;
while ($scroll_size > 0) {
    print " SCROLLING...";
    $scroll_results = $client->scroll([
        'scroll_id' => $s_id,
        'scroll'    => '1m'
    ]);
    // keep the most recent scroll id for the next request
    $s_id = $scroll_results['_scroll_id'];
    // get number of results returned in the last scroll
    $scroll_size = count($scroll_results['hits']['hits']);
    print " scroll size: " . $scroll_size;
    // do something with results
    for ($i = 0; $i < $scroll_size; $i++) {
        $count++;
    }
}
print " total id count: " . $count;
```
The first query you execute, the one you use to get the number of documents, also returns documents. That initial search both establishes the scroll and returns the first page of results. Process that first set of hits, then use the scroll_id to fetch the next page, and so on until a page comes back empty.
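A minimal sketch of that loop, assuming a client object whose `search()` and `scroll()` methods return arrays shaped like the responses used in the question (`_scroll_id`, `hits.hits`). The function name `fetchAllHits` and the `FakeScrollClient` class are hypothetical: the fake client is an in-memory stand-in so the sketch runs without a cluster, and in real code you would pass your Elasticsearch client instead. The key change is that the hits from `search()` are handled before the first `scroll()` call.

```php
<?php
// Collect every hit from a scroll, including the first page that the
// initial search() call already returns. $client is assumed to expose
// search(array) and scroll(array) returning arrays shaped like the
// elasticsearch-php responses used in the question.
function fetchAllHits($client, array $params): array
{
    $response = $client->search($params); // opens the scroll AND returns page 1
    $allHits = [];

    while (true) {
        $hits = $response['hits']['hits'];
        if (count($hits) === 0) {
            break; // an empty page means the scroll is exhausted
        }
        // Process the current page before asking for the next one.
        foreach ($hits as $hit) {
            $allHits[] = $hit;
        }
        // Continue with the most recent scroll id.
        $response = $client->scroll([
            'scroll_id' => $response['_scroll_id'],
            'scroll'    => $params['scroll'],
        ]);
    }
    return $allHits;
}

// Hypothetical in-memory stand-in for the Elasticsearch client. It serves
// $total documents in pages of $params['size'].
class FakeScrollClient
{
    private $docs = [];
    private $size = 10;
    private $offset = 0;

    public function __construct(int $total)
    {
        for ($i = 0; $i < $total; $i++) {
            $this->docs[] = ['_id' => (string) $i];
        }
    }

    public function search(array $params): array
    {
        $this->size = $params['size'];
        $this->offset = 0;
        return $this->nextPage();
    }

    public function scroll(array $params): array
    {
        return $this->nextPage();
    }

    private function nextPage(): array
    {
        $page = array_slice($this->docs, $this->offset, $this->size);
        $this->offset += count($page);
        return [
            '_scroll_id' => 'fake-scroll-id',
            'hits' => ['total' => count($this->docs), 'hits' => $page],
        ];
    }
}

$client = new FakeScrollClient(100);
$hits = fetchAllHits($client, ['scroll' => '1m', 'size' => 10, 'index' => 'myindex']);
print "total id count: " . count($hits) . "\n"; // prints "total id count: 100"
```

Looping until a page comes back empty, rather than counting down from the total, avoids depending on the format of `hits.total`, and reading `_scroll_id` from each response means the loop always uses the latest scroll id.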