I know that Elasticsearch does not currently support bulk updating by query because of Lucene, but are there any alternatives that don't involve installing an Elasticsearch extension?
For example, are there any workarounds to performing:
UPDATE users SET temp = 1 WHERE temp = 0;
Using the bulk method? Or some other method that I don't know about?
I'm new to Elasticsearch, so I don't know the ins and outs, but I have read a lot about its ability to update documents one at a time. That would be too time-consuming with hundreds of thousands of rows.
Just looking for someone to point me in the right direction.
update_by_query was added to Elasticsearch in version 2.3.
The update-by-query API is new and should still be considered experimental. The API may change in ways that are not backwards compatible.
https://www.elastic.co/guide/en/elasticsearch/reference/2.3/docs-update-by-query.html
It seems like you need to write a script for the update portion, so it's a bit of a pain.
UPDATE users SET temp = 1 WHERE temp = 0;
==>
{
  "query": {
    "term": {
      "temp": 0
    }
  },
  "script": {
    "inline": "ctx._source.temp = 1"
  }
}
Note: For this inline script version to work, you'll need inline scripts enabled in elasticsearch.yml:
script.inline: true
script.indexed: true
script.disable_dynamic: false
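Putting it together, a minimal sketch of the full request (assuming your index is named users, as in the question) would be something like:
curl -XPOST 'localhost:9200/users/_update_by_query' -d '
{
  "query": {
    "term": {
      "temp": 0
    }
  },
  "script": {
    "inline": "ctx._source.temp = 1"
  }
}'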
Following up on datashovel's answer, you should use the Elasticsearch scrolling API to fetch the desired documents and then use the Bulk API (or not) to update them.
Assuming your index is users and your doc_type is user, that would be something like:
curl -XGET 'localhost:9200/users/user/_search?scroll=1m' -d '
{
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "temp": 0
        }
      }
    }
  }
}'
Which will return a scroll_id (something like c2Nhbjs2OzM0NDg1ODpzRlBLc0FXNlNyNm5JWUc1), which you should then use to iterate over the results:
curl -XGET 'localhost:9200/_search/scroll?scroll=1m' \
-d 'c2Nhbjs2OzM0NDg1ODpzRlBLc0FXNlNyNm5JWUc1'
Repeat this until no more hits are returned.
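A rough sketch of that loop in shell (assuming the jq tool is available for parsing the JSON responses; the host, index, and field names follow the example above):
PAGE=$(curl -s -XGET 'localhost:9200/users/user/_search?scroll=1m' -d '
{
  "query": { "constant_score": { "filter": { "term": { "temp": 0 } } } }
}')

while [ "$(echo "$PAGE" | jq '.hits.hits | length')" -gt 0 ]; do
  # ... collect the hit _id values from $PAGE into the bulk update body described below ...
  SCROLL_ID=$(echo "$PAGE" | jq -r '._scroll_id')   # each response returns a fresh scroll_id
  PAGE=$(curl -s -XGET 'localhost:9200/_search/scroll?scroll=1m' -d "$SCROLL_ID")
done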
While iterating, you should build up a bulk update body containing all of the documents returned by the scroll, e.g.:
{ "update" : {"_id" : "1", "_type" : "user", "_index" : "users"} }
{ "doc" : {"temp" : 0} }
{ "update" : {"_id" : "2", "_type" : "user", "_index" : "users"} }
{ "doc" : {"temp" : 0} }
{ "update" : {"_id" : "3", "_type" : "user", "_index" : "users"} }
{ "doc" : {"temp" : 0} }
(You can find more detail on how to do this in the Bulk API docs.)
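For example, if you save the action lines above to a file named bulk.json (a name chosen here just for illustration), making sure the body ends with a newline as the Bulk API requires, you can send it with:
curl -XPOST 'localhost:9200/_bulk' --data-binary @bulk.json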
I don't know any PHP, but the Elastica PHP client for Elasticsearch has some helper functions for scrolling and bulk operations.