Bulk update by query in Elastic Search?

I know that Elastic Search does not currently support bulk updating by query because of Lucene, but are there any alternatives that don't involve installing an ElasticSearch extension?

For example, are there any workarounds to performing:

UPDATE users SET temp = 1 WHERE temp = 0;

Using the bulk method? Or some other method that I don't know about?

I'm new to Elastic Search, so I don't know the ins and outs. I have read a lot about its ability to update documents one at a time, but that would be too time-consuming with hundreds of thousands of rows.

Just looking for someone to point me in the right direction.

Asked by Anthony on Dec 09 '14 at 21:12



2 Answers

update_by_query was added to Elasticsearch in version 2.3.

The update-by-query API is new and should still be considered experimental. The API may change in ways that are not backwards compatible.

https://www.elastic.co/guide/en/elasticsearch/reference/2.3/docs-update-by-query.html

It seems like you need to write a script for the update portion, so it's a bit of a pain.

UPDATE users SET temp = 1 WHERE temp = 0;

==>

{
    "query": {
        "term": {
            "temp": 0
        }
    },
    "script": {
        "inline": "ctx._source.temp = 1"
    }
}

Note: For this inline script version to work, inline scripting must be enabled in the node settings (elasticsearch.yml):

script.inline: true
script.indexed: true
script.disable_dynamic: false
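
As a rough sketch of how the request above might be sent from a script, the Python below builds the same body and prepares (but does not send) a POST to the _update_by_query endpoint. The host localhost:9200, the index name users, and both helper names are assumptions for illustration. The "inline" script key follows the 2.3-era syntax shown above and may differ in other versions.

```python
import json
import urllib.request


def build_update_by_query_body(field, old_value, new_value):
    """Build a body roughly equivalent to: UPDATE ... SET field = new WHERE field = old."""
    return {
        "query": {"term": {field: old_value}},
        # json.dumps renders the new value as a literal inside the script text
        "script": {"inline": "ctx._source.%s = %s" % (field, json.dumps(new_value))},
    }


def prepare_request(body, host="http://localhost:9200", index="users"):
    """Prepare the POST request without sending it; host and index are assumed values."""
    url = "%s/%s/_update_by_query" % (host, index)
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}, method="POST"
    )


body = build_update_by_query_body("temp", 0, 1)
request = prepare_request(body)
# To run it against a live cluster: urllib.request.urlopen(request)
```

Keeping the body construction separate from the HTTP call makes it easy to inspect the JSON before anything touches the cluster.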

Answered by spazm on Sep 30 '22 at 00:09


Following up on datashovel's answer: you should use the Elasticsearch scroll API to fetch the desired documents and then update them, with or without the bulk API.

Assuming your index is users and doc_type is user, that would be something like:

curl -XGET 'localhost:9200/users/user/_search?scroll=1m' -d '
{
    "query": {
        "constant_score": {
            "filter" : {
               "term" : {
                   "temp" : 1
               }
            }
        }
    }
}'

This returns a scroll_id (something like c2Nhbjs2OzM0NDg1ODpzRlBLc0FXNlNyNm5JWUc1), which you should then use to iterate over the results:

curl -XGET  'localhost:9200/_search/scroll?scroll=1m' \
    -d 'c2Nhbjs2OzM0NDg1ODpzRlBLc0FXNlNyNm5JWUc1'

Repeat this until no more hits are returned.
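
The scroll loop could be sketched in Python roughly as follows. This is only an illustration: iterate_scroll and fetch_next are hypothetical names, fetch_next stands in for whatever HTTP call hits /_search/scroll, and the fake pages exist only so the sketch runs without a cluster.

```python
def iterate_scroll(first_page, fetch_next):
    """Yield every hit, following scroll ids until a page comes back empty.

    first_page: parsed JSON response of the initial search request.
    fetch_next: hypothetical callable that takes a scroll id and returns the
                parsed JSON of the next scroll response.
    """
    # The first response may or may not carry hits, depending on the search type.
    for hit in first_page["hits"]["hits"]:
        yield hit
    scroll_id = first_page["_scroll_id"]
    while True:
        page = fetch_next(scroll_id)
        hits = page["hits"]["hits"]
        if not hits:
            break
        for hit in hits:
            yield hit
        # A response can carry a new scroll id; always continue with the latest one.
        scroll_id = page.get("_scroll_id", scroll_id)


# In-memory stand-in for the HTTP calls, purely for illustration.
fake_pages = {
    "id-1": {"_scroll_id": "id-2", "hits": {"hits": [{"_id": "1"}, {"_id": "2"}]}},
    "id-2": {"_scroll_id": "id-3", "hits": {"hits": [{"_id": "3"}]}},
    "id-3": {"_scroll_id": "id-3", "hits": {"hits": []}},
}
first = {"_scroll_id": "id-1", "hits": {"hits": []}}
ids = [hit["_id"] for hit in iterate_scroll(first, fake_pages.__getitem__)]
```

In real use, fetch_next would wrap the scroll request shown above and parse its JSON response.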

While iterating, you should build a list for the bulk update containing all the elements returned by the scroll:

{ "update" : {"_id" : "1", "_type" : "user", "_index" : "users"} }
{ "doc" : {"temp" : 0} }
{ "update" : {"_id" : "2", "_type" : "user", "_index" : "users"} }
{ "doc" : {"temp" : 0} }
{ "update" : {"_id" : "3", "_type" : "user", "_index" : "users"} }
{ "doc" : {"temp" : 0} }

(You can see more detail on how to do this on the bulk api docs)
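
A minimal sketch of building that bulk body from the scrolled hits, assuming each hit carries _id, _type and _index as in the example above. The function name build_bulk_update_body and the sample hits are made up for illustration.

```python
import json


def build_bulk_update_body(hits, partial_doc):
    """Return a newline-delimited bulk body with one partial update per hit."""
    lines = []
    for hit in hits:
        action = {
            "update": {"_id": hit["_id"], "_type": hit["_type"], "_index": hit["_index"]}
        }
        lines.append(json.dumps(action))
        lines.append(json.dumps({"doc": partial_doc}))
    # The bulk API expects every line, including the last one, to end with a newline.
    return "".join(line + "\n" for line in lines)


hits = [
    {"_id": "1", "_type": "user", "_index": "users"},
    {"_id": "2", "_type": "user", "_index": "users"},
]
bulk_body = build_bulk_update_body(hits, {"temp": 0})
```

The resulting string would be sent as the body of a POST to the _bulk endpoint. With hundreds of thousands of documents it is probably worth sending it in batches of a few thousand actions rather than in one request.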

I don't know any PHP but the Elasticsearch PHP API Elastica has some helper functions for scrolling and bulk.

Answered by vierja on Sep 30 '22 at 01:09