
Using a script to conditionally update a document in Elasticsearch

I have a use case in which concurrent update requests may hit my Elasticsearch cluster. To make sure that a stale event (one made irrelevant by a newer request) does not update a document after a newer event has already reached the cluster, I would like to pass a script with my update requests that compares a field to determine whether the incoming request is still relevant. The request would look like this:

curl -XPOST 'localhost:9200/test/type1/1/_update' -d '
{
  "script": " IF ctx._source.user_update_time > my_new_time THEN do not update ELSE proceed with update",
  "params": {
    "my_new_time": "2014-09-01T17:36:17.517"
   },
  "doc": {
    "name": "new_name"
   },
  "doc_as_upsert": true
}'

Is the pseudocode I wrote in the "script" field possible in Elasticsearch? If so, I would love some help with the syntax (Groovy, Python, or JavaScript).

Any alternative approach suggestions would be greatly appreciated too.

bkahler asked Jul 23 '15 22:07



2 Answers

Elasticsearch has built-in optimistic concurrency control.

The way it works is that the Update API allows you to use the version parameter to control whether the update should proceed or not.

So, taking your example above, the first index/update operation would create a document with version: 1. Then take the case where you have two concurrent requests. Components A and B will each send an updated document. Both initially retrieved the document at version: 1 and will specify that version in their request (see version=1 in the query string below). Elasticsearch will update the document if and only if the provided version is the same as the current one.

Component A and B both send this, but A's request is the first to make it:

curl -XPOST 'localhost:9200/test/type1/1/_update?version=1' -d '{
  "doc": {
    "name": "new_name"
   },
  "doc_as_upsert": true
}'

At this point the version of the document will be 2 and B's request will end up with HTTP 409 Conflict, because B assumed the document was still at version 1, even though the version increased in the meantime due to A's request.

B can then retrieve the document with the new version (i.e. 2) and try its update again, this time with ?version=2 in the URL. If it's the first one to reach ES, the update will succeed.
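
To make the read, check, and retry cycle concrete, here is a minimal sketch in Python. It is a toy in-memory simulation, not the Elasticsearch client: VersionedStore, VersionConflict, update_with_retry and build_update are hypothetical names introduced only for illustration, and VersionConflict stands in for the HTTP 409 response.

```python
class VersionConflict(Exception):
    """Stands in for the HTTP 409 Conflict response (hypothetical name)."""


class VersionedStore:
    """Toy in-memory stand-in for one index; NOT the Elasticsearch client."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, source dict)

    def index(self, doc_id, source):
        # Creating or overwriting a document bumps its version by one.
        version = self._docs.get(doc_id, (0, None))[0] + 1
        self._docs[doc_id] = (version, dict(source))
        return version

    def get(self, doc_id):
        version, source = self._docs[doc_id]
        return version, dict(source)

    def update(self, doc_id, partial_doc, expected_version):
        # The update proceeds only if the caller's version is still current.
        current_version, source = self._docs[doc_id]
        if current_version != expected_version:
            raise VersionConflict(
                f"expected version {expected_version}, found {current_version}"
            )
        self._docs[doc_id] = (current_version + 1, {**source, **partial_doc})
        return current_version + 1


def update_with_retry(store, doc_id, build_update, max_attempts=3):
    """Re-read, rebuild, and resend the update until the version check passes.

    build_update receives the freshly read source and returns the partial
    document to merge, or None to skip the update (e.g. the event is stale).
    """
    for _ in range(max_attempts):
        version, source = store.get(doc_id)
        partial_doc = build_update(source)
        if partial_doc is None:
            return version  # nothing to do; the document is left untouched
        try:
            return store.update(doc_id, partial_doc, expected_version=version)
        except VersionConflict:
            continue  # another writer won the race; read again and retry
    raise VersionConflict(f"gave up after {max_attempts} attempts")
```

Because build_update sees the freshly read document on every attempt, B can compare user_update_time at that point and drop its event if it has become stale, instead of blindly resending the same update.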

Val answered Oct 15 '22 02:10



I think the script should be like this (it updates only when the stored time is older than the incoming one):

"script": "if(ctx._source.user_update_time < my_new_time) ctx._source.user_update_time=my_new_time;"

or

"script": "ctx._source.user_update_time > my_new_time ? ctx.op=\"none\" : ctx._source.user_update_time=my_new_time"
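
The decision these scripts are meant to make can be traced outside Elasticsearch with a small Python sketch. This is only a model of the logic, under the assumption that ctx behaves like a nested dictionary and that timestamps use the format from the question; apply_conditional_update and the new_fields parameter are hypothetical names, not part of any Elasticsearch API.

```python
from datetime import datetime

# Timestamp layout taken from the question, e.g. "2014-09-01T17:36:17.517".
TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%f"


def apply_conditional_update(ctx, params):
    """Mark the update as a no-op when the stored document is newer.

    ctx mimics the script context as a plain dict (an assumption for this
    sketch); params["new_fields"] is a hypothetical parameter holding the
    fields to write when the incoming event is still relevant.
    """
    stored = datetime.strptime(ctx["_source"]["user_update_time"], TIME_FORMAT)
    incoming = datetime.strptime(params["my_new_time"], TIME_FORMAT)

    if stored > incoming:
        # Stale event: leave the document exactly as it is.
        ctx["op"] = "none"
    else:
        # Relevant event: record its time and apply the new field values.
        ctx["_source"]["user_update_time"] = params["my_new_time"]
        ctx["_source"].update(params.get("new_fields", {}))
    return ctx
```

For example, with a stored user_update_time of 2014-09-02 and an incoming my_new_time of 2014-09-01, the function sets ctx["op"] to "none" and the name field is left unchanged.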
Terran answered Oct 15 '22 02:10
