
ElasticSearch run script on document insertion (Insert API)

Is it possible to specify a script to be executed when inserting a document into Elasticsearch using its Index API? This functionality exists when updating an existing document through the Update API, by passing a script attribute in the HTTP request body. I think it would be useful in the Index API too, because there may be fields the user wants auto-calculated and populated during insertion, without having to send an additional Update request after the insertion just to run the script.

asked Nov 01 '22 04:11 by ecbrodie


1 Answer

Elasticsearch 1.3

If you only need to search or filter on the fields you'd like to add, the mapping transform capability added in 1.3.0 may work for you:

The document can be transformed before it is indexed by registering a script in the transform element of the mapping. The result of the transform is indexed but the original source is stored in the _source field.

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-transform.html
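
For the auto-calculated field described in the question, a mapping transform might look like the sketch below. It has not been run against a live cluster. The index name people, the type name person, the field names, and the RUN guard are all made up for illustration, and the script assumes Groovy scripting is available and enabled on the node.

```shell
#!/bin/sh
# Hypothetical names: index "people", type "person", fields first/last/full_name.
ES_URL="${ES_URL:-localhost:9200}"

# Print the index-creation body: a mapping whose transform script derives
# full_name from first and last before the document is indexed.
transform_mapping() {
cat <<'EOF'
{
  "mappings": {
    "person": {
      "transform": {
        "script": "ctx._source['full_name'] = ctx._source['first'] + ' ' + ctx._source['last']",
        "lang": "groovy"
      },
      "properties": {
        "first":     { "type": "string" },
        "last":      { "type": "string" },
        "full_name": { "type": "string" }
      }
    }
  }
}
EOF
}

# RUN is a guard invented for this sketch: the request is only sent when RUN=1.
if [ "${RUN:-0}" = "1" ]; then
  transform_mapping | curl -XPUT "$ES_URL/people" -d @-
fi
```

If this works as the documentation describes, full_name becomes searchable, but a plain GET still returns only first and last, because _source keeps the original document.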

You can also have the same transformation run when you get a document by adding the _source_transform URL parameter to the request:

The get endpoint will retransform the source if the _source_transform parameter is set. The transform is performed before any source filtering but it is mostly designed to make it easy to see what was passed to the index for debugging.

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/_get_transformed.html
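
As a rough sketch, such a request could be issued as follows. The index, type, and id (people, person, 1) are hypothetical, and so are the transformed_get_url helper and the RUN guard, which only exist so the snippet can be read and run without a cluster.

```shell
#!/bin/sh
ES_URL="${ES_URL:-localhost:9200}"

# Build the GET URL for index/type/id with the _source_transform flag set.
transformed_get_url() {
  printf '%s/%s/%s/%s?pretty&_source_transform\n' "$ES_URL" "$1" "$2" "$3"
}

# RUN is a guard invented for this sketch: the request is only sent when RUN=1.
if [ "${RUN:-0}" = "1" ]; then
  curl -XGET "$(transformed_get_url people person 1)"
fi
```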

However, I don't think the _search endpoint accepts the _source_transform URL parameter, so I don't think you can apply the transformation to search results. That would make a nice feature request.

Elasticsearch 1.4

Elasticsearch 1.4 added a couple of features that make all this much nicer. As you mentioned, the update API allows you to specify a script to be executed. The update API in 1.4 can also accept a default document to be used in the case of an upsert. From the 1.4 docs:

curl -XPOST 'localhost:9200/test/type1/1/_update' -d '{
    "script" : "ctx._source.counter += count",
    "params" : {
        "count" : 4
    },
    "upsert" : {
        "counter" : 1
    }
}'

In the example above, if the document doesn't exist, the contents of the upsert key are used to initialize it and the script is not run. So the counter key in the newly created document will have a value of 1.

Now, if we set scripted_upsert to true (scripted_upsert is another new option in 1.4), our script will run against the newly initialized document:

curl -XPOST 'localhost:9200/test/type1/2/_update' -d '{
    "script": "ctx._source.counter += count",
    "params": {
        "count": 4
    },
    "upsert": {
        "counter": 1
    },
    "scripted_upsert": true
}'

In this example, if the document didn't exist, the counter key would have a value of 5: the initial 1 from the upsert document plus the count of 4 added by the script.
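
Applied to the original question, the same pattern might look like the sketch below: the upsert document carries the fields being inserted, and the script fills in the calculated one. It has not been run against a live cluster. The index, type, and field names and the RUN guard are hypothetical, and the script assumes dynamic scripting is enabled on the node.

```shell
#!/bin/sh
ES_URL="${ES_URL:-localhost:9200}"

# Print the update body: upsert supplies the new document, and because
# scripted_upsert is true the script also runs against that new document.
scripted_upsert_body() {
cat <<'EOF'
{
  "script": "ctx._source.full_name = ctx._source.first + ' ' + ctx._source.last",
  "upsert": {
    "first": "Ada",
    "last": "Lovelace"
  },
  "scripted_upsert": true
}
EOF
}

# RUN is a guard invented for this sketch: the request is only sent when RUN=1.
if [ "${RUN:-0}" = "1" ]; then
  scripted_upsert_body | curl -XPOST "$ES_URL/people/person/1/_update" -d @-
fi
```

Note that if document 1 already exists, the upsert contents are ignored and the script runs against the existing document instead, so this only behaves like an insert for ids that are new.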

The full documentation is on the Elasticsearch site.

answered Nov 15 '22 07:11 by Ryan Grimm