
Updating a field in all records in Elasticsearch

I'm new to Elasticsearch, so this is probably something quite trivial, but I haven't figured out anything better than fetching everything, processing it with a script, and updating the records one by one.

I want to do something like a simple SQL update:

UPDATE RECORD SET SOMEFIELD = SOMEXPRESSION

My intent is to replace the current bogus data with data that makes more sense (so the expression basically picks a random value from a pool of valid values).

asked Apr 11 '13 10:04 by fortran




1 Answer

There are a couple of open issues about making it possible to update documents by query.

The technical challenge is that segments in Lucene (the text search engine library that Elasticsearch uses under the hood) are read-only, so an existing document can never be modified in place. Instead, you have to delete the old version of the document (which, by the way, is only marked as deleted until a segment merge happens) and index the new one. That's what the existing update API does. An update by query might therefore take a long time and lead to issues, which is why it hasn't been released yet. A mechanism for interrupting running queries would also be nice to have for this case.

But there's the update by query plugin that exposes exactly that feature. Just beware of the potential risks before using it.
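Without the plugin, the fetch-and-update approach from the question can at least be batched through the bulk API instead of sending one request per document. The following is only a rough sketch in Python, not a definitive implementation: the function name `build_bulk_update_body`, the index name `records`, the field name `somefield`, and the pool values are all hypothetical, and it assumes the document ids have already been collected (for example with a scan/scroll search). It only builds the newline-delimited body that would be POSTed to the `_bulk` endpoint. Older Elasticsearch versions also expect a `_type` entry in each action line.

```python
import json
import random


def build_bulk_update_body(doc_ids, index, field, pool, rng=random):
    """Build an NDJSON bulk body with one partial-document update per id.

    All names here are illustrative assumptions, not taken from the question.
    """
    lines = []
    for doc_id in doc_ids:
        # Action line: identifies the document to update.
        lines.append(json.dumps({"update": {"_index": index, "_id": doc_id}}))
        # Payload line: partial document that is merged into the existing one.
        # The new value is picked at random from the pool of valid values.
        lines.append(json.dumps({"doc": {field: rng.choice(pool)}}))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    body = build_bulk_update_body(
        ["1", "2", "3"], "records", "somefield", ["red", "green", "blue"]
    )
    print(body, end="")
```

Each update in the batch still deletes and reindexes the document internally, as described above, so this reduces the number of round trips but not the indexing work.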

answered Oct 06 '22 07:10 by javanna