
Updating existing documents in Elasticsearch (ES) while using the rollover API

I have a data source that will create a large number of entries, which I'm planning to store in Elasticsearch. The source creates two entries for the same document in Elasticsearch:

  • the 'init' part, which records the init time and other details under a random key in ES
  • the 'finish' part, which contains the main data and updates (merges into) the initially created document in ES under the init's random key.

I will need to use time-based indices in Elasticsearch, with an alias pointing to the actual index, using the rollover API. For updates I'll use the update API to merge the init and finish parts.
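A minimal sketch of the planned flow, expressed as the REST requests the application would send. It assumes Elasticsearch 7+ typeless endpoints and the `is_write_index` alias flag (6.4+); the alias name `events` and the first index `events-000001` are hypothetical. The functions only build `(method, path, body)` tuples, so they run without a cluster.

```python
# Hypothetical names: write alias "events", first concrete index "events-000001".
ALIAS = "events"
FIRST_INDEX = "events-000001"


def bootstrap_request():
    """Create the first index and mark it as the alias's write index."""
    body = {"aliases": {ALIAS: {"is_write_index": True}}}
    return "PUT", f"/{FIRST_INDEX}", body


def init_request(doc_id, init_time):
    """'init' part: index a new document under a random key via the alias."""
    body = {"init_time": init_time}
    return "PUT", f"/{ALIAS}/_doc/{doc_id}", body


def finish_request(doc_id, data):
    """'finish' part: partial update that merges the main data into the doc."""
    body = {"doc": data}
    return "POST", f"/{ALIAS}/_update/{doc_id}", body


def rollover_request(max_age="1d", max_docs=10_000_000):
    """Ask ES to roll the alias over to a new index once a condition is met."""
    body = {"conditions": {"max_age": max_age, "max_docs": max_docs}}
    return "POST", f"/{ALIAS}/_rollover", body
```

For example, `finish_request("abc123", {"status": "done"})` yields a `POST` to `/events/_update/abc123` with body `{"doc": {"status": "done"}}`. Note that both the init and the finish requests go through the alias, which is exactly what the question is about.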

Question: If the init document with the random key is no longer in the current index (but in an older one that has already been rolled over), would an update using its key still succeed? If not, what is the best practice for performing the update?

asked Jan 24 '17 at 14:01 by Zoltan



1 Answer

After getting no answers for a while, I set out to test it myself.

Short answer: After the index is rolled over under an alias, an update operation using the alias refers to the new index only, so it will create the document in the new index, resulting in two separate documents.
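The behavior described in the short answer can be illustrated with a toy in-memory model. This is not Elasticsearch code: `ToyCluster` is a hypothetical class that only mimics one alias whose writes resolve to the current write index. It assumes the update is sent with `doc_as_upsert` enabled; without an upsert, the real update API instead fails with a document-missing error, which the model imitates by raising `KeyError`.

```python
class ToyCluster:
    """Toy in-memory model (NOT Elasticsearch) of one alias with rollover."""

    def __init__(self, alias, first_index):
        self.alias = alias
        self.indices = {first_index: {}}  # index name -> {doc_id: source}
        self.write_index = first_index    # where writes via the alias go

    def rollover(self, new_index):
        # After rollover, writes through the alias go only to the new index.
        self.indices[new_index] = {}
        self.write_index = new_index

    def update(self, target, doc_id, partial, doc_as_upsert=False):
        # An alias name resolves to the current write index only;
        # any other target is treated as a concrete index name.
        index = self.write_index if target == self.alias else target
        docs = self.indices[index]
        if doc_id in docs:
            docs[doc_id].update(partial)   # merge into the existing source
        elif doc_as_upsert:
            docs[doc_id] = dict(partial)   # creates a second, separate doc
        else:
            raise KeyError(f"document missing: {doc_id} not in {index}")
        return index
```

Indexing `abc` into `events-000001`, rolling over to `events-000002`, and then calling `update("events", "abc", {...}, doc_as_upsert=True)` leaves `abc` in both indices, each holding only half of the data. Updating the concrete name `events-000001` merges as intended.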

One way to solve it is to search the last two (or more, if needed) indices for the document's key and use the concrete (non-alias) index name of the hit for the update.
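A sketch of the search-then-update approach, again as plain request builders. It assumes the default rollover naming scheme (a name ending in `-` plus a zero-padded counter, such as `events-000003`) and ES 7+ endpoints; the helper names are hypothetical. Each search hit carries an `_index` field with the concrete index name, which is what the update must target.

```python
import re


def previous_rollover_indices(write_index, n=2):
    """Return the current rollover index and up to n-1 of its predecessors.

    Assumes names like 'events-000003' (prefix, '-', zero-padded counter).
    """
    m = re.fullmatch(r"(.*-)(\d+)", write_index)
    if m is None:
        raise ValueError(f"not a rollover-style name: {write_index}")
    prefix, digits = m.group(1), m.group(2)
    counter, width = int(digits), len(digits)
    # Walk backwards from the current counter, never below 1.
    return [f"{prefix}{i:0{width}d}" for i in range(counter, max(counter - n, 0), -1)]


def ids_search_request(indices, doc_id):
    """Search the given concrete indices for a document by its _id."""
    path = "/" + ",".join(indices) + "/_search"
    body = {"query": {"ids": {"values": [doc_id]}}, "_source": False}
    return "GET", path, body


def update_request_from_hit(search_response, doc_id, partial):
    """Build an update aimed at the concrete index that holds doc_id."""
    hits = search_response["hits"]["hits"]
    if not hits:
        return None  # not found in the searched indices
    index = hits[0]["_index"]  # concrete index name, not the alias
    return "POST", f"/{index}/_update/{doc_id}", {"doc": partial}
```

The trade-off is an extra search round trip per update, and the number of indices to search must cover the longest expected gap between the init and finish parts.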

Another solution, which I prefer, is to avoid rollover altogether: calculate the index name from a required date field of the document, and create new indices from the application, using an index template to define the mapping. This way, event sourcing and replaying the documents in order will yield the same indices.
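A minimal sketch of the date-derived naming approach, assuming daily indices named `events-YYYY.MM.DD` (hypothetical prefix), UTC day boundaries, and a composable index template (ES 7.8+ `_index_template`). It also assumes the finish part knows the document's `init_time`, so it can compute the same index name as the init part.

```python
from datetime import datetime, timezone

PREFIX = "events"  # hypothetical index prefix


def index_for(init_time_iso):
    """Derive a daily index name from the document's required date field."""
    # fromisoformat() rejects a trailing 'Z' before Python 3.11.
    dt = datetime.fromisoformat(init_time_iso.replace("Z", "+00:00"))
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # assume naive times are UTC
    dt = dt.astimezone(timezone.utc)  # bucket by UTC day so replays agree
    return f"{PREFIX}-{dt:%Y.%m.%d}"


def index_template_request():
    """Composable index template giving every daily index the same mapping."""
    body = {
        "index_patterns": [f"{PREFIX}-*"],
        "template": {
            "mappings": {
                "properties": {
                    "init_time": {"type": "date"},
                }
            }
        },
    }
    return "PUT", f"/_index_template/{PREFIX}", body


def update_request(init_time_iso, doc_id, partial):
    """Finish part: update the doc in the index derived from its init time."""
    index = index_for(init_time_iso)
    return "POST", f"/{index}/_update/{doc_id}", {"doc": partial}
```

Because the index name is a pure function of the document's date, replaying the same events always routes each update to the index that holds its init document, with no lookup needed.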

answered Sep 26 '22 at 00:09 by Zoltan