I have a data source which will create a high number of entries that I'm planning to store in ElasticSearch. The source creates two entries for the same document in ElasticSearch:
I will need to use time-based indexes in ElasticSearch, with an alias pointing to the actual index, using the rollover index. For updates I'll use the update API to merge init and finish.
Question: If the init document with the random key is not in the current index (but in an older one already rolled over) would updating it using it's key successfully execute? If not, what is the best practice to perform the update?
The script can update, delete, or skip modifying the document. The update API also supports passing a partial document, which is merged into the existing document. To fully replace an existing document, use the index API.
In Elasticsearch, to replace a document you simply have to index a document with the same ID and it will be replaced automatically. If you would like to update a document you can either do a scripted update, a partial update or both.
For little changes in Index or index settings you can use update API where you can update index settings ( No of replicas, refresh interval etc.) . Also, you can update documents and add field using update API in Elasticsearch.
Elasticsearch allows us to do partial updates, but internally these are “get_then_update” operations, where the whole document is fetched, the changes are applied and then the document is indexed again. Even without disk hits one can imagine the potential performance implications if this is your main use case.
After some quietness I've set out to test it.
Short answer: After the index is rolled over under an alias, an update operation using the alias refers to the new index only, so it will create the document in the new index, resulting in two separate documents.
One way of solving it is to perform a search in the last 2 (or more if needed) indexes and figure out which non-alias index name to use for the update.
Other solution which I prefer is to avoid using the rollover, but calculate index name from the required date field of our document, and create new index from the application, using template to define mapping. This way event sourcing and replaying the documents in order will yield the same indexes.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With