
How to fix Elasticsearch conflicts on the same key when two processes write at the same time

I have multiple processes writing data to ES at the same time, and two of them may write the same key with different values at the same moment. This causes the following exception:

"error" : "VersionConflictEngineException[[website][2] [blog][1]:              version conflict, current [2], provided [1]]", "status" : 409 

How can I fix this problem, given that I have to keep multiple processes?

Jack asked Mar 23 '16 21:03



2 Answers

VersionConflictEngineException is thrown to prevent data loss. Every document in Elasticsearch has a _version number that is incremented whenever the document is changed.

When you query a doc from ES, the response also includes the version of that doc. When you update the same doc and provide a version, ES expects the document currently in the index to have that same version.

If the current version is greater than the one in the update request, the result is a conflict: HTTP status code 409 and a VersionConflictEngineException.

In your current scenario,

version conflict, current 2, provided 1

The current version in ES is 2, whereas the version in your request is 1. That means some other process has already modified the doc, and your change is trying to overwrite it based on a stale copy.
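To make the check concrete, here is a minimal in-memory sketch of the same rule. It is a toy model, not the Elasticsearch client: `VersionedStore` and `VersionConflict` are hypothetical names invented for illustration, and the model only assumes that a write naming a version succeeds when that version matches the stored one.

```python
class VersionConflict(Exception):
    """Stands in for Elasticsearch's VersionConflictEngineException (HTTP 409)."""


class VersionedStore:
    """Toy in-memory model of per-document versioning, not a real ES client."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, source)

    def get(self, doc_id):
        return self._docs[doc_id]

    def index(self, doc_id, source, version=None):
        current = self._docs.get(doc_id, (0, None))[0]
        # A write that names a version succeeds only if it matches the stored one.
        if version is not None and version != current:
            raise VersionConflict(
                f"version conflict, current [{current}], provided [{version}]"
            )
        self._docs[doc_id] = (current + 1, source)
        return current + 1


store = VersionedStore()
store.index("1", {"title": "first"})                           # version becomes 1
seen_version, _ = store.get("1")                               # both writers read version 1
store.index("1", {"title": "writer A"}, version=seen_version)  # version becomes 2

try:
    # Writer B still holds version 1, which is now stale.
    store.index("1", {"title": "writer B"}, version=seen_version)
except VersionConflict as exc:
    conflict_message = str(exc)
    print(conflict_message)
```

Writer B's request is rejected rather than silently overwriting writer A's change, which is the behaviour the 409 in the question reports.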

When you get a VersionConflictEngineException, you should re-fetch the doc, re-apply your change to the fresh copy, and retry the update with the latest version.

Whether or not to use versioning (optimistic concurrency control) depends on the application. If you can live with data loss, that is, with the last write simply winning, you can omit the version from the update request.
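The re-fetch-and-retry advice can be sketched as a small loop. This is only an illustration under stated assumptions: `fetch`, `save`, `modify`, `VersionConflict`, and `max_attempts` are hypothetical names, and in a real application `fetch` and `save` would wrap whatever ES client calls you use. The demo runs against a fake in-memory backend where another writer sneaks in once.

```python
class VersionConflict(Exception):
    """Hypothetical stand-in for a 409 version-conflict response."""


def update_with_retry(fetch, save, modify, max_attempts=5):
    """Re-fetch, re-apply `modify`, and save until no conflict occurs.

    fetch() returns (version, source); save(source, version) raises
    VersionConflict when the version is stale. Both are hypothetical
    callables that would wrap your ES client.
    """
    for attempt in range(1, max_attempts + 1):
        version, source = fetch()          # always start from the latest copy
        try:
            save(modify(dict(source)), version)
            return attempt                 # number of attempts that were needed
        except VersionConflict:
            if attempt == max_attempts:
                raise                      # give up and surface the conflict


# Fake backend: one concurrent write happens during our first save.
state = {"version": 1, "source": {"views": 10}}
interference = [True]


def fetch():
    return state["version"], state["source"]


def save(source, version):
    if interference:                       # simulate another process writing first
        interference.pop()
        state["version"] += 1
        state["source"] = {"views": 11}
    if version != state["version"]:
        raise VersionConflict()
    state["version"] += 1
    state["source"] = source


def add_view(doc):
    doc["views"] += 1
    return doc


attempts = update_with_retry(fetch, save, add_view)
print(attempts, state)
```

The important detail is that the change is re-applied to the freshly fetched copy on each attempt, so the other writer's update is not lost.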

Rahul answered Sep 18 '22 14:09



Elasticsearch's Update API provides the retry_on_conflict query parameter:

Specify how many times should the operation be retried when a conflict occurs. Default: 0.

If you have several parallel scripts that can simultaneously work with the same document, you can use this parameter.

For example, suppose you have an index of tweets and 5 processes that work with this index. It is possible that all 5 processes will update the same document (some tweet) at once. In this case, you can add the ...&retry_on_conflict=6 parameter to the update request. Why 6? 5 processes + 1 for some headroom. ES will then retry the update up to 6 times if conflicts occur.
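As a rough sketch of where the parameter goes, the snippet below builds (but does not send) a partial-update request using only the Python standard library. The host, index name, document id, and field are hypothetical, chosen only for illustration. The path shown is the type-less form used by recent Elasticsearch versions; older releases use /<index>/<type>/<id>/_update instead.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_update_request(base_url, index, doc_id, partial_doc, retries):
    """Build (but do not send) a partial-update request with retry_on_conflict."""
    query = urlencode({"retry_on_conflict": retries})
    # Type-less path of recent Elasticsearch versions; older releases
    # use /<index>/<type>/<id>/_update instead.
    url = f"{base_url}/{index}/_update/{doc_id}?{query}"
    body = json.dumps({"doc": partial_doc}).encode("utf-8")
    return Request(url, data=body, method="POST",
                   headers={"Content-Type": "application/json"})


# Hypothetical host, index name, id, and field, chosen only for illustration.
req = build_update_request("http://localhost:9200", "tweets", "42",
                           {"status": "processed"}, retries=6)
print(req.get_method(), req.full_url)
# Sending it would be: urllib.request.urlopen(req)
```

With this parameter the retries happen inside Elasticsearch, so the client does not need its own re-fetch loop for simple partial updates.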

Daniel Abyan answered Sep 17 '22 14:09
