I'm in the process of porting my application from an App Engine Datastore to a MongoDB backend and have a question regarding the consistency of "document updates." I understand that the updates on one document are all atomic and isolated, but is there a way to guarantee that they're "consistent" across different replica sets?
In our application, many users can (and will) be trying to update one document at the same time by inserting a few embedded documents (objects) into it during one single update. We need to ensure these updates occur in a logically consistent manner across all replicas, i.e. when one user "puts" a few embedded documents into the parent document, no other users can put their embedded documents in the parent document until we ensure they've read and received the first user's updates.
So what I mean by consistency is that we need a way to ensure that if two users attempt to perform an update on one document at exactly the same time, MongoDB only allows one of those updates to go through and discards the other one (or at least prevents both from occurring). We can't use a standard "sharding" solution here, because a single update consists of more than just an increment or decrement.
What's the best way of guaranteeing the consistency of one particular document?
In MongoDB, a write operation is atomic on the level of a single document, even if the operation modifies multiple embedded documents within a single document.
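Because the whole update is applied atomically to one document, the question's "insert a few embedded documents in one update" step can be a single update_one call that uses $push with $each. The following is a minimal sketch, assuming pymongo 3.0+ and an array field named things (borrowed from the pymongo example further down). The helper names build_push_update and append_embedded are hypothetical.

```python
def build_push_update(items):
    """Return an update document that appends all `items` in one operation."""
    # $each lets a single $push append several values; because the update
    # targets one document, other readers never observe a partial append.
    return {"$push": {"things": {"$each": list(items)}}}


def append_embedded(collection, parent_id, items):
    """Append several embedded documents to one parent document.

    `collection` is assumed to be a pymongo Collection (pymongo 3.0+).
    Returns True if a parent document with `parent_id` was matched.
    """
    result = collection.update_one({"_id": parent_id}, build_push_update(items))
    return result.matched_count == 1
```

For example, with a connected database handle db: append_embedded(db.foo, "a", [{"user": "alice"}, {"user": "bob"}]). This gives atomicity, but not the "only one of two concurrent writers wins" guarantee the question asks for. That guarantee needs a conditional filter, such as the version check described further down.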
By default, MongoDB is a strongly consistent system. All writes go to the replica set's primary, and with the default read preference all reads do too, so once a write completes, any subsequent read returns the most recent value. Cassandra, by contrast, is eventually consistent by default: once a write completes, the latest data eventually becomes available, provided no subsequent changes are made.
MongoDB also includes features, such as replica set tags and zone sharding, that let database administrators and developers isolate workloads by functional or geographic groupings.
There may be other ways to accomplish this, but one approach is to version your documents and issue updates only against the version the user previously read (i.e., ensure that no one else has updated the document since it was last read). This is a form of optimistic concurrency control. Here's a brief example of this technique using pymongo:
>>> db.foo.save({'_id': 'a', 'version': 1, 'things': []}, safe=True)
'a'
>>> db.foo.update({'_id': 'a', 'version': 1}, {'$push': {'things': 'thing1'}, '$inc': {'version': 1}}, safe=True)
{'updatedExisting': True, 'connectionId': 112, 'ok': 1.0, 'err': None, 'n': 1}
Note that in the above, the key "n" is 1, indicating that the document was updated.
>>> db.foo.update({'_id': 'a', 'version': 1}, {'$push': {'things': 'thing2'}, '$inc': {'version': 1}}, safe=True)
{'updatedExisting': False, 'connectionId': 112, 'ok': 1.0, 'err': None, 'n': 0}
Here, where we tried to update against a stale version, "n" is 0.
>>> db.foo.update({'_id': 'a', 'version': 2}, {'$push': {'things': 'thing2'}, '$inc': {'version': 1}}, safe=True)
{'updatedExisting': True, 'connectionId': 112, 'ok': 1.0, 'err': None, 'n': 1}
>>> db.foo.find_one()
{'things': ['thing1', 'thing2'], '_id': 'a', 'version': 3}
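The session above uses the pymongo 2.x API. The safe parameter was removed in pymongo 3.0, where writes are acknowledged by default, and save() and update() were removed in pymongo 4.0. The following is a minimal sketch of the same version check written against update_one, plus a retry loop for the writer that loses the race. The version and things fields come from the example. The helper names, the make_thing callback, and the max_attempts retry policy are hypothetical choices, not part of the original answer.

```python
def versioned_filter(doc_id, version):
    # Matches only while the document still carries the version we read.
    return {"_id": doc_id, "version": version}


def versioned_push(thing):
    # Append the item and bump the version in the same atomic update.
    return {"$push": {"things": thing}, "$inc": {"version": 1}}


def append_with_retry(collection, doc_id, make_thing, max_attempts=5):
    """Optimistically append an item, re-reading and retrying on conflict.

    `collection` is assumed to be a pymongo (3.0+) Collection with the default
    acknowledged write concern, so UpdateResult.matched_count is available.
    `make_thing(doc)` builds the new item from the freshly read document, so a
    writer that lost a race always works from the winner's data.
    """
    for _ in range(max_attempts):
        current = collection.find_one({"_id": doc_id})
        if current is None:
            return False  # no such parent document
        result = collection.update_one(
            versioned_filter(doc_id, current["version"]),
            versioned_push(make_thing(current)),
        )
        if result.matched_count == 1:
            return True  # version was still current: our update was applied
        # matched_count == 0: another writer bumped the version first; retry.
    return False  # gave up after max_attempts conflicts
```

Passing a callback instead of a fixed item is meant to reflect the question's requirement that a second user only writes after seeing the first user's changes: each retry rebuilds the item from the latest version of the document.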
Note that this technique relies on using safe (acknowledged) writes; otherwise, we don't get an acknowledgement indicating the number of documents updated. A variation on this would use the findAndModify command, which will either return the document, or None (in Python) if no document matching the query was found. findAndModify allows you to return either the new (i.e., after the updates are applied) or the old version of the document.
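In pymongo 3.0 and later, the findAndModify command is exposed as Collection.find_one_and_update(); the older find_and_modify() helper was removed in pymongo 4.0. The following is a minimal sketch of the versioned update in that form, again assuming the version and things fields from the example. The helper name versioned_find_and_modify is hypothetical. By default find_one_and_update returns the document as it was before the update; passing return_document=ReturnDocument.AFTER returns the updated version instead.

```python
def versioned_find_and_modify(collection, doc_id, expected_version, thing, **options):
    """Apply the versioned $push/$inc and return a document, or None on conflict.

    `collection` is assumed to be a pymongo (3.0+) Collection. Extra keyword
    arguments, such as return_document=ReturnDocument.AFTER, are forwarded
    unchanged to find_one_and_update. None means no document matched the
    filter, i.e. `expected_version` is stale (or `doc_id` does not exist).
    """
    return collection.find_one_and_update(
        {"_id": doc_id, "version": expected_version},
        {"$push": {"things": thing}, "$inc": {"version": 1}},
        **options
    )
```

For example, after from pymongo import ReturnDocument and with a database handle db: versioned_find_and_modify(db.foo, "a", 3, "thing3", return_document=ReturnDocument.AFTER).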