 

Pymongo replace_one modified_count always 1 even if not changing anything

Why does PyMongo behave like this?

item = db.test.find_one()
result = db.test.replace_one(item, item)
print(result.raw_result)
# Gives: {u'n': 1, u'nModified': 1, u'ok': 1, 'updatedExisting': True}
print(result.modified_count)
# Gives 1

whereas the equivalent in the mongo shell always reports a modified count of 0:

item = db.test.findOne()
db.test.replaceOne(item, item)
// Gives: {"acknowledged" : true, "matchedCount" : 1.0, "modifiedCount" : 0.0}

How can I get consistent results and properly detect when the replacement is actually changing the data?

Adrián asked Sep 20 '16 16:09



1 Answer

This is because MongoDB stores documents in a binary format (BSON) in which field order is significant. The key-value pairs of a BSON document can appear in any order (except that _id is always first), so two documents with the same fields in a different order are not identical to the server. Let's start with the mongo shell, which preserves key order when reading and writing data. For example:

> db.collection.insert({_id:1, a:2, b:3})
{ "_id" : 1, "a" : 2, "b" : 3 }

If you call replaceOne() with this same document, the server sees that the replacement is identical to the stored BSON and does not count a modification:

> var doc = db.collection.findOne()
> db.collection.replaceOne(doc, doc)
{ "acknowledged" : true, "matchedCount" : 1, "modifiedCount" : 0 }

However, if you change the order of the fields, it does detect a modification:

> var doc_2 = {_id:1, b:3, a:2}
> db.collection.replaceOne(doc_2, doc_2)
{ "acknowledged" : true, "matchedCount" : 1, "modifiedCount" : 1 }

Now let's step into the Python world. By default, PyMongo represents BSON documents as Python dictionaries, and in Python versions before 3.7 the order of keys in a dictionary is not guaranteed. You therefore cannot predict the order in which the fields will be serialised to BSON. With your example:

> doc = db.collection.find_one()
{u'_id': 1.0, u'a': 2.0, u'b': 3.0}

> result = db.collection.replace_one(doc, doc)
> result.raw_result
{u'n': 1, u'nModified': 1, u'ok': 1, 'updatedExisting': True}
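
To make the key-order point concrete, here is a small sketch that uses only the standard library. json.dumps stands in for an order-preserving serialiser such as BSON encoding; that is an assumption for illustration only and not how PyMongo encodes documents. Two mappings that Python considers equal still produce different bytes when their keys are written in a different order:

```python
import json
from collections import OrderedDict

# Same fields and values, different key order.
stored = OrderedDict([("_id", 1), ("a", 2), ("b", 3)])
reordered = OrderedDict([("_id", 1), ("b", 3), ("a", 2)])


def serialise(doc):
    # Stand-in for BSON encoding: keys are written in iteration order.
    return json.dumps(doc).encode("utf-8")


# Plain dict comparison ignores key order ...
same_values = dict(stored) == dict(reordered)
# ... but the serialised bytes do not.
same_bytes = serialise(stored) == serialise(reordered)

print(same_values, same_bytes)  # True False
```

This mirrors what happens above: the values are unchanged, but the bytes sent to the server are not the bytes it has stored.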

If it matters for your use case, one workaround is to use bson.SON, a dictionary subclass that preserves key order. For example:

> from bson import CodecOptions, SON
> opts=CodecOptions(document_class=SON)
> collection_son = db.collection.with_options(codec_options=opts)
> doc_2 = collection_son.find_one()
SON([(u'_id', 1.0), (u'a', 2.0), (u'b', 3.0)])

> result = collection_son.replace_one(doc_2, doc_2)
> result.raw_result
{u'n': 1, u'nModified': 0, u'ok': 1, 'updatedExisting': True}

You can also see bson.SON being used inside PyMongo itself (v3.3.0), e.g. in the _update() method. See also the related article: PyMongo and Key Order in SubDocuments.
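
If all you need is to know whether a replacement would change any values, another option is to compare the documents on the client before calling replace_one. The following is only a sketch: replace_if_changed is a hypothetical helper, not part of PyMongo, and it relies on nothing beyond the find_one and replace_one calls already used above. Python dict equality ignores key order, so a pure reordering is treated as "no change". Note that it also treats 1 and 1.0 as equal, and that the read and the write are two separate operations, so it is not atomic.

```python
def replace_if_changed(collection, filter_doc, new_doc):
    """Replace the matched document only when its content differs.

    Hypothetical helper: returns True if replace_one was called,
    False if nothing matched or the values were already the same.
    """
    current = collection.find_one(filter_doc)
    if current is None:
        return False  # nothing matched, nothing to replace

    # Compare without _id so that new_doc may omit it.
    current_body = {k: v for k, v in current.items() if k != "_id"}
    new_body = {k: v for k, v in new_doc.items() if k != "_id"}
    if current_body == new_body:
        return False  # same values (key order is ignored by dict ==)

    collection.replace_one({"_id": current["_id"]}, new_doc)
    return True
```

Usage would look like changed = replace_if_changed(db.test, {"_id": 1}, new_item), with changed telling you whether a write was actually issued.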

Update to answer an additional question:

As far as I know, there is no 'standard' function to convert a nested dictionary to SON. You can, however, write a custom dict-to-SON converter yourself, for example:

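from bson.son import SON

def to_son(value):
    # Recursively convert dicts (including dicts nested inside lists) to SON.
    # Builds new objects instead of modifying the input in place.
    if isinstance(value, dict):
        return SON([(k, to_son(v)) for k, v in value.items()])
    if isinstance(value, list):
        return [to_son(x) for x in value]
    # Scalars (numbers, strings, etc.) are returned unchanged.
    return value

# Assuming the order of the dictionary is as you desired.
to_son(a_nested_dict)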

Or use BSON as an intermediate format:

from bson import CodecOptions, SON, BSON
nested_bson = BSON.encode(a_nested_dict)
nested_son = BSON.decode(nested_bson, codec_options=CodecOptions(document_class=SON))

Once the data is in SON format, you can convert it back to a plain Python dictionary using SON.to_dict().

Wan Bachtiar answered Oct 21 '22 12:10
