Please look at the following lines of code and the results:
import pymongo
d1 = {'p': 0.5, 'theta': 100, 'sigma': 20}
d2 = {'theta': 100, 'sigma': 20, 'p': 0.5}
I get the following results:
d1 == d2  # Returns True
collectn.find({'goods.H': d1}).count()  # Returns 33
collectn.find({'goods.H': d2}).count()  # Returns 2
where collectn is a MongoDB collection object.
Is there a setting or a way to query so that I obtain the same results for the above two queries?
They are essentially using the same dictionary (in the sense of d1 == d2 being True). I am trying to do the following:
before inserting a record into the database, I check whether a record with the exact value combination being added already exists. If so, I don't want to create a new record. But because of the behavior shown above, the check can report that the record does not exist even when it does, and a duplicate record is added to the database (with a different _id, of course, but with all other values the same, which I would prefer to avoid).
Thank you in advance for your help.
The issue you are having is explained in the MongoDB documentation. It comes down to the fact that Python dict equality ignores key order (and older Python versions did not preserve key order at all), whereas MongoDB stores documents as ordered BSON objects.
The relevant quote is:
Equality matches within subdocuments select documents if the subdocument matches exactly the specified subdocument, including the field order.
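To make the difference concrete, here is a small sketch in plain Python. It only illustrates order-insensitive versus order-sensitive comparison; OrderedDict is used here as a stand-in for an ordered BSON document, which is my analogy and not what the driver does internally. It assumes Python 3.7+, where dicts keep insertion order.

```python
from collections import OrderedDict

d1 = {'p': 0.5, 'theta': 100, 'sigma': 20}
d2 = {'theta': 100, 'sigma': 20, 'p': 0.5}

# Plain dict equality ignores key order, so Python sees these as equal.
same_as_dicts = (d1 == d2)

# OrderedDict equality also compares key order, which is closer to how an
# exact subdocument match treats the two queries.
same_as_ordered = (OrderedDict(d1) == OrderedDict(d2))

# The key order each dict carries when it is handed over for serialization.
order_d1 = list(d1)
order_d2 = list(d2)

print(same_as_dicts, same_as_ordered)
print(order_d1, order_d2)
```

So the two dicts compare equal in Python, but they carry different key orders, and an exact subdocument match is sensitive to that order.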
I think you might be better off treating all three properties as individual fields addressed with dot notation, rather than as one subdocument. That way the key order of the Python dict is not forced into the query.
For instance...
d1 = {'goods.H.p': 0.5, 'goods.H.theta': 100, 'goods.H.sigma': 20}
d2 = {'goods.H.theta': 100, 'goods.H.sigma': 20, 'goods.H.p': 0.5}
collectn.find(d1).count()
collectn.find(d2).count()
...may yield more consistent results.
Finally, a way to do it changing less code:
collectn.find({'goods.H.' + k:v for k,v in d1.items()})
collectn.find({'goods.H.' + k:v for k,v in d2.items()})
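The same idea can be wrapped in a small helper. This is only a sketch: dotted_query is a hypothetical name, not part of PyMongo, and the commented-out call assumes a live collection named collectn as in the question. One caveat to keep in mind: per-field conditions also match documents whose goods.H contains additional fields, so this is not a strict "exactly this subdocument" test.

```python
def dotted_query(prefix, subdoc):
    """Hypothetical helper: turn {'p': 0.5} into {'goods.H.p': 0.5}."""
    return {'{}.{}'.format(prefix, key): value for key, value in subdoc.items()}


d1 = {'p': 0.5, 'theta': 100, 'sigma': 20}
d2 = {'theta': 100, 'sigma': 20, 'p': 0.5}

q1 = dotted_query('goods.H', d1)
q2 = dotted_query('goods.H', d2)

# Each key is now an independent condition on a single field, so the order
# in which the conditions are listed does not change which documents match.
print(q1)
print(q2)

# Assumed usage against a real collection (not executed here).
# Cursor.count() was removed in PyMongo 4; count_documents() is the
# replacement available since PyMongo 3.7:
# n = collectn.count_documents(q1)
```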
I can only think of two things to do:
1. Structure your query like this: collectn.find({'goods.H.p': 0.5, 'goods.H.theta': 100, 'goods.H.sigma': 20}).count(). That will find the correct number of documents.
2. Restructure your data: if you look at "MongoDB: Indexes order and query order must match?", you will see that you can index on p, sigma, theta so that any order of the terms in the query gives the correct result. In my brief tests (I am no expert) I was not able to build an index that produces the same effect with your current structure.
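For the indexing idea, a compound index specification could be built roughly as follows. This is a sketch under assumptions: compound_index_keys is a hypothetical helper, the field paths depend on how the data ends up being structured, and the create_index call is left commented out because it needs a live collection.

```python
ASCENDING = 1  # same value as pymongo.ASCENDING


def compound_index_keys(fields, prefix=''):
    """Hypothetical helper: build a (key, direction) list for create_index."""
    path = prefix + '.' if prefix else ''
    return [(path + name, ASCENDING) for name in fields]


# If p, sigma and theta become top-level fields after restructuring:
flat_keys = compound_index_keys(['p', 'sigma', 'theta'])

# If they stay nested under goods.H and are queried with dot notation:
nested_keys = compound_index_keys(['p', 'sigma', 'theta'], prefix='goods.H')

print(flat_keys)
print(nested_keys)

# Assumed usage (not executed here):
# collectn.create_index(nested_keys)
```

An index like this speeds up the per-field query; it does not change how an exact subdocument match treats field order.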