
PyMongo or MongoDB is treating two equal Python dictionaries as different objects. Can I force them to be treated the same?

Please look at the following lines of code and the results:

import pymongo

d1 = {'p': 0.5, 'theta': 100, 'sigma': 20}
d2 = {'theta': 100, 'sigma': 20, 'p': 0.5}

I get the following results:

d1 == d2  # Returns True

collectn.find({'goods.H': d1}).count()  # Returns 33

collectn.find({'goods.H': d2}).count()  # Returns 2

where collectn is a MongoDB collection object.

Is there a setting or a way to query so that I obtain the same results for the above two queries?

Both queries use essentially the same dictionary (in the sense that d1 == d2 is True). What I am trying to do is this: before inserting a record into the database, I check whether a record with exactly the same combination of values already exists. If it does, I don't want to create a new record. Because of the behavior shown above, the check can report that the record does not exist even when it does, and a duplicate record is added to the database (with a different _id, of course, but with all other values the same, which I would prefer to avoid).

Thank you in advance for your help.

Curious2learn asked Jan 14 '13 18:01


2 Answers

The issue you are having is explained in the MongoDB documentation. It comes down to the fact that Python dictionary equality ignores key order, whereas MongoDB subdocuments are ordered BSON objects.

The relevant quote being,

Equality matches within subdocuments select documents if the subdocument matches exactly the specified subdocument, including the field order.
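
The distinction can be mimicked in plain Python without a database. This is only an analogy: it assumes that the order-sensitive comparison of `collections.OrderedDict` is a fair stand-in for an exact subdocument match, and it does not call MongoDB at all.

```python
from collections import OrderedDict

d1 = {'p': 0.5, 'theta': 100, 'sigma': 20}
d2 = {'theta': 100, 'sigma': 20, 'p': 0.5}

# Plain dict equality ignores key order.
dicts_equal = (d1 == d2)

# OrderedDict equality is order-sensitive. It is used here only as a
# stand-in (an assumption for illustration) for an exact subdocument
# match, which also takes field order into account.
o1 = OrderedDict([('p', 0.5), ('theta', 100), ('sigma', 20)])
o2 = OrderedDict([('theta', 100), ('sigma', 20), ('p', 0.5)])
ordered_equal = (o1 == o2)

print(dicts_equal)    # True
print(ordered_equal)  # False
```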

I think you would be better off querying the three properties individually, as fields reached through the main document, rather than as one subdocument compared as a whole. That way the query does not depend on whatever key order Python gives the dictionary.

For instance...

d1 = {'goods.H.p': 0.5, 'goods.H.theta': 100, 'goods.H.sigma': 20}
d2 = {'goods.H.theta': 100, 'goods.H.sigma': 20, 'goods.H.p': 0.5}

collectn.find(d1).count()
collectn.find(d2).count()

...may yield more consistent results.

Finally, a way to do it changing less code:

collectn.find({'goods.H.' + k:v for k,v in d1.items()})
collectn.find({'goods.H.' + k:v for k,v in d2.items()})
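
The comprehension above can be wrapped in a small helper so the field path is not repeated. This is a minimal sketch; `dotted_query` is a hypothetical name, not part of PyMongo, and the block only builds the query dictionaries without contacting a database.

```python
def dotted_query(prefix, subdoc):
    """Turn {'p': 0.5} into {'<prefix>.p': 0.5} for every key in subdoc."""
    return {prefix + '.' + key: value for key, value in subdoc.items()}


d1 = {'p': 0.5, 'theta': 100, 'sigma': 20}
d2 = {'theta': 100, 'sigma': 20, 'p': 0.5}

q1 = dotted_query('goods.H', d1)
q2 = dotted_query('goods.H', d2)

# Both queries contain the same per-field conditions, so passing either
# one to collectn.find(...) asks for the same thing.
print(q1 == q2)  # True
```

One difference from the exact match is worth keeping in mind: a query on dotted field names also matches documents whose goods.H contains additional fields beyond p, theta and sigma.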
mayhewr answered Sep 20 '22 04:09


I can only think of two things to do:

  1. Structure your query like this: collectn.find({'goods.H.p':0.5, 'goods.H.theta':100, 'goods.H.sigma':20}).count(). That will find the correct number of documents...

  2. Restructure your data: if you look at the question "MongoDB: Indexes order and query order must match?", you will see that you can index on p, sigma, theta so that the terms can appear in any order in the query and still give the correct result. In my brief tests (I am no expert) I was not able to create an index that produces the same effect with your current structure.
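
Applied to the duplicate check described in the question, option 1 might look like the sketch below. `insert_if_absent` and its parameters are hypothetical names; the only assumption about `collection` is that it offers `find_one` and `insert_one`, as a PyMongo collection does. The check and the insert are two separate steps, so this does not protect against two clients inserting at the same moment.

```python
def insert_if_absent(collection, record, path, subdoc):
    """Insert record unless some document already matches subdoc field by field.

    collection: assumed to provide find_one(query) and insert_one(doc).
    path:       dotted path of the subdocument, e.g. 'goods.H'.
    subdoc:     the dictionary stored at that path in record.
    Returns True if the record was inserted, False if a match existed.
    """
    # Compare each field separately so key order cannot affect the result.
    query = {path + '.' + key: value for key, value in subdoc.items()}
    if collection.find_one(query) is not None:
        return False
    collection.insert_one(record)
    return True
```

A call would then look like insert_if_absent(collectn, record, 'goods.H', d1), where record is the full document about to be stored.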

IamAlexAlright answered Sep 22 '22 04:09