I have a large set of JSON documents that I want to store in MongoDB.
However, since I search and retrieve against only a few fields, I was wondering which approach would perform better.
One option is to store the large object as JSON/BSON so the doc will look like:
{
  "key_1": "Value1",
  "key_2": "Value2",
  "external_data": {
    "large": {
      "data": [
        "comes",
        "here"
      ]
    }
  }
}
Or alternatively,
{
  "key_1": "Value1",
  "key_2": "Value2",
  "external_data": '{"large":{"data":["comes","here"]}}'
}
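For context, here is a minimal pymongo sketch of how the two layouts could be stored and then looked up on the few searched fields. The database and collection names ("mydb", "docs") and the compound index are illustrative assumptions, not part of my actual setup:

import json
from pymongo import MongoClient, ASCENDING

client = MongoClient()   # assumes a local mongod on the default port
docs = client.mydb.docs  # hypothetical database/collection names

# Searches only touch key_1/key_2, so an index on them serves either layout.
docs.create_index([("key_1", ASCENDING), ("key_2", ASCENDING)])

# Option 1: external_data as a nested BSON subdocument
docs.insert_one({
    "key_1": "Value1",
    "key_2": "Value2",
    "external_data": {"large": {"data": ["comes", "here"]}},
})

# Option 2: external_data as an opaque JSON string
docs.insert_one({
    "key_1": "Value1",
    "key_2": "Value2",
    "external_data": json.dumps({"large": {"data": ["comes", "here"]}}),
})

# Retrieval against the indexed fields looks identical for both options.
doc = docs.find_one({"key_1": "Value1", "key_2": "Value2"})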
The short answer is that there is no significant performance difference in writes.
Here is the code I used to test it with the pymongo driver, along with the results:
import cProfile

# dbcl.client is an existing MongoClient instance from my test setup.
docdict = dict(zip(["key" + str(i) for i in range(1, 101)],
                   ["a" * i for i in range(1, 101)]))
docstr = str(docdict)

def addIdtoStr(s, id): return {'_id': id, 'payload': s}
def addIdtoDict(d, id): d.update({'_id': id}); return d

cProfile.run("for i in range(0,100000): x = dbcl.client.tests.test2.insert(addIdtoDict(docdict, i), w=0, j=0)")
**12301152 function calls (12301128 primitive calls) in 56.089 seconds**

dbcl.client.tests.test2.remove({}, multi=True)

cProfile.run("for i in range(0,100000): x = dbcl.client.tests.test2.insert(addIdtoStr(docstr, i), w=0, j=0)")
**12201194 function calls (12115631 primitive calls) in 54.665 seconds**
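For anyone rerunning this on a current driver: Collection.insert() was removed in PyMongo 4, so a rough equivalent of the same test would look like the sketch below (the connection details are assumed, and the numbers will of course differ on your hardware):

import cProfile
from pymongo import MongoClient, WriteConcern

client = MongoClient()  # assumes a local mongod on the default port
# Unacknowledged writes, matching w=0 in the original test.
coll = client.tests.get_collection("test2", write_concern=WriteConcern(w=0))

docdict = dict(zip(["key" + str(i) for i in range(1, 101)],
                   ["a" * i for i in range(1, 101)]))
docstr = str(docdict)

def add_id_to_dict(d, i):
    d["_id"] = i
    return d

def add_id_to_str(s, i):
    return {"_id": i, "payload": s}

# cProfile.run evaluates the string in the __main__ namespace, so run this
# at module level (script or interactive session).
cProfile.run("for i in range(100000): coll.insert_one(add_id_to_dict(docdict, i))")
coll.delete_many({})
cProfile.run("for i in range(100000): coll.insert_one(add_id_to_str(docstr, i))")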
I believe that storing the data as BSON is both performance- and space-efficient. It is also an investment in the future: if you store the data as BSON now, you will be able to query it on the database side later if such a requirement appears.
In any case, if your concern is performance, you have to profile it in your production environment; there is no way to tell in advance whether one approach will be faster.
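To make the "query it later" point concrete, here is a rough sketch (collection and field names assumed, mirroring the question) of what the BSON layout allows and the string layout does not:

import json
from pymongo import MongoClient

coll = MongoClient().mydb.docs  # hypothetical names

# BSON subdocument layout: the server can match and project nested fields.
hit = coll.find_one({"external_data.large.data": "comes"},
                    {"key_1": 1, "external_data.large": 1})

# JSON string layout: the server only sees an opaque string, so the nested
# data has to be fetched whole and decoded in the application.
doc = coll.find_one({"key_1": "Value1"})
external = json.loads(doc["external_data"])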