
Storing data as object vs array in MongoDB for write performance

Should I store objects in an array or inside an object, with top importance given to write speed?


I'm trying to decide whether data should be stored as an array of objects, or using nested objects inside a mongodb document.

In this particular case, I'm keeping track of a set of continually updating files. I add and update entries in which the file name acts as the key, stored alongside the number of lines processed within the file.

The document looks something like this:

{
  t_id: 1220,
  "some-other-info": {}, // there's other info here not updated frequently
  files: {
    "log1-txt": {filename: "log1.txt", numlines: 233, filesize: 19928},
    "log2-txt": {filename: "log2.txt", numlines: 2, filesize: 843}
  }
}

or this

{
  t_id:1220,
  "some-other-info": {},
  files:[
    {filename:"log1.txt",numlines:233,filesize:19928},
    {filename:"log2.txt",numlines:2,filesize:843}
  ]
}

My assumption is that objects are easier to deal with, especially for updates, because an entry can be located by its name. With an array, I have to look through each element's value until I find the match.

Because the file names contain periods, I will need to convert (or drop) the periods to create a valid key (fi.le.log to filelog or fi-le-log). I'm not worried about duplicate names emerging as a result (such as fi.le.log and fi-le.log), so I would prefer to use objects, because the number of files is relatively small but the updates are frequent.
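The key conversion described above could be sketched as follows. This is only an illustration: toFieldKey is a hypothetical helper name, and replacing periods with "-" is one of the two options mentioned (the other being to drop them). It also swaps a leading "$", on the assumption that such names should be avoided as field names.

```javascript
// Hypothetical helper: turn a file name into a key that can be used
// in a dotted update path such as "files.<key>".
function toFieldKey(filename) {
  // A period would be read as a path separator, and a leading "$" looks
  // like an operator, so both are replaced with "-".
  return filename.replace(/\./g, "-").replace(/^\$/, "-");
}

console.log(toFieldKey("log1.txt"));  // log1-txt
console.log(toFieldKey("fi.le.log")); // fi-le-log
console.log(toFieldKey("fi-le.log")); // fi-le-log (the collision noted above)
```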

Or would it be better to handle this data in a separate collection for best write performance...

{
    "_id": ObjectId('56d9f1202d777d9806000003'),"t_id": "1220","filename": "log1.txt","filesize": 1843,"numlines": 554
},
{
    "_id": ObjectId('56d9f1392d777d9806000004'),"t_id": "1220","filename": "log2.txt","filesize": 5231,"numlines": 3027
}
asked Mar 04 '16 20:03 by Daniel



1 Answer

From what I understand, you are asking about write speed, without any read considerations. So we have to think about how you will insert/update your document.

We have to compare the following (assuming you know the _id of the document you are updating; replace {key} with the key name, log1-txt or log2-txt in your example):

db.Col.update({ _id: '' }, { $set: { 'files.{key}': object }})

vs

db.Col.update({ _id: '', 'files.filename': '{key}'}, { $set: { 'files.$': object }})

The second one means that MongoDB has to scan the array, find the matching index, and update that element. The first one means MongoDB just updates the specified field.

Worse, the second command will not work if the matching filename is not present in the array! You have to execute it, check whether nMatched is 0, and create the element if so. That is really bad for write speed (see MongoDB: upsert sub-document).
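To make the extra round trip concrete, here is a minimal sketch of both write paths. It assumes mongosh, where updateOne() returns a result with a matchedCount field (the counterpart of nMatched from the older update() call). The function names and return strings are hypothetical, and col stands for any collection handle such as db.Col.

```javascript
// Array layout: up to two round trips per write.
function setFileInArray(col, docId, file) {
  // Step 1: try to overwrite the array element whose filename matches.
  const res = col.updateOne(
    { _id: docId, "files.filename": file.filename },
    { $set: { "files.$": file } }
  );
  if (res.matchedCount > 0) return "updated";

  // Step 2: nothing matched, so append. The $ne guard avoids a duplicate
  // if another writer added the same filename in the meantime.
  const pushed = col.updateOne(
    { _id: docId, "files.filename": { $ne: file.filename } },
    { $push: { files: file } }
  );
  return pushed.matchedCount > 0 ? "inserted" : "skipped";
}

// Object layout: one round trip, whether or not the key already exists.
function setFileInObject(col, docId, key, file) {
  return col.updateOne({ _id: docId }, { $set: { ["files." + key]: file } });
}
```

With the object layout, $set creates the key when it is missing, so no fallback step is needed.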

If you will never (or almost never) run read queries or the aggregation framework on this collection, go for the first one; it will be faster. If you want to aggregate, unwind, or run analytics on the parsed files to get statistics about file sizes and line counts, consider the second one; it will save you some headaches.
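As a rough illustration of that read-side difference, the two pipelines below compute per-document totals for each layout. The output field names totalLines and totalSize are made up for this sketch. The object layout needs an extra $objectToArray stage, which is only available from MongoDB 3.4.4 onward.

```javascript
// Array layout: $unwind emits one document per array element.
const arrayLayoutPipeline = [
  { $unwind: "$files" },
  { $group: {
      _id: "$t_id",
      totalLines: { $sum: "$files.numlines" },
      totalSize: { $sum: "$files.filesize" }
  } }
];

// Object layout: the keyed object must first become an array of
// { k, v } pairs, so every field reference gains a ".v" hop.
const objectLayoutPipeline = [
  { $project: { t_id: 1, files: { $objectToArray: "$files" } } },
  { $unwind: "$files" },
  { $group: {
      _id: "$t_id",
      totalLines: { $sum: "$files.v.numlines" },
      totalSize: { $sum: "$files.v.filesize" }
  } }
];

// Usage in mongosh (collection name taken from the examples above):
// db.Col.aggregate(arrayLayoutPipeline)
```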

Pure write speed will be better with the first solution.

answered Oct 04 '22 00:10 by Jonathan Muller