 

MongoDB workaround for documents above the 16MB size limit?

Tags:

mongodb

The MongoDB collection I am working on takes sensor data from a cellphone, and it is pinged to the server roughly every 2-6 seconds.

The data is huge, and the 16MB limit is crossed after 4-5 hours; there doesn't seem to be any workaround for this.

I have tried searching on Stack Overflow and went through various questions, but no one actually shared their hack.

Is there any way... on the DB side, maybe, that will distribute the chunks like it is done for big files via GridFS?

asked Oct 21 '16 by DeathNote

1 Answer

To fix this problem you will need to make some small amendments to your data structure. By the sounds of it, for your documents to exceed the 16MB limit, you must be embedding your sensor readings into an array in a single document.

I would not suggest using GridFS here; I do not believe it to be the best solution, and here is why.

There is a technique known as bucketing that you could employ which will essentially split your sensor readings out into separate documents, solving this problem for you.

The way it works is this:

Let's say I have a document with some embedded readings for a particular sensor that looks like this:

{
    _id : ObjectId("xxx"),
    sensor : "SensorName1",
    readings : [
        { date : ISODate("..."), reading : "xxx" },
        { date : ISODate("..."), reading : "xxx" },
        { date : ISODate("..."), reading : "xxx" }
    ]
}

With the structure above there is already a major flaw: the readings array can grow without bound and exceed the 16MB document limit.

So what we can do is change the structure slightly to look like this, to include a count property:

{
    _id : ObjectId("xxx"),
    sensor : "SensorName1",
    readings : [
        { date : ISODate("..."), reading : "xxx" },
        { date : ISODate("..."), reading : "xxx" },
        { date : ISODate("..."), reading : "xxx" }
    ],
    count : 3
}

The idea behind this is that when you $push a reading into the embedded array, you also increment ($inc) the count field for every push that is performed. And when you perform this update (push) operation, you include a filter on this count property, which might look something like this:

{ count : { $lt : 500} }

Then set your update options so that "upsert" is true:

db.sensorReadings.update(
    { sensor : "SensorName1", count : { $lt : 500 } },
    {
        // Your update: $push the new reading and $inc the count.
        // ReadingDocumentToPush is a placeholder for your incoming reading document.
        $push : { readings : ReadingDocumentToPush },
        $inc : { count : 1 }
    },
    { upsert : true }
)
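
On MongoDB 3.2 and newer you can express the same operation with the CRUD-style updateOne helper. A minimal sketch, assuming the same sensorReadings collection, the 500-reading bucket size, and a hypothetical inline reading in place of the placeholder above:

db.sensorReadings.updateOne(
    // Match the current, not-yet-full bucket for this sensor
    { sensor : "SensorName1", count : { $lt : 500 } },
    {
        // Append the new reading and bump the bucket's counter
        $push : { readings : { date : new Date(), reading : "xxx" } },
        $inc : { count : 1 }
    },
    // Create a fresh bucket when no matching document exists
    { upsert : true }
)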

See the MongoDB update documentation for more info on update and the upsert option.

What will happen is this: when the filter condition is not met (i.e. when there is either no existing document for this sensor, or the count is greater than or equal to 500, because you increment it every time an item is pushed), a new document will be created, and subsequent readings will be embedded in that new document. So you will never hit the 16MB limit if you do this properly.
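
For illustration, once the first bucket fills up, the collection might contain documents along these lines (the counts and reading ranges here are hypothetical):

{ _id : ObjectId("..."), sensor : "SensorName1", count : 500, readings : [ /* readings 1-500 */ ] }
{ _id : ObjectId("..."), sensor : "SensorName1", count : 123, readings : [ /* readings 501-623 */ ] }

Only the second document still matches the count filter, so new readings keep flowing into it until it fills up as well.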

Now, when querying the database for the readings of a particular sensor, you may get back multiple documents for that sensor (instead of just one with all the readings in it). For example, if you have 10,000 readings, you will get 20 documents back, each with 500 readings.

You can then use the aggregation pipeline with $unwind to filter your readings as if they were their own individual documents.
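
As a sketch, such a pipeline might look like this (collection and field names as in the examples above; the sort stage is an assumption that you want readings in chronological order):

db.sensorReadings.aggregate([
    // Select only the buckets for the sensor of interest
    { $match : { sensor : "SensorName1" } },
    // Turn each embedded reading into its own document
    { $unwind : "$readings" },
    // Sort the readings chronologically (assumed requirement)
    { $sort : { "readings.date" : 1 } },
    // Reshape the output into plain reading documents
    { $project : { _id : 0, sensor : 1, date : "$readings.date", reading : "$readings.reading" } }
])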

For more information on $unwind, see the MongoDB $unwind documentation; it's very useful.

I hope this helps.

answered Oct 06 '22 by pieperu