
Are 100 million documents too much?

Well, I am new to Mongo, and this morning I had a (bad) idea. I was playing around with indexes from the shell and decided to create a large collection with many documents (100 million). So I executed the following command:

// four nested loops of 100 iterations each: 100^4 = 100,000,000 inserts
for (var i = 1; i <= 100; i++) {
    for (var j = 100; j > 0; j--) {
        for (var k = 1; k <= 100; k++) {
            for (var l = 100; l > 0; l--) {
                db.testIndexes.insert({a: i, b: j, c: k, d: l})
            }
        }
    }
}

However, things didn't go as I expected:

  1. It took 45 minutes to complete the request.
  2. It created 16 GB of data on my hard disk.
  3. It used 80% of my RAM (8 GB total), which wasn't released until I restarted my PC.

As you can see in the photo below, as the number of documents in the collection grew, the time to insert documents grew as well. I infer that from the last modification times of the data files:

[Screenshot: last modification times of the MongoDB data files]

Is this expected behavior? I don't think 100 million simple documents are too many.

P.S. I am now really afraid to run an ensureIndex command.

Edit:

I executed the following command:

> db.testIndexes.stats()
{
        "ns" : "test.testIndexes",
        "count" : 100000000,
        "size" : 7200000056,
        "avgObjSize" : 72.00000056,
        "storageSize" : 10830266336,
        "numExtents" : 28,
        "nindexes" : 1,
        "lastExtentSize" : 2146426864,
        "paddingFactor" : 1,
        "systemFlags" : 1,
        "userFlags" : 0,
        "totalIndexSize" : 3248014112,
        "indexSizes" : {
                "_id_" : 3248014112
        },
        "ok" : 1
}

So the default index on _id alone is more than 3 GB in size.
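The arithmetic in those stats checks out. A quick cross-check in plain JavaScript (runnable outside the shell, using only the numbers reported above):

```javascript
// Cross-check the numbers reported by db.testIndexes.stats().
var count = 100000000;          // documents
var size = 7200000056;          // total data size, bytes
var avgObjSize = 72.00000056;   // bytes per document
var idIndexSize = 3248014112;   // bytes for the _id index

// count * avgObjSize reproduces the reported data size
var computedSize = Math.round(count * avgObjSize); // 7200000056

// the _id index costs roughly 32.5 bytes per document
var bytesPerIndexEntry = idIndexSize / count;      // ~32.48
```

By that per-entry cost, an ensureIndex on another single numeric field could plausibly add another index of similar multi-GB size, so the fear above is not unfounded.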

asked Jun 08 '13 by chaliasos



1 Answer

It took 45 minutes complete the request.

Not surprised.
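A big part of those 45 minutes is one network round trip per document. A sketch of the same load grouped into batches (makeBatches is a helper name introduced here for illustration; a real run would flush each batch immediately rather than hold 100 million documents in memory):

```javascript
// Group the same documents into arrays so each array can be sent with
// a single bulk call instead of one insert() per document.
function makeBatches(dim, batchSize) {
  var batches = [];
  var current = [];
  for (var i = 1; i <= dim; i++) {
    for (var j = dim; j > 0; j--) {
      for (var k = 1; k <= dim; k++) {
        for (var l = dim; l > 0; l--) {
          current.push({a: i, b: j, c: k, d: l});
          if (current.length === batchSize) {
            batches.push(current);
            current = [];
          }
        }
      }
    }
  }
  if (current.length > 0) batches.push(current); // final partial batch
  return batches;
}

// In a modern shell each batch would then be one call, e.g.:
//   makeBatches(100, 1000).forEach(function (b) { db.testIndexes.insertMany(b); });
```

The batch size of 1000 is an arbitrary choice; the point is only that bulk inserts amortize the per-call overhead.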

It created 16 GB data on my hard disk.

As @Abhishek states, everything seems fine; MongoDB currently uses a fair amount of space without compression (hopefully that's coming later).

The data size is about 7.2 GB and the average object size is 72 bytes, which works out exactly (100 million × 72 bytes = 7.2 GB). With the ~3 GB overhead of the _id index on top, the storage size of ~10 GB seems to fit quite well.
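Putting rough numbers on that accounting (the 16 GB figure is the asker's observation; it is assumed here to mean 16 × 10^9 bytes):

```javascript
// Rough space accounting from db.testIndexes.stats(), in bytes.
var storageSize = 10830266336;  // data files allocated to the collection
var indexSize = 3248014112;     // totalIndexSize (_id only)

var accounted = storageSize + indexSize;  // ~14.1 GB explained by stats
var onDisk = 16e9;                        // observed disk usage (assumed 16 * 10^9 bytes)
var gap = onDisk - accounted;             // ~1.9 GB, plausibly preallocated extents
```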

Though I am concerned that it has used some 6 GB more than the statistics say it needs; that might need more looking into. I am guessing it is down to how MongoDB wrote to the data files, and it might even be because you were using a fire-and-forget write concern (w=0) rather than an acknowledged one (w>0). All in all: hmmm.

It used 80% of my RAM (8GB total) and it won't release them till I restarted my PC.

MongoDB will try to take as much RAM as the OS will let it; if the OS lets it take 80%, then 80% it will take. This is actually a good sign: it shows that MongoDB has the right configuration values to store your working set efficiently.
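To put rough numbers on it, using the stats above: the mapped data plus index total about 14 GB, while 80% of 8 GiB is about 6.4 GiB, so only roughly half of the set can be resident at once, and the OS keeps as much of it paged in as it can:

```javascript
// Mapped data vs. available RAM, using the stats() numbers above.
var totalMapped = 10830266336 + 3248014112;        // storageSize + totalIndexSize
var residentBudget = 0.8 * 8 * 1024 * 1024 * 1024; // 80% of 8 GiB

// Under 0.5: the full data set cannot be memory-resident at once.
var residentFraction = residentBudget / totalMapped;
```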

As for running ensureIndex: mongod will never free up RAM on its own. It simply has no hooks for that; instead, the OS will shrink mongod's allocated memory to make room for other processes (or at least it should).

answered Nov 16 '22 by Sammaye