Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why does MongoDB takes up so much space?

I am trying to store records with a set of doubles and ints (around 15-20) in mongoDB. The records mostly (99.99%) have the same structure.

When I store the data in a root which is a very structured data storing format, the file is around 2.5GB for 22.5 Million records. For Mongo, however, the database size (from command show dbs) is around 21GB, whereas the data size (from db.collection.stats()) is around 13GB.

This is a huge overhead (Clarify: 13GB vs 2.5GB, I'm not even talking about the 21GB), and I guess it is because it stores both keys and values. So the question is, why and how Mongo doesn't do a better job in making it smaller?

But the main question is, what is the performance impact in this? I have 4 indexes and they come out to be 3GB, so running the server on a single 8GB machine can become a problem if I double the amount of data and try to keep a large working set in memory.

Any guesses into if I should be using SQL or some other DB? or maybe just keep working with ROOT files if anyone has tried them?

like image 683
xcorat Avatar asked Nov 20 '13 05:11

xcorat


1 Answers

Basically, this is mongo preparing for the insertion of data. Mongo performs prealocation of storage for data to prevent (or minimize) fragmentation on the disk. This prealocation is observed in the form of a file that the mongod instance creates.

First it creates a 64MB file, next 128MB, next 512MB, and on and on until it reaches files of 2GB (the maximum size of prealocated data files).

There are some more things that mongo does that might be suspect to using more disk space, things like journaling...

For much, much more info on how mongoDB uses storage space, you can take a look at this page and in specific the section titled Why are the files in my data directory larger than the data in my database?

There are some things that you can do to minimize the space that is used, but these tequniques (such as using the --smallfiles option) are usually only recommended for development and testing use - never for production.

like image 111
Lix Avatar answered Sep 23 '22 00:09

Lix