
Mongodb data files become smaller after migration

On my first server I get:

root@prod ~ # du -hs /var/lib/mongodb/
909G    /var/lib/mongodb/

After migrating this database with mongodump/mongorestore, on my second server I get:

root@prod ~ # du -hs /var/lib/mongodb/
30G /var/lib/mongodb/

After I waited a few hours for mongo to finish indexing, I got:

root@prod ~ # du -hs /var/lib/mongodb/
54G /var/lib/mongodb/

I tested the database and there is no corrupted or missing data.

Why is there such a big difference in size before and after the migration?

n0nSmoker asked Jul 29 '14 16:07


2 Answers

MongoDB does not return disk space to the operating system when the actual data size drops, whether through data deletion or other causes. There's a decent explanation in the online docs:

Why are the files in my data directory larger than the data in my database?

The data files in your data directory, which is the /data/db directory in default configurations, might be larger than the data set inserted into the database. Consider the following possible causes:

Preallocated data files.

In the data directory, MongoDB preallocates data files to a particular size, in part to prevent file system fragmentation. MongoDB names the first data file <databasename>.0, the next <databasename>.1, etc. The first file mongod allocates is 64 megabytes, the next 128 megabytes, and so on, up to 2 gigabytes, at which point all subsequent files are 2 gigabytes. The data files include files with allocated space but that hold no data. mongod may allocate a 1 gigabyte data file that may be 90% empty. For most larger databases, unused allocated space is small compared to the database.
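
As a rough illustration of the doubling scheme described in the paragraph above, the sketch below lists the sizes of the first few data files. The function name and its parameters are made up for this example; only the 64 MB start, the doubling, and the 2 GB cap come from the quoted text.

```python
def prealloc_sizes_mb(n_files, first_mb=64, cap_mb=2048):
    """Sizes in MB of the first n_files data files: double each time, capped at cap_mb."""
    sizes = []
    size = first_mb
    for _ in range(n_files):
        sizes.append(size)
        size = min(size * 2, cap_mb)  # every file after the cap stays at 2 GB
    return sizes


sizes = prealloc_sizes_mb(8)
print(sizes)       # [64, 128, 256, 512, 1024, 2048, 2048, 2048]
print(sum(sizes))  # 8128 MB allocated across the first eight files
```

The point is only that allocated space grows in steps, so the last file or two can be mostly empty.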

On Unix-like systems, mongod preallocates an additional data file and initializes the disk space to 0. Preallocating data files in the background prevents significant delays when a new database file is next allocated.

You can disable preallocation by setting preallocDataFiles to false. However do not disable preallocDataFiles for production environments: only use preallocDataFiles for testing and with small data sets where you frequently drop databases.

On Linux systems you can use hdparm to get an idea of how costly allocation might be:

time hdparm --fallocate $((1024*1024)) testfile

The oplog.

If this mongod is a member of a replica set, the data directory includes the oplog.rs file, which is a preallocated capped collection in the local database. The default allocation is approximately 5% of disk space on 64-bit installations, see Oplog Sizing for more information. In most cases, you should not need to resize the oplog. However, if you do, see Change the Size of the Oplog.

The journal.

The data directory contains the journal files, which store write operations on disk prior to MongoDB applying them to databases. See Journaling Mechanics.

Empty records.

MongoDB maintains lists of empty records in data files when deleting documents and collections. MongoDB can reuse this space, but will never return this space to the operating system.

To de-fragment allocated storage, use compact, which de-fragments allocated space. By de-fragmenting storage, MongoDB can effectively use the allocated space. compact requires up to 2 gigabytes of extra disk space to run. Do not use compact if you are critically low on disk space.

Important

compact only removes fragmentation from MongoDB data files and does not return any disk space to the operating system.

To reclaim deleted space, use repairDatabase, which rebuilds the database which de-fragments the storage and may release space to the operating system. repairDatabase requires up to 2 gigabytes of extra disk space to run. Do not use repairDatabase if you are critically low on disk space.

http://docs.mongodb.org/manual/faq/storage/

What they don't tell you is that there are two other ways to recover disk space: mongodump/mongorestore, as you did, or adding a new member to the replica set with an empty disk so that it writes its database files from scratch.

If you are interested in monitoring this, the db.stats() command reports data, index, storage and file sizes, among other statistics:

http://docs.mongodb.org/manual/reference/command/dbStats/
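
A minimal sketch of how those numbers might be compared, assuming the stats arrive as a dict with the dbStats fields dataSize, storageSize and indexSize (all in bytes). The helper name and the sample figures are hypothetical, not taken from the question.

```python
GIB = 1024 ** 3


def summarize_db_stats(stats):
    """Compare logical data size with allocated storage in a dbStats-style dict."""
    data = stats["dataSize"]
    storage = stats["storageSize"]
    return {
        "data_gb": round(data / GIB, 2),
        "storage_gb": round(storage / GIB, 2),
        "index_gb": round(stats["indexSize"] / GIB, 2),
        # Share of allocated storage not occupied by documents.
        "unused_fraction": round(1 - data / storage, 3) if storage else 0.0,
    }


# Hypothetical numbers, for illustration only.
sample = {"dataSize": 20 * GIB, "storageSize": 80 * GIB, "indexSize": 5 * GIB}
summary = summarize_db_stats(sample)
print(summary)

# With a live server and pymongo installed, the dict could come from
# something like (database name assumed):
#   from pymongo import MongoClient
#   stats = MongoClient()["mydb"].command("dbstats")
```

A large unused fraction is a hint that the files hold a lot of free or fragmented space.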

John Petrone answered Oct 01 '22 18:10


Over time the MongoDB data files become fragmented. When you do a "migration", or whack the data directory and force a re-sync, the files pack down. If your application does a lot of deletes, or updates that grow the documents, fragmentation develops fairly quickly. In our deployment it is updates that grow the documents that cause this: when MongoDB sees that the updated document can't fit in the space of the original document, it moves the document elsewhere and leaves the old space behind. There is a way to add padding to the records in a collection to reduce this.
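
To make the effect concrete, here is a deliberately simplified toy model, not MongoDB's actual allocator. It assumes every growing update moves the document to the end of the file and that the old slot is never reused, which is a worst case. All names and numbers are invented for the illustration.

```python
def simulate(n_docs=1000, initial=100, growth=50, rounds=3):
    """Return (file_size, live_bytes) after repeated document-growing updates."""
    slots = {doc_id: initial for doc_id in range(n_docs)}  # doc id -> slot size in bytes
    file_size = n_docs * initial
    for _ in range(rounds):
        for doc_id in slots:
            new_size = slots[doc_id] + growth
            file_size += new_size      # the grown copy is appended at the end
            slots[doc_id] = new_size   # the old slot is left behind as a hole
    live = sum(slots.values())
    return file_size, live


file_size, live = simulate()
print(file_size, live)  # 700000 250000
```

In this model a dump and restore would write only the 250,000 live bytes, which is the same kind of shrinkage seen after the migration.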

Bob Kuhar answered Oct 01 '22 20:10