Auto compact the deleted space in mongodb?

3 Answers

In general, if you don't need to shrink your data files, you shouldn't shrink them at all. This is because "growing" your data files on disk is a fairly expensive operation, and the more space MongoDB can allocate in data files, the less fragmentation you will have.

So you should try to provide as much disk space as possible for the database.

However, if you must shrink the database, keep two things in mind.

First, MongoDB grows its data files by doubling, so the data files may be 64MB, then 128MB, and so on up to 2GB, at which point it stops doubling and keeps allocating 2GB files.
Second, as with almost any database, to do operations like shrinking you'll need to schedule a separate job; there is no "autoshrink" in MongoDB. In fact, of the major NoSQL databases (hate that name), only Riak will autoshrink. So you'll need to create a job using your OS's scheduler to run the shrink. You could use a bash script, have the job run a PHP script, etc.
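To make the doubling concrete, here is a minimal sketch that lists the sizes of successive data files under the scheme described above (first file 64MB, each new file twice the previous one, capped at 2GB). It models the older MMAPv1-era allocation behaviour only; dataFileSizesMB is a hypothetical helper, not a MongoDB API, and the snippet runs in Node.js.

```javascript
// Sketch: sizes (in MB) of successive data files when each new file doubles
// the previous one, starting at 64MB and capped at 2GB (2048MB).
// dataFileSizesMB is a hypothetical helper for illustration only.
function dataFileSizesMB(count, firstMB = 64, capMB = 2048) {
  const sizes = [];
  let size = firstMB;
  for (let i = 0; i < count; i++) {
    sizes.push(size);
    // Double for the next file, but never exceed the cap.
    size = Math.min(size * 2, capMB);
  }
  return sizes;
}

console.log(dataFileSizesMB(8).join(", "));
// 64, 128, 256, 512, 1024, 2048, 2048, 2048
```

Because space is allocated in these growing chunks, deleting documents leaves the already-allocated files in place, which is why reclaiming the space needs an explicit shrink job.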

Server-side JavaScript

You can write the shrink in JavaScript and run that script through the mongo shell on a regular basis via a scheduled job (for example cron, or the Windows Task Scheduler).

Assuming a database called foo that contains a collection also called foo, save the JavaScript below into a file called bar.js and run:

$ mongo foo bar.js

The JavaScript file would look something like this:

// Get the current collection sizes (in bytes).
var storage = db.foo.storageSize();
var total = db.foo.totalSize();

print('Storage Size: ' + tojson(storage));
print('TotalSize: ' + tojson(total));

print('-----------------------');
print('Running db.repairDatabase()');
print('-----------------------');

// Rebuild the current database (foo) to release unused space.
// This locks the database while it runs and needs free disk space
// roughly equal to the current database size.
// Note: db.repairDatabase() was removed in MongoDB 4.2.
db.repairDatabase();

// Get the new collection sizes.
var storage_a = db.foo.storageSize();
var total_a = db.foo.totalSize();

print('Storage Size: ' + tojson(storage_a));
print('TotalSize: ' + tojson(total_a));

This will run and print something like:

MongoDB shell version: 1.6.4
connecting to: foo
Storage Size: 51351
TotalSize: 79152
-----------------------
Running db.repairDatabase()
-----------------------
Storage Size: 40960
TotalSize: 65153

Run this on a schedule (during off-peak hours) and you are good to go.
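If you want the scheduled job to report the effect of each run, a small helper can turn the before/after sizes into bytes and a percentage reclaimed. This is a minimal sketch in plain JavaScript, shown with console.log for Node.js (in the mongo shell you would use print instead); reclaimed is a hypothetical helper, and the input numbers are the ones from the sample output above.

```javascript
// Sketch: summarize how much space a repair reclaimed, given before/after
// sizes in bytes (e.g. the storageSize() and totalSize() values printed above).
// reclaimed is a hypothetical helper, not part of the mongo shell.
function reclaimed(beforeBytes, afterBytes) {
  const bytes = beforeBytes - afterBytes;
  // Guard against division by zero for an empty collection.
  const percent = beforeBytes > 0 ? (bytes / beforeBytes) * 100 : 0;
  // Round to one decimal place.
  return { bytes, percent: Math.round(percent * 10) / 10 };
}

// Numbers taken from the sample output above.
const storageDelta = reclaimed(51351, 40960);
const totalDelta = reclaimed(79152, 65153);
console.log(`Storage: ${storageDelta.bytes} bytes (${storageDelta.percent}%)`); // Storage: 10391 bytes (20.2%)
console.log(`Total: ${totalDelta.bytes} bytes (${totalDelta.percent}%)`);       // Total: 13999 bytes (17.7%)
```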

Capped Collections

However, there is one other option: capped collections.

Capped collections are fixed-size collections with a high-performance, automatic FIFO age-out feature (age-out is based on insertion order). They are a bit like the RRD (round-robin database) concept, if you are familiar with that.

In addition, capped collections automatically maintain the insertion order of their documents at high performance; this is very powerful for certain use cases such as logging.

Basically, you can limit the size of a collection (or the number of documents in it) to, say, 20GB, and once that limit is reached MongoDB starts throwing out the oldest documents and replacing them with newer entries as they come in.

This is a great way to keep a large amount of data, discarding older data as time goes by while keeping the same amount of disk space in use.
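A real capped collection is created in the mongo shell with db.createCollection(name, { capped: true, size: <bytes> }), optionally adding max to limit the document count. To illustrate the age-out behaviour itself, here is a toy model in plain JavaScript, assuming a limit by document count (like max) rather than by bytes: a fixed-size ring buffer that keeps insertion order and overwrites the oldest entry once full. CappedLog is hypothetical and says nothing about how MongoDB implements capped collections internally.

```javascript
// Toy model of a capped collection: fixed capacity, insertion order kept,
// oldest entries evicted first (FIFO) once the limit is reached.
// CappedLog is a hypothetical class for illustration, not a MongoDB API.
class CappedLog {
  constructor(maxDocs) {
    if (!Number.isInteger(maxDocs) || maxDocs < 1) {
      throw new RangeError("maxDocs must be a positive integer");
    }
    this.maxDocs = maxDocs;
    this.buffer = new Array(maxDocs); // fixed-size ring buffer
    this.start = 0;                   // index of the oldest entry
    this.count = 0;                   // number of stored entries
  }

  insert(doc) {
    if (this.count < this.maxDocs) {
      // Still room: append after the newest entry.
      this.buffer[(this.start + this.count) % this.maxDocs] = doc;
      this.count++;
    } else {
      // Full: overwrite the oldest entry and advance the start pointer.
      this.buffer[this.start] = doc;
      this.start = (this.start + 1) % this.maxDocs;
    }
  }

  // Return entries in insertion order, oldest first.
  toArray() {
    const out = [];
    for (let i = 0; i < this.count; i++) {
      out.push(this.buffer[(this.start + i) % this.maxDocs]);
    }
    return out;
  }
}

const log = new CappedLog(3);
for (const msg of ["a", "b", "c", "d", "e"]) log.insert(msg);
console.log(log.toArray().join(", ")); // c, d, e
```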

answered Oct 16 '22 15:10

Justin Jenkins

I have another solution that might work better than running db.repairDatabase() if you can't afford to have the system locked, or don't have double the storage.

This only works if you are using a replica set.

My idea is this: once you've removed all of the excess data that's gobbling your disk, stop a secondary, wipe its data directory, start it up, and let it resynchronize with the primary (master).

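The process is time-consuming, but it should cost only a few seconds of downtime: once the secondaries have been resynced, you run rs.stepDown() on the primary so that a resynced member takes over, and then resync the former primary the same way.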

Also, this can't be automated. Well, it could be, but I don't think I'm willing to try.
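The rolling procedure can be written down as an ordered checklist. This sketch only generates the list of steps (each secondary in turn, then rs.stepDown() on the primary, then a resync of the former primary); it does not connect to MongoDB, and resyncPlan and the host names are hypothetical. Apart from rs.stepDown(), each step is a manual action on the server.

```javascript
// Sketch: produce the order of manual steps for a rolling resync of a
// replica set, so that only the final stepdown causes a brief failover.
// resyncPlan and the host names are hypothetical; nothing here contacts MongoDB.
function resyncPlan(primary, secondaries) {
  const steps = [];
  // 1. Resync each secondary, one at a time, so the rest of the set stays up.
  for (const host of secondaries) {
    steps.push(`stop mongod on ${host}`);
    steps.push(`wipe the data directory on ${host}`);
    steps.push(`start mongod on ${host} and wait for initial sync to finish`);
  }
  // 2. Hand the primary role to a freshly resynced secondary.
  steps.push(`run rs.stepDown() on ${primary}`);
  // 3. Resync the former primary the same way.
  steps.push(`stop mongod on ${primary}`);
  steps.push(`wipe the data directory on ${primary}`);
  steps.push(`start mongod on ${primary} and wait for initial sync to finish`);
  return steps;
}

resyncPlan("db1:27017", ["db2:27017", "db3:27017"])
  .forEach((step, i) => console.log(`${i + 1}. ${step}`));
```

Resyncing one member at a time keeps the rest of the set available, so the only client-visible interruption is the election that follows the stepdown.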

answered Oct 16 '22 15:10

Mojo

Running db.repairDatabase() requires free space on the file system equal to the current size of the database. This can be a problem when the collections or data you need to retain would use much less space than is currently allocated, but you do not have enough free space to run the repair.

As an alternative, if you have only a few collections you actually need to retain, or only want a subset of the data, you can move the data you need to keep into a new database and drop the old one. If you need the same database name, you can then move the data back into a fresh database with that name. Just make sure you recreate any indexes.

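// Start from an empty scratch database.
var cleanup = db.getSiblingDB("cleanup_database");
cleanup.dropDatabase();

var oversize = db.getSiblingDB("oversize_database");

// Copy the documents you want to keep into the scratch database.
// Pass a query to find() instead of {} to keep only a subset.
oversize.collection.find({}).forEach(function (doc) {
    cleanup.collection_subset.insert(doc);
});

// Drop the oversized database so its data files are released.
oversize.dropDatabase();

// Copy the kept documents back under the original database name.
cleanup.collection_subset.find({}).forEach(function (doc) {
    oversize.collection.insert(doc);
});

// Recreate the indexes you need (one createIndex call per index).
oversize.collection.createIndex({ field: 1 });

// Drop the scratch database.
cleanup.dropDatabase();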

An export/drop/import cycle for databases with many collections would likely achieve the same result, but I have not tested it.

Also, as a policy, you can keep permanent collections in a separate database from your transient/processing data and simply drop the processing database once your jobs complete. Since MongoDB is schemaless, nothing except indexes is lost, and the database and collections are recreated the next time the processes insert data. Just make sure your jobs create any necessary indexes at an appropriate time.

answered Oct 16 '22 13:10

Robert Jobson


Tags: mongodb, diskspace, repair

Zealot Ke
