
Why is an empty MongoDB database so big?

Tags:

mongodb

I created a new MongoDB database instance with the command

mongod --dbpath db

where db is a folder I made in the directory from which I run the command. After running this and checking the size of the directory, I see that it is over 300MB in size, even though there is no data in it.

What's going on here?

Thanks for any help!


EDIT

Thanks to the people who pointed out the pre-allocated size of the journal.

Here is a listing of files/folders in the database directory, sorted numerically (sort -n ignores the K/M unit suffix, so the order is not strictly by size). There is a little bit of data in the database by now, but its size is negligible here:

$ du -ha | sort -n
4.0K    ./WiredTiger
4.0K    ./WiredTiger.lock
4.0K    ./WiredTiger.turtle
4.0K    ./WiredTigerLAS.wt
4.0K    ./mongod.lock
4.0K    ./storage.bson
8.0K    ./.DS_Store
8.0K    ./diagnostic.data/metrics.2016-06-10T11-07-50Z-00000
8.0K    ./diagnostic.data/metrics.interim
 16K    ./_mdb_catalog.wt
 16K    ./index-3-3697658674625742251.wt
 36K    ./collection-0-3697658674625742251.wt
 36K    ./index-1-3697658674625742251.wt
 36K    ./sizeStorer.wt
 44K    ./WiredTiger.wt
 60K    ./collection-2-3697658674625742251.wt
 72K    ./diagnostic.data/metrics.2016-06-10T10-19-31Z-00000
100M    ./journal/WiredTigerLog.0000000003
100M    ./journal/WiredTigerPreplog.0000000001
100M    ./journal/WiredTigerPreplog.0000000002
168K    ./diagnostic.data/metrics.2016-06-10T11-17-58Z-00000
256K    ./diagnostic.data
300M    ./journal
301M    .

As you can see, the journal directory is taking up almost all of the space.

dafyddPrys asked Jun 10 '16 08:06



2 Answers

Depending on your version of MongoDB and configured storage engine, several data and metadata files will be preallocated on startup. This is the expected behaviour: an "empty" deployment still results in housekeeping and diagnostic data.

Based on your directory listing, you are running MongoDB 3.2 which defaults to using the WiredTiger storage engine. WiredTiger allocates up to 100MB per journal file, so your new deployment has ~300MB of preallocated journal files:

     100M    ./journal/WiredTigerLog.0000000003
     100M    ./journal/WiredTigerPreplog.0000000001
     100M    ./journal/WiredTigerPreplog.0000000002
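
The arithmetic can be checked against the listing in the question. This is only a rough sketch in Python: the sizes are copied from the du output above (which rounds them), the helper names `to_bytes` and `parse` are made up for illustration, and the K/M suffixes are assumed to be powers of 1024.

```python
# Sizes copied from the du listing in the question (rounded by du -h).
LISTING = """\
100M    ./journal/WiredTigerLog.0000000003
100M    ./journal/WiredTigerPreplog.0000000001
100M    ./journal/WiredTigerPreplog.0000000002
256K    ./diagnostic.data
301M    .
"""

# Assumption: du -h suffixes are powers of 1024.
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3}


def to_bytes(size):
    """Convert a du -h style size such as '4.0K' or '100M' to bytes (hypothetical helper)."""
    if size[-1] in UNITS:
        return int(float(size[:-1]) * UNITS[size[-1]])
    return int(float(size))


def parse(listing):
    """Return {path: size_in_bytes} for each line of a du listing (hypothetical helper)."""
    sizes = {}
    for line in listing.splitlines():
        size, path = line.split(None, 1)
        sizes[path] = to_bytes(size)
    return sizes


sizes = parse(LISTING)
# Sum only the files inside ./journal/, not the directory entry itself.
journal = sum(b for p, b in sizes.items() if p.startswith("./journal/"))
total = sizes["."]
print(f"journal files: {journal / 1024**2:.0f} MiB of {total / 1024**2:.0f} MiB "
      f"({journal / total:.1%})")
```

With these rounded figures, the three journal files account for all but about 1MB of the directory.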

Aside from journal files, other metadata that will be created in your dbpath (without you having explicitly created databases yet) will include:

  • A local database with a capped collection called startup_log with some diagnostic information about each startup invocation of this instance of mongod. There will be an associated collection and index file for local.startup_log; the filenames are opaque but as the first files created I'm guessing in your example these might be:

     36K    ./collection-0-3697658674625742251.wt
     36K    ./index-1-3697658674625742251.wt
    
  • Multiple WiredTiger metadata files. There will always be at least one database in a deployment since the local database is created by default for the startup_log:

    4.0K    ./WiredTiger
    4.0K    ./WiredTiger.lock
    4.0K    ./WiredTiger.turtle
    4.0K    ./WiredTigerLAS.wt
     16K    ./_mdb_catalog.wt
     36K    ./sizeStorer.wt
     44K    ./WiredTiger.wt
    
  • A diagnostic.data directory; this is for periodic sampling of server status metrics:

    168K    ./diagnostic.data/metrics.2016-06-10T11-17-58Z-00000
     72K    ./diagnostic.data/metrics.2016-06-10T10-19-31Z-00000
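
To see how a dbpath splits across the groups described above, something like the following Python sketch could be used. It is not a MongoDB tool: the `classify` and `summarize` names are hypothetical, and the grouping rules are assumptions based only on the filenames in this listing. It also reports apparent file sizes via `os.path.getsize`, which may differ from the disk usage that du reports. The demo runs on a throwaway directory of tiny stand-in files rather than on a real database.

```python
import os
import tempfile
from collections import defaultdict


def classify(relpath):
    """Map a path relative to the dbpath onto the groups described above.

    The rules are assumptions drawn from the filenames in this listing only.
    """
    parts = relpath.split(os.sep)
    name = parts[-1]
    if parts[0] == "journal":
        return "journal"
    if parts[0] == "diagnostic.data":
        return "diagnostics"
    if name.startswith(("collection-", "index-")):
        return "collections and indexes"
    if name.startswith("WiredTiger") or name in ("_mdb_catalog.wt", "sizeStorer.wt"):
        return "WiredTiger metadata"
    return "other"


def summarize(dbpath):
    """Return {group: total_bytes} for every file under dbpath."""
    totals = defaultdict(int)
    for root, _dirs, files in os.walk(dbpath):
        for fname in files:
            full = os.path.join(root, fname)
            group = classify(os.path.relpath(full, dbpath))
            totals[group] += os.path.getsize(full)
    return dict(totals)


# Demo on a temporary directory with small stand-in files (sizes are arbitrary).
with tempfile.TemporaryDirectory() as demo:
    os.makedirs(os.path.join(demo, "journal"))
    os.makedirs(os.path.join(demo, "diagnostic.data"))
    for rel, size in [
        (os.path.join("journal", "WiredTigerLog.0000000003"), 1000),
        (os.path.join("journal", "WiredTigerPreplog.0000000001"), 1000),
        ("collection-0-3697658674625742251.wt", 36),
        ("index-1-3697658674625742251.wt", 36),
        ("WiredTiger.wt", 44),
        ("mongod.lock", 4),
    ]:
        with open(os.path.join(demo, rel), "wb") as fh:
            fh.write(b"\0" * size)
    demo_totals = summarize(demo)

# Largest group first.
for group, size in sorted(demo_totals.items(), key=lambda kv: -kv[1]):
    print(f"{size:>6} bytes  {group}")
```

To inspect a real deployment, `summarize` would be pointed at the directory passed to --dbpath (db in the question).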
    
Stennie answered Oct 18 '22 02:10



When you create a new database, MongoDB creates space for the oplog.

The oplog (operations log) is a special capped collection that keeps a rolling record of all operations that modify the data stored in your databases.

Rajiv Sharma answered Oct 18 '22 01:10
