I've noticed that every time I compact my CouchDB instance after inserting some data, the size drops quite a lot (sometimes down to 20% of what it was).
I'm not deleting or modifying any data; all I do is insert new records, compact, and the size goes down.
What is actually happening when I compact the database? Is it somehow compressing the data? Or does every new record come with some junk that compaction later removes?
CouchDB uses an append-only file format. The code never, ever performs an fseek(3). As a result, the .couch file cut off at any point, keeping everything from the beginning up to that point, is still a valid database file. (CouchDB scans backwards from the end to find its most recent "header".)
The cost of this architecture is writing a lot of duplicate data every time you make a change. Basically, couch writes your new data to the end of the file, then writes all metadata updates needed to incorporate that data into the data tree, and it writes a new header to commit all of that permanently.
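To make the append-only idea concrete, here is a minimal toy sketch. It is not CouchDB's real on-disk format: the class `AppendOnlyStore`, the `MAGIC` marker, the 20-byte header layout and the JSON index are all hypothetical, and the index is copied in full on every write (a real b-tree rewrites only the nodes on the changed path). It shows the three steps above (append the document, append the updated metadata, append a header) and why a truncated file still opens: the reader just walks backwards to the newest complete header.

```python
import json
import struct

MAGIC = b"HDR!"  # hypothetical header marker, not CouchDB's real format


class AppendOnlyStore:
    """Toy store: bytes are only ever appended to self.buf, never overwritten."""

    def __init__(self, data: bytes = b"") -> None:
        self.buf = bytearray(data)
        self.index = self._load_index()  # key -> [offset, length]

    def _append(self, payload: bytes) -> int:
        offset = len(self.buf)
        self.buf += payload
        return offset

    def put(self, key: str, value: str) -> None:
        # 1. Append the new document body at the end of the file.
        doc = value.encode()
        self.index[key] = [self._append(doc), len(doc)]
        # 2. Append a full copy of the updated index (stands in for rewritten b-tree nodes).
        idx = json.dumps(self.index).encode()
        idx_off = self._append(idx)
        # 3. Append a header pointing at that index; this is what commits the write.
        self._append(MAGIC + struct.pack(">QQ", idx_off, len(idx)))

    def get(self, key: str) -> str:
        off, length = self.index[key]
        return self.buf[off:off + length].decode()

    def _load_index(self) -> dict:
        # Scan backwards from the end of the file for the newest complete header.
        end = len(self.buf)
        while True:
            pos = self.buf.rfind(MAGIC, 0, end)
            if pos < 0:
                return {}  # no committed header: empty database
            if pos + 20 <= len(self.buf):  # header = 4-byte magic + two 8-byte ints
                idx_off, idx_len = struct.unpack(">QQ", self.buf[pos + 4:pos + 20])
                if idx_off + idx_len <= pos:
                    try:
                        return json.loads(self.buf[idx_off:idx_off + idx_len])
                    except ValueError:
                        pass  # bytes merely looked like a header; keep looking
            end = pos  # skip this candidate and keep scanning backwards


s = AppendOnlyStore()
s.put("a", "one")
s.put("b", "two")
s.put("a", "uno")  # "update": the old body and old index stay in the file
print(s.get("a"))  # uno
older = AppendOnlyStore(bytes(s.buf[:-1]))  # chop off part of the last header
print(older.get("a"))  # one: falls back to the previous commit
```

Every prefix of `s.buf` opens without error; it simply reflects whichever commit was last fully written before the cut.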
So you get lots of duplicate metadata (inner b-tree nodes, etc.), not to mention old document data, building up in the .couch file. Once again, this is the price paid for the bulletproof technique of never, ever overwriting any data.
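Why do inserts alone produce duplicate metadata? In a tree that is never modified in place, adding one key means writing a fresh copy of every node on the root-to-leaf path, while the old copies stay behind in the file. The sketch below illustrates this with a copy-on-write binary search tree, a deliberate simplification (CouchDB uses b-trees with many keys per node); `Node`, `CowTree` and `live_nodes` are hypothetical names, and keys are assumed to be distinct.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


class CowTree:
    """Binary search tree whose nodes are never modified after being written."""

    def __init__(self) -> None:
        self.root: Optional[Node] = None
        self.nodes_written = 0  # every node ever "appended to the file"

    def insert(self, key: int) -> None:
        # The new root is a fresh copy; the previous root (an older snapshot) stays intact.
        self.root = self._insert(self.root, key)

    def _insert(self, node: Optional[Node], key: int) -> Node:
        if node is None:
            new = Node(key)
        elif key < node.key:
            new = Node(node.key, self._insert(node.left, key), node.right)
        else:  # keys are assumed distinct, so this branch means key > node.key
            new = Node(node.key, node.left, self._insert(node.right, key))
        self.nodes_written += 1  # one node copied (or created) per level on the path
        return new


def live_nodes(node: Optional[Node]) -> int:
    """Count the nodes reachable from the current root."""
    return 0 if node is None else 1 + live_nodes(node.left) + live_nodes(node.right)


random.seed(0)
tree = CowTree()
for k in random.sample(range(100_000), 1_000):  # inserts only, no updates or deletes
    tree.insert(k)
# First value is 1000; the second is several times larger.
print(live_nodes(tree.root), tree.nodes_written)
```

Only the nodes reachable from the latest root are live; everything else written along the way is the "junk" the question asks about.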
Compaction reads only the live data from the old .couch file and writes just that into a new .couch file. The b-trees are rebuilt balanced, and the old document revisions are no longer there. It's nice and clean.
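The effect of compaction on an insert-only workload, which is what the question describes, can be sketched with a toy log of records. This is an assumption-laden illustration, not CouchDB's compactor: `append_write`, `compact`, `file_size` and the fixed 16-byte `HEADER` are hypothetical, and, as a simplification, each write appends a full copy of the index, which exaggerates the metadata overhead compared with rewriting only a b-tree path. Compaction copies each live document once, then writes a single index and header.

```python
import json

HEADER = b"\x00" * 16  # hypothetical fixed-size commit header


def append_write(log, index, key, value):
    """Append-only update: new document, a full copy of the index, then a header."""
    log.append(("doc", key, value.encode()))
    index[key] = len(log) - 1  # record position of the newest version
    log.append(("index", None, json.dumps(index).encode()))
    log.append(("header", None, HEADER))


def file_size(log):
    """Total bytes the log would occupy on disk."""
    return sum(len(payload) for _, _, payload in log)


def compact(log):
    """Copy only the live documents plus one index and one header into a new log."""
    # The newest index record is the one the last header committed.
    last = next(i for i in range(len(log) - 1, -1, -1) if log[i][0] == "index")
    old_index = json.loads(log[last][2])
    new_log, new_index = [], {}
    for key, pos in old_index.items():
        new_log.append(("doc", key, log[pos][2]))
        new_index[key] = len(new_log) - 1
    new_log.append(("index", None, json.dumps(new_index).encode()))
    new_log.append(("header", None, HEADER))
    return new_log, new_index


log, index = [], {}
for i in range(200):  # inserts only, no updates or deletes
    append_write(log, index, f"doc{i}", "x" * 100)
compacted, _ = compact(log)
print(file_size(log), file_size(compacted))  # compacted log is far smaller
```

No document is lost; what disappears is the stack of superseded index copies and headers. How much a real database shrinks will depend on document sizes and how writes are batched.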