Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

couchdb keeps growing (filesize)

I'm very confused about the CouchDB behaviour in terms of database file-size on disk. It seems like it doesn't matter what I do, the database-file only gets bigger and bigger (even on deleting/purging documents or whole databases).

I watched my /var/lib/couchdb/_dbs.couch file and it never decreased in size ever. Simple example:

curl -X PUT http://admin:secretpassword@localhost:5984/testdb

_dbs.couch increased filesize by 5kb.

curl -X DELETE http://admin:secretpassword@localhost:5984/testdb

No changes in filesize. Even if I do filtered replications of Databases (filtering out deleted documents) or manually trigger a compaction, the disk file-size does not decrease. What's really confusing now is, that Fauxton actually shows reduced databases sizes after those actions, but it never reflects in the physical diskspace used.

I'm using pretty much a standard configuration after a fresh installation.

Is this "working like intended" or is there anything wrong here?

More importantly: Is there anything I can do about it?

like image 335
jAndy Avatar asked Jan 31 '18 22:01

jAndy


1 Answers

It's working as intended, you're just not looking at the right files.

Each database has corresponding files with the same name.

For example with:

curl -X PUT http://admin:secretpassword@localhost:5984/testdb

curl -X PUT http://admin:secretpassword@localhost:5984/emaildb

  • Since you have a _dbs.couch file, you're probably using CouchDB 2.X.X with the sharding feature. It will create multiple files in subfolders of the "shards" folder.

data/ +-- shards/ | +-- 00000000-7fffffff/ | | -- emaildb.124456678.couch | | -- testdb.647948447.couch | +-- 80000000-ffffffff/ | | -- emaildb.124456678.couch |___|____-- testdb.647948447.couch

More infos: http://docs.couchdb.org/en/latest/cluster/sharding.html

  • In a nutshell, the sharding and cluster features allow you to have a distributed database with distributed map/reduce computation. In the above example, each dbs has 2 shards, which means each database spans over two files. Every new doc created can end up in one of those two. The disk usage won't be evenly distributed though. For example, if every doc is a small json doc, but one of them gets a 1GB attachment (http://docs.couchdb.org/en/latest/intro/api.html#attachments), only one shard will get a 1GB bump. The sharding is doc based. You can have 2 shards, you can have 20, and they don't all have to be on the same server (http://docs.couchdb.org/en/latest/cluster/theory.html). If you know that one server won't have enough disk space to hold all your data, you can set up 20 couchdb servers that will each hold 1 shard (around 1/20 of all the docs). Whether it's a single node in a basement, or a cluster of couchdb servers all over the world, for the client app (curl, pouchdb, firefox, etc), it's the same api.

  • The _dbs database (_dbs.couch) records informations for each dbs for cluster and shards management. Its size increases because each time you create and delete a database, it gets updated (Copy-On-Write). From CouchDB 2.1.0 and beyond, it will auto-compact. You can check the auto-compaction settings in your server's config.(in a browser: http://localhost:5984/_utils/#/_config/, compactions sections). Admin panel is on a different port: http://localhost:5986/_utils

  • The size reported in Fauxton is the "active size". Doesn't count deleted docs still on disk that will be deleted after compaction. curl http://localhost:5984/testdb will give additional informations, like the size on disk (http://docs.couchdb.org/en/latest/api/database/common.html#get--db).

like image 184
M-I Avatar answered Oct 05 '22 01:10

M-I