Taking a simple CouchDB to a theory that CouchDB compaction is totally indifferent to deleted docs.
Deleting a doc from couch via a DELETE method yields the following when trying to retrieve it:
localhost:5984/enq/deleted-doc-id {"error":"not_found","reason":"deleted"}
Expected.
Now I compact the database: localhost:5984/enq/_compact {'ok': true }
And check compaction has finished "compact_running":false
Now I would expect CouchDB to return not_found, reason "missing" on a simple GET localhost:5984/enq/deleted-doc-id {"error":"not_found","reason":"deleted"}
And trying with ?rev=deleted_rev gives me a ful doc, yeah for worthless data.
So am I correct in thinking the couchdb compaction shows no special treatment for deleted docs and simple looks at the rev count again rev limit when deciding what is part of compaction. Is there a special rev_limit we can set for deleted docs?
Surely the only solution can't be a _purge? at the moment we must have thousands of orphaned deleted docs, and whilst we want to maintain some version history for normal docs we dont want to reduce our rev_limit to 1 to assist in this scenario
What are the replication issues we should be aware of with purge?
Deleted documents are preserved forever (because it's essential to providing eventual consistency between replicas). So, the behaviour you described is intentional.
To delete a document as efficiently as possible use the DELETE verb, since this stores only _id, _rev and the deleted flag. You can, of course, achieve the same more manually via POST or PUT.
Finally, _purge exists only for extreme cases where, for example, you've put an important password into a couchdb document and need it be gone from disk. It is not a recommended method for pruning a database, it will typically invalidate any views you have (forcing a full rebuild) and messes with replication too.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With