Are old data accessible in CouchDB?

Tags: append, couchdb

I've read a bit about CouchDB and I'm really intrigued by the fact that it's "append-only". I may be misunderstanding that, but as I understand it, it works a bit like this:

  • at time t0, data is added to the DB stating that the name of the user with ID 1 is "Cedrik Martin"

  • a query asking "what is the name of the user with ID 1?" returns "Cedrik Martin"

  • at time t1, an update is made to the DB stating: "User with ID 1's name is Cedric Martin" (changing the 'k' to a 'c').

  • a query asking again "what is the name of the user with ID 1" now returns "Cedric Martin"

It's a silly example, but I'd like to understand something fundamental about CouchDB.

Given that the update was made by appending to the end of the DB, is it possible to query the DB "as it was at time t0" without doing anything special?

Can I ask CouchDB, "What was the name of the user with ID 1 at time t0?"

EDIT: The first answer is very interesting, so I have a more precise question: as long as I don't compact a CouchDB database, can I write queries that are "referentially transparent" (i.e. that always produce the same result)? For example, if I query for "document d at revision r", am I guaranteed to always get the same answer back as long as I don't compact the DB?
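For reference, "document d at revision r" maps onto CouchDB's HTTP API as a GET on the document URL with a `rev` query parameter, and adding `revs_info=true` asks CouchDB to list the document's known revisions with their status. The sketch below only builds those URLs and performs a plain GET; the server address, the database name `users`, and the helper names are hypothetical, and authentication is omitted. Whether the answer stays the same over time depends on whether that revision's body is still stored, which the answers below address.

```python
import json
import urllib.parse
import urllib.request


def revision_url(base, db, doc_id, rev=None, revs_info=False):
    """Build a CouchDB document URL, optionally pinned to one revision.

    Hypothetical helper: `rev` selects a specific revision, `revs_info`
    asks for the list of known revisions and their status.
    """
    quote = urllib.parse.quote
    path = f"{base.rstrip('/')}/{quote(db, safe='')}/{quote(doc_id, safe='')}"
    params = {}
    if rev is not None:
        params["rev"] = rev
    if revs_info:
        params["revs_info"] = "true"
    return path + ("?" + urllib.parse.urlencode(params) if params else "")


def fetch(url):
    """GET a URL and decode the JSON body (needs a running server)."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Assumed local server and database; prints the URL only.
    print(revision_url("http://localhost:5984", "users", "1", rev="1-abc"))
```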

Cedric Martin asked Mar 16 '12 00:03

2 Answers

Perhaps the most common mistake made with CouchDB is to believe it provides a versioning system for your data. It does not.

Compaction removes all non-latest revisions of all documents, and replication only replicates the latest revision of each document. If you need historical versions, you must preserve them in your latest revision using whatever scheme suits you.
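One possible scheme, sketched under assumptions: the latest revision carries a hypothetical `history` array of `{name, since}` entries (oldest first), so "what was the name at t0?" is answered from the current document and survives compaction and replication. The field names and helpers are illustrative, not a CouchDB feature. When the result is saved back with a PUT, the `_rev` read with the document must still be included; `{**doc, ...}` keeps it.

```python
from datetime import datetime, timezone


def update_name(doc, new_name, at):
    """Return a new document body that carries its own history.

    Hypothetical scheme: "name" holds the current value; "history" holds
    every value ever set, with the time it became current, oldest first.
    """
    history = list(doc.get("history", []))
    history.append({"name": new_name, "since": at.isoformat()})
    return {**doc, "name": new_name, "history": history}


def name_at(doc, when):
    """Answer "what was the name at time `when`?" from the latest revision.

    Returns None if no name had been set yet at that time.
    """
    result = None
    for entry in doc.get("history", []):
        if datetime.fromisoformat(entry["since"]) <= when:
            result = entry["name"]
        else:
            break  # entries are in chronological order
    return result


if __name__ == "__main__":
    t0 = datetime(2012, 3, 16, tzinfo=timezone.utc)
    t1 = datetime(2012, 3, 17, tzinfo=timezone.utc)
    doc = update_name({"_id": "1"}, "Cedrik Martin", t0)
    doc = update_name(doc, "Cedric Martin", t1)
    print(doc["name"])       # Cedric Martin
    print(name_at(doc, t0))  # Cedrik Martin
```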

"_rev" is, as noted, an unfortunate name, but no other word has been suggested that is any clearer. "_mvcc" and "_mvcc_token" have been suggested before. The issue with both is that any description of what's going on there will inevitably include the caveat that "old versions remain on disk until compaction", which still implies that it's a user versioning system.

To answer the question "Can I ask CouchDB 'What was the name of the user with ID 1 at time t0?'", the short answer is "NO". The long answer is "YES, but then later it won't work", which is just another way of saying "NO". :)
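The "YES, but then later it won't work" answer can be illustrated with a toy in-memory model. It is not CouchDB's file format or revision algorithm: revision IDs here are a generation number plus an MD5 of the body, loosely imitating CouchDB's `N-<hash>` shape, and conflicts, replication, and revision checks on write are deliberately ignored. Every write appends a record, an old revision stays readable while its record exists, and compaction keeps only the newest record per document. `ToyAppendOnlyStore` is a hypothetical name.

```python
import hashlib
import json


class ToyAppendOnlyStore:
    """Hypothetical, simplified append-only document store."""

    def __init__(self):
        self.log = []     # append-only list of (doc_id, rev, body)
        self.latest = {}  # doc_id -> index of its newest record in self.log

    def put(self, doc_id, body):
        generation = 1
        if doc_id in self.latest:
            _, prev_rev, _ = self.log[self.latest[doc_id]]
            generation = int(prev_rev.split("-")[0]) + 1
        digest = hashlib.md5(json.dumps(body, sort_keys=True).encode()).hexdigest()
        rev = f"{generation}-{digest}"
        self.log.append((doc_id, rev, dict(body)))  # never overwrite
        self.latest[doc_id] = len(self.log) - 1
        return rev

    def get(self, doc_id, rev=None):
        if rev is None:
            return self.log[self.latest[doc_id]][2]
        for d, r, body in self.log:
            if d == doc_id and r == rev:
                return body
        raise KeyError(f"{doc_id}@{rev} is not (or no longer) stored")

    def compact(self):
        # Keep only the newest record of each document.
        kept = [self.log[i] for i in sorted(self.latest.values())]
        self.log = kept
        self.latest = {d: i for i, (d, _, _) in enumerate(kept)}


if __name__ == "__main__":
    store = ToyAppendOnlyStore()
    r0 = store.put("1", {"name": "Cedrik Martin"})  # time t0
    store.put("1", {"name": "Cedric Martin"})       # time t1
    print(store.get("1")["name"])          # Cedric Martin
    print(store.get("1", rev=r0)["name"])  # Cedrik Martin
    store.compact()
    try:
        store.get("1", rev=r0)
    except KeyError:
        print("old revision gone after compaction")
```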

Robert Newson answered Oct 22 '22 02:10


As already said, it is technically possible, but you shouldn't count on it. It isn't only about compaction; it's also about replication, one of CouchDB's biggest strengths. But yes, if you never compact and never replicate, you will always be able to fetch all previous versions of all documents. I don't think it works with queries, though; they can't work with older versions.
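To check which older revisions a given node can still return, one can read the `_revs_info` array CouchDB includes when a document is fetched with `?revs_info=true`. Each entry has a `rev` and a `status` of `available`, `missing`, or `deleted`; after compaction, or on a replica that only ever received the latest body, older entries would be expected to show as `missing`. The helper name is hypothetical and the sample response is made up for illustration.

```python
def retrievable_revisions(doc_with_revs_info):
    """List revisions whose bodies this node reports as still available.

    Expects a document fetched with ?revs_info=true (hypothetical helper).
    """
    return [
        entry["rev"]
        for entry in doc_with_revs_info.get("_revs_info", [])
        if entry.get("status") == "available"
    ]


if __name__ == "__main__":
    # Made-up response shape; real revision IDs are longer hashes.
    sample = {
        "_id": "1",
        "_rev": "2-bbb",
        "name": "Cedric Martin",
        "_revs_info": [
            {"rev": "2-bbb", "status": "available"},
            {"rev": "1-aaa", "status": "missing"},
        ],
    }
    print(retrievable_revisions(sample))  # ['2-bbb']
```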

Basically, calling it "rev" was the biggest mistake in CouchDB's design; it should have been called "mvcc_token" or something like that. It really only implements MVCC and isn't meant to be used for versioning.
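To illustrate what "it really only implements MVCC" means, here is a hypothetical single-document model: a write must name the revision it was based on, and a write based on a stale revision is rejected, much as CouchDB answers such a write with HTTP 409 Conflict. The old body is simply replaced, because the token exists to prevent lost updates, not to keep history. `ToyMVCCDoc` and `Conflict` are illustrative names.

```python
class Conflict(Exception):
    """Stands in for CouchDB's HTTP 409 Conflict response."""


class ToyMVCCDoc:
    """Hypothetical model: the revision only guards against lost updates."""

    def __init__(self, body):
        self.generation = 1
        self.body = dict(body)

    @property
    def rev(self):
        # Real CouchDB revs look like "N-<hash>"; the hash is omitted here.
        return str(self.generation)

    def update(self, based_on_rev, new_body):
        if based_on_rev != self.rev:
            raise Conflict(f"stale rev {based_on_rev}, current is {self.rev}")
        self.generation += 1
        self.body = dict(new_body)  # the previous body is not kept
        return self.rev


if __name__ == "__main__":
    doc = ToyMVCCDoc({"name": "Cedrik Martin"})
    seen_by_a = seen_by_b = doc.rev  # both clients read rev "1"
    doc.update(seen_by_a, {"name": "Cedric Martin"})
    try:
        doc.update(seen_by_b, {"name": "C. Martin"})
    except Conflict as err:
        print(err)  # stale rev 1, current is 2
```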

Ladicek answered Oct 22 '22 01:10