
Why should I use document based database instead of relational database?

People also ask

When should you use a document database vs relational?

If you're working with lots of unorganized data, a document database might suit you better. If your data is more structured and your application needs to access specific information and how it relates to other data points, a relational database is a better fit.

Why would you use a document database?

Document databases make it easier for developers to store and query data in a database by using the same document-model format they use in their application code. The flexible, semistructured, and hierarchical nature of documents and document databases allows them to evolve with applications' needs.

How does a document database differ from a relational database?

Relational databases generally store data in separate tables that are defined by the programmer, and a single object may be spread across several tables. Document databases store all information for a given object in a single instance in the database, and every stored object can be different from every other.


Probably you shouldn't :-)

The second most obvious answer is that you should use it if your data isn't relational. This usually shows up as having no easy way to describe your data as a set of columns. A good example is a database where you actually store paper documents, e.g. by scanning office mail. The data is the scanned PDF, and you have some metadata that always exists (scanned at, scanned by, type of document) and lots of possible metadata fields that exist only sometimes (customer number, supplier number, order number, keep on file until, OCRed full text, etc.). Usually you do not know in advance which metadata fields you will add within the next two years. Things like CouchDB work much nicer for that kind of data than relational databases.
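
To make the scanned-mail example concrete, here is a minimal sketch in Python of what such documents could look like. The field names, values, and the helper functions are my own assumptions for illustration; they are not part of CouchDB or any library.

```python
import json

# Hypothetical field names for the metadata that "always exists".
REQUIRED = {"scanned_at", "scanned_by", "doc_type"}

def make_document(scanned_at, scanned_by, doc_type, **optional):
    """Build one document: three fixed fields plus whatever else is known."""
    doc = {"scanned_at": scanned_at, "scanned_by": scanned_by, "doc_type": doc_type}
    # Only store optional metadata that actually has a value.
    doc.update({k: v for k, v in optional.items() if v is not None})
    return doc

def optional_fields(doc):
    """List the metadata fields beyond the fixed ones, sorted by name."""
    return sorted(set(doc) - REQUIRED)

# Two documents with different shapes can live side by side.
invoice = make_document("2009-03-01T10:15:00", "alice", "invoice",
                        customer_number="C-1042", order_number="O-77")
letter = make_document("2009-03-02T09:00:00", "bob", "letter",
                       keep_on_file_until="2019-03-02")

# Each document serializes to JSON as-is, with no shared column list.
invoice_json = json.dumps(invoice)
```

A new metadata field two years from now is just another keyword argument; nothing like an ALTER TABLE is involved.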

I also personally love the fact that I don't need any client libraries for CouchDB except an HTTP client, which is nowadays included in nearly every programming language.
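
As a rough illustration of the "only an HTTP client" point, the sketch below builds requests with Python's standard library alone. It assumes a CouchDB server on its default port 5984 at localhost; the database name, document ID, and function names are made up for the example. The requests are only constructed here, not sent.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:5984"  # assumption: local CouchDB on its default port

def document_url(db, doc_id):
    """URL of a single document: /{db}/{docid}, with both parts percent-encoded."""
    return "{}/{}/{}".format(BASE_URL,
                             urllib.parse.quote(db, safe=""),
                             urllib.parse.quote(doc_id, safe=""))

def build_put_document(db, doc_id, doc):
    """Prepare a PUT that creates (or updates) a document from a dict."""
    return urllib.request.Request(
        url=document_url(db, doc_id),
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )

def build_get_document(db, doc_id):
    """Prepare a GET that fetches a document as JSON."""
    return urllib.request.Request(url=document_url(db, doc_id), method="GET")

put_req = build_put_document("mail", "doc-1", {"doc_type": "invoice"})
get_req = build_get_document("mail", "doc-1")

# Sending needs a running server (and, on recent versions, credentials):
# with urllib.request.urlopen(put_req) as resp:
#     print(json.load(resp))
```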

Probably the least obvious answer: if you feel no pain using an RDBMS, stay with it. If you always have to work around your RDBMS to get your job done, a document-oriented database might be worth a look.

For a more elaborate list, see this post by Richard Jones.


CouchDB (from their website)

  • A document database server, accessible via a RESTful JSON API. Relational databases generally aren't accessed via simple REST services; they require a SQL API such as JDBC or ODBC, and these APIs are often quite complex. REST is quite simple.

  • Ad-hoc and schema-free with a flat address space. Relational databases have a complex, fixed schema. You define tables, columns, indexes, sequences, views and other stuff. Couch doesn't require this level of complex, expensive, fragile advance planning.

  • Distributed, featuring robust, incremental replication with bi-directional conflict detection and management. Some SQL commercial products offer this. Because of the SQL API and the fixed schemas, this is complex, difficult and expensive. For Couch, it appears simple and inexpensive.

  • Query-able and index-able, featuring a table oriented reporting engine that uses JavaScript as a query language. SQL and relational databases are query-able and index-able too. Nothing new here.

So. Why CouchDB?

  • REST is simpler than JDBC or ODBC.
  • No Schema is simpler than Schema.
  • Distributed in a way that appears simple and inexpensive.

For stupidly storing and serving other-servers-data.

In the last couple of weeks I've been playing with a lifestream app that polls my feeds (delicious, flickr, github, twitter...) and stores them in CouchDB. The beauty of CouchDB is that it lets me keep the original data in its original structure with no overhead. I added a 'class' field to each document, storing the source server, and wrote a JavaScript render class for each source.
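
The 'class' field idea can be sketched roughly like this. The original used JavaScript render classes; this version is in Python, and the feed fields (`text`, `title`) and function names are invented for the example rather than taken from any real feed format.

```python
def render_twitter(doc):
    # 'text' is a hypothetical field name for the tweet body.
    return "tweet: " + doc["text"]

def render_flickr(doc):
    # 'title' is a hypothetical field name for the photo title.
    return "photo: " + doc["title"]

# One renderer per source, keyed by the value of the added 'class' field.
RENDERERS = {
    "twitter": render_twitter,
    "flickr": render_flickr,
}

def render(doc):
    """Pick a renderer from the document's 'class' field; fall back if unknown."""
    source = doc.get("class", "unknown source")
    renderer = RENDERERS.get(source)
    if renderer is None:
        return "unrendered item from " + source
    return renderer(doc)

# The stored documents keep whatever structure the source server sent,
# plus the one extra 'class' field.
items = [
    {"class": "twitter", "text": "hello", "retweets": 3},
    {"class": "flickr", "title": "sunset", "tags": ["sky"]},
    {"class": "github", "repo": "example"},
]
rendered = [render(item) for item in items]
```

Adding a new feed means storing its documents untouched and registering one more renderer.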

Generalizing, whenever your server communicates with another server, schema-less storage is a good fit because you have no control over the other server's schema. As a bonus, CouchDB uses the native protocols of servers and clients: JSON for representation and HTTP REST for transport.


Rapid application development comes to mind.

When I am constantly evolving my schema, I am constantly frustrated by having to maintain the schema in MySQL/SQLite. While I've not done too much with CouchDB yet, I do like how simple it is to evolve the schema during the RAD process.
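
One way this can look in practice, as a sketch only: old and new document shapes stay in the database together, and the application normalizes them when reading. The field names (`name`, `first_name`, `last_name`, `tags`) and the function are hypothetical.

```python
def normalize_user(doc):
    """Return a copy of a user document in the newest shape.

    Assumed history for this example: early documents stored a single
    'name' field; later ones use 'first_name'/'last_name' and add 'tags'.
    """
    out = dict(doc)  # never mutate the stored document
    if "name" in out and "first_name" not in out:
        first, _, last = out.pop("name").partition(" ")
        out["first_name"] = first
        out["last_name"] = last
    out.setdefault("tags", [])  # field added later; default for old documents
    return out

old_doc = {"name": "Ada Lovelace"}
new_doc = {"first_name": "Grace", "last_name": "Hopper", "tags": ["navy"]}

users = [normalize_user(d) for d in (old_doc, new_doc)]
```

The trade-off is that the migration logic lives in application code instead of in a schema change.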

A case where you might not want to use a non-relational database is when you have a lot of many-to-many relationships; I've yet to get my head around how to create good MapReduce functions around these kinds of relationships, particularly if you need to have metadata in the joining relationship. I'm not sure, but I don't think CouchDB Map functions can call their own queries on the database, since that could potentially cause infinite loops.
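
For what it's worth, one approach I would try is to store each relationship as its own document, so that the joining metadata has a home and a map function can emit a key per relationship. The sketch below only simulates that idea in plain Python; real CouchDB map functions are written in JavaScript, and the document types, field names, and `build_view` helper here are assumptions made for illustration.

```python
from collections import defaultdict

# Users and groups are many-to-many; each 'membership' document is one link
# and carries the relationship's own metadata ('role').
docs = [
    {"_id": "u1", "type": "user", "name": "Ann"},
    {"_id": "u2", "type": "user", "name": "Ben"},
    {"_id": "g1", "type": "group", "name": "Admins"},
    {"_id": "m1", "type": "membership", "user": "u1", "group": "g1", "role": "owner"},
    {"_id": "m2", "type": "membership", "user": "u2", "group": "g1", "role": "member"},
]

def map_members_by_group(doc):
    """Map step: look at one document in isolation and yield (key, value) pairs."""
    if doc.get("type") == "membership":
        yield doc["group"], {"user": doc["user"], "role": doc["role"]}

def build_view(documents, map_fn):
    """Run the map function over every document and group the values by key."""
    index = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            index[key].append(value)
    return index

members_by_group = build_view(docs, map_members_by_group)
admins = members_by_group["g1"]
```

Note that the map function never queries the database; it only sees the one document it is handed, which matches the restriction suspected above.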