I've read several posts, such as this one, that compare document stores like MongoDB, CouchDB and Couchbase with column family stores like Cassandra.
One claimed difference is that document stores work at a coarser level of granularity, whereas column family stores let you work on individual parts of a document. I find that to be simply untrue, because Redis supports this via the HSET operation and so does MongoDB.
Is the argument, then, that although both types of solution allow updating and reading parts of a document, column family stores are simply more efficient at it than document stores?
Does that also mean that I should take the document store route for insert- and read-heavy applications but the column family route for update- and read-heavy applications?
What are some other differences that would help me choose one solution over the other?
Thanks!
Column-family databases store data in column families as rows that have many columns associated with a row key (Figure 10.1). Column families are groups of related data that are often accessed together. For a Customer, we would often access their Profile information at the same time, but not their Orders.
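To make that layout concrete, here is a minimal in-memory sketch of the model described above: row key, then column family, then column, then value. The names `store`, `put` and `get_family` are hypothetical and belong to no real database API; the customer data is made up for illustration.

```python
# Hypothetical in-memory model of a column-family layout:
# row key -> column family -> column name -> value.
store = {}

def put(row_key, family, column, value):
    """Write a single column without touching the rest of the row."""
    store.setdefault(row_key, {}).setdefault(family, {})[column] = value

def get_family(row_key, family):
    """Read one column family of a row; other families are not visited."""
    return dict(store.get(row_key, {}).get(family, {}))

# Profile and Orders live under the same row key but in separate families.
put("customer:42", "profile", "name", "Ada")
put("customer:42", "profile", "city", "London")
put("customer:42", "orders", "order:1001", "2 x widget")

profile = get_family("customer:42", "profile")
print(profile)
```

Reading the profile family returns only the profile columns, which is the "accessed together" grouping in miniature.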
A column-family data model is not the same as a column-oriented model. A column-family database stores a row with all its column families together, whereas a column-oriented database simply stores data tables by column rather than by row.
The main benefit of a columnar database is faster performance than a row-oriented one for queries that touch only a few columns, because it reads just those columns rather than whole rows. And because each column holds values of a single type, the data usually compresses well, so more of it fits in a given amount of memory.
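A rough way to see why a column layout helps for this kind of query is to count how many stored values each layout has to touch to sum a single column. This is only a toy model, assuming a row store reads whole records and a column store reads just the requested column; the table and the names `rows` and `columns` are invented for the example.

```python
# The same small table in two layouts (illustrative data only).
rows = [
    {"id": 1, "name": "Ada", "city": "London", "total": 30},
    {"id": 2, "name": "Bob", "city": "Paris", "total": 45},
    {"id": 3, "name": "Cy", "city": "London", "total": 25},
]

# Column-oriented layout: one list per column instead of one record per row.
columns = {key: [row[key] for row in rows] for key in rows[0]}

# Row layout: summing "total" means walking every record. In this toy model
# we assume each record is loaded whole, so every field counts as touched.
row_sum = sum(row["total"] for row in rows)
values_loaded_row = sum(len(row) for row in rows)

# Column layout: only the "total" column is read.
col_sum = sum(columns["total"])
values_loaded_col = len(columns["total"])

print(row_sum, col_sum, values_loaded_row, values_loaded_col)
```

Both layouts give the same sum, but the column layout touches one value per row instead of every field of every row.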
A document database (also known as a document-oriented database or a document store) is a database that stores information in documents. Document databases offer a variety of advantages, including an intuitive data model that is fast and easy for developers to work with.
I would suggest that the main difference is in the query model. They can both store similar data structures (you can put a JSON document into a CF store, for example), but document stores typically give you query-by-value capability whereas CF stores typically do not. However, the lines are blurring, and it seems that such generalizations are becoming less applicable as each database project matures. Cassandra (a popular CF store), for example, does offer some query-by-value functionality with secondary indexes. However, most CF stores require you to write the data the way you intend to read it, meaning you must think about your data model in terms of your queries.
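The difference in query model can be sketched with plain Python structures. This is not how any particular database implements it: `find`, `insert_user` and `users_by_city` are hypothetical names, and the data is made up. The first half filters documents by field value at read time; the second half shapes the data at write time around one known query.

```python
documents = [
    {"_id": 1, "name": "Ada", "city": "London"},
    {"_id": 2, "name": "Bob", "city": "Paris"},
    {"_id": 3, "name": "Cy", "city": "London"},
]

def find(docs, **criteria):
    """Document-store style: match on any field, chosen at read time."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]

# CF-store style: a table designed for one query ("users in a given city").
# The city is the row key, so the lookup is a single key access, but a
# different query (say, by name) would need its own table.
users_by_city = {}

def insert_user(doc):
    """Write the data the way it will be read: keyed by city."""
    users_by_city.setdefault(doc["city"], {})[doc["_id"]] = doc["name"]

for doc in documents:
    insert_user(doc)

london_docs = find(documents, city="London")
london_cf = users_by_city["London"]
print(london_docs, london_cf)
```

The trade-off shows up when requirements change: `find` can answer a new question without new writes, while the keyed table answers only the question it was built for, just more directly.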
It would seem to me that there are other equally important distinctions between various database technologies, such as consistency model, datacenter replication capability, scaling model, ease of management, caching capabilities, etc.