Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Relational vs Columnar and Document Databases - aren't they one in the same?

I understand that document-oriented NoSQL DBs are "extensions" of the KV model in that they allow you to query more than just a single lookup key. But once something is a "document", I feel like it already has a relational model baked into it:

"myJson": {
    "fizz": 4,
    "buzz": "true",
    "widget" : {
        ...etc.
    }
}

To me, I don't see the difference between this JSON, and a json_objects table with a fizz and buzz field, and a foreign key relationship to a second widgets table.

And "columnar" DB's like Cassandra just sound like straight-up relational/table DBs.

So I ask: what is so different about document- and column-oriented DBs, and so distinguishing (from RDBMSes) about them? What problems are they best suited to solve that render them superior to relational DBs under certain circumstances? Thanks in advance!

like image 876
IAmYourFaja Avatar asked Mar 08 '13 21:03

IAmYourFaja


People also ask

What is the difference between relational and columnar database?

Columnar Database vs Relational Database While a relational database is optimized for storing rows of data, typically for transactional applications, a columnar database is optimized for fast retrieval of columns of data, typically in analytical applications.

What is the difference between document database and relational database?

In simple terms, a relational (SQL) database stores structured data that conforms to a predefined schema. A document-model (or NoSQL) database stores documents that may contain unstructured data without a schema. At least, that's the way that these two database storage models have traditionally been discussed.

Is document DB relational?

A document database is a type of non-relational database that is designed to store and query data as JSON-like documents.

How are wide-column databases similar to relational databases?

Relational databases have a set schema and they function as tables of rows and columns. Wide-column databases have a similar, but different schema. They also have rows and columns. However, they are not fixed within a table, but have a dynamic schema.


2 Answers

Firstly I'd like to say that you are very correct in saying that NoSql is different from Relational Databases and so its hard to make a comparison. With that being said there are many big distinctions between the two that can be compared.

Scaling
Although you can shard a MySql database there are issues with sharding and enforcing ACID properties when a RDMS is on multiple machines will be very challenging, NoSql solutions like Cassandra are famous for their ability to grow without problems with some cases managing 400 nodes in a cluster without a problem. Not only is it easy to grow a Cassandra database, but performance does not take a hit.

Schema(less) model.
NoSQL database systems are developed to manage large volumes of data that don't follow a fixed schema. This means that for example you wish to add a new column to an existing column family in Cassandra you don't need to go back and amend the column family so no need for this:

ALTER TABLE table_name ALTER COLUMN column_name datatype;

We can instead just add new columns as we go, and might end up with the following 'table':

 key         | follower1  | follower2   | follower2          
-------------+------------+-------------+-----------
 lyubent     | joeb       | chuckn      | gordonf     
 chuckn      | joeb       | gordonf                   
 gordonf     | chuckn                                 
 joeb        | chuckn     | lyubent     | joeb        

This allows data models to be flexible and easily extended but in doing so data becomes less structured.

Speed
NoSql databases are optimized for high write speeds while the RDBMs' aim for high read speeds. But even with that in mind NoSql solutions still tend to outperform RDBMs systems when it comes to reads. This is because the NoSql databases don't implement many of the functions that slow down read/write/update operations in the Relational Model like for example ACID properties and transactions.

When should it be used?

  • Your application/website will need to grow rapidly but you want to start off small.
  • You're more concerned with writing data than reading it back. (Lots of tweets are posted but not all of them are read)
  • Availability of your system is more important that data being 100% updated. (So if you are a bank, you don't want NoSql but if you are a website that needs 100% uptime it could be a good choice)
  • If the data being written needs to succeed 100% of the time, but eventual consistency isn't a problem.

Just for a visual illustration, this helped me out a lot in understanding where the different sql solutions fit into the database world and how each fits a purpose.

Database Triad - Availability, Consistency and Partition Tolerance

like image 156
Lyuben Todorov Avatar answered Nov 03 '22 19:11

Lyuben Todorov


In no schema db you don't have fixed columns and types.

For example product 'Jeans' can have attributes 'price', 'length' and 'model' (M/W) but for product book you have attributes 'price', 'authors' and 'title'. For mobile phones you will have 'screen type', 'operating system' etc.

It is very difficult to model that in RDBMS because you are not flexible and user cannot insert arbitrary attributes so it is easier to use a document database which are optimized for this kind of data so that you can easily search and filter by value on arbitrary attributes (eg. all products with length>30 and model=w).

like image 35
user1944408 Avatar answered Nov 03 '22 17:11

user1944408