I understand that document-oriented NoSQL DBs are "extensions" of the KV model in that they allow you to query more than just a single lookup key. But once something is a "document", I feel like it already has a relational model baked into it:
"myJson": {
"fizz": 4,
"buzz": "true",
"widget" : {
...etc.
}
}
To me, I don't see the difference between this JSON, and a json_objects
table with a fizz
and buzz
field, and a foreign key relationship to a second widgets
table.
And "columnar" DB's like Cassandra just sound like straight-up relational/table DBs.
So I ask: what is so different about document- and column-oriented DBs, and so distinguishing (from RDBMSes) about them? What problems are they best suited to solve that render them superior to relational DBs under certain circumstances? Thanks in advance!
Columnar Database vs Relational Database While a relational database is optimized for storing rows of data, typically for transactional applications, a columnar database is optimized for fast retrieval of columns of data, typically in analytical applications.
In simple terms, a relational (SQL) database stores structured data that conforms to a predefined schema. A document-model (or NoSQL) database stores documents that may contain unstructured data without a schema. At least, that's the way that these two database storage models have traditionally been discussed.
A document database is a type of non-relational database that is designed to store and query data as JSON-like documents.
Relational databases have a set schema and they function as tables of rows and columns. Wide-column databases have a similar, but different schema. They also have rows and columns. However, they are not fixed within a table, but have a dynamic schema.
Firstly I'd like to say that you are very correct in saying that NoSql is different from Relational Databases and so its hard to make a comparison. With that being said there are many big distinctions between the two that can be compared.
Scaling
Although you can shard a MySql database there are issues with sharding and enforcing ACID properties when a RDMS is on multiple machines will be very challenging, NoSql solutions like Cassandra are famous for their ability to grow without problems with some cases managing 400 nodes in a cluster without a problem. Not only is it easy to grow a Cassandra database, but performance does not take a hit.
Schema(less) model.
NoSQL database systems are developed to manage large volumes of data that don't follow a fixed schema. This means that for example you wish to add a new column to an existing column family in Cassandra you don't need to go back and amend the column family so no need for this:
ALTER TABLE table_name ALTER COLUMN column_name datatype;
We can instead just add new columns as we go, and might end up with the following 'table':
key | follower1 | follower2 | follower2
-------------+------------+-------------+-----------
lyubent | joeb | chuckn | gordonf
chuckn | joeb | gordonf
gordonf | chuckn
joeb | chuckn | lyubent | joeb
This allows data models to be flexible and easily extended but in doing so data becomes less structured.
Speed
NoSql databases are optimized for high write speeds while the RDBMs' aim for high read speeds. But even with that in mind NoSql solutions still tend to outperform RDBMs systems when it comes to reads. This is because the NoSql databases don't implement many of the functions that slow down read/write/update operations in the Relational Model like for example ACID properties and transactions.
When should it be used?
Just for a visual illustration, this helped me out a lot in understanding where the different sql solutions fit into the database world and how each fits a purpose.
In no schema db you don't have fixed columns and types.
For example product 'Jeans' can have attributes 'price', 'length' and 'model' (M/W) but for product book you have attributes 'price', 'authors' and 'title'. For mobile phones you will have 'screen type', 'operating system' etc.
It is very difficult to model that in RDBMS because you are not flexible and user cannot insert arbitrary attributes so it is easier to use a document database which are optimized for this kind of data so that you can easily search and filter by value on arbitrary attributes (eg. all products with length>30 and model=w).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With