Comparing Cassandra structure with Relational Databases


A few days ago I read about the wide-column store type of NoSQL database, specifically Apache Cassandra.

What I understand is that Cassandra consists of:

A keyspace (like a database in relational databases) that supports many column families or tables (the same as tables in relational databases) and an unlimited number of rows.

From the Stack Overflow tag description:

A wide column store is a type of key-value database. It uses tables, rows, and columns, but unlike a relational database, the names and format of the columns can vary from row to row in the same table.

In Cassandra, every row (in a table) must have a row key, and each row key can have multiple columns. I have read about the differences in how relational databases and NoSQL databases (Cassandra) implement and store data.

But I don't understand the difference in structure:

Imagine a scenario in which I have a table (or column family in Cassandra):

When I execute a CQL query like this:

select * from users;

It gives me the following result:

lastname  | age  | city          | email               
----------+------+---------------+----------------------
      Doe |   36 | Beverly Hills | janedoe@email.com
    Jones |   35 |        Austin | bob@example.com
    Byrne |   24 |     San Diego | robbyrne@email.com
    Smith |   46 |    Sacramento | null
   Jones2 | null |        Austin | bob@example.com

So I performed the same scenario in a relational database (MS SQL) with the following query:

select * from [users]

And the result is:

lastname  | age  | city          | email               
----------+------+---------------+----------------------
      Doe |   36 | Beverly Hills | janedoe@email.com
    Jones |   35 |        Austin | bob@example.com
    Byrne |   24 |     San Diego | robbyrne@email.com
    Smith |   46 |    Sacramento | NULL
   Jones2 | NULL |        Austin | bob@example.com

I know that Cassandra supports dynamic columns, and I can add one using something like:

ALTER TABLE users ADD website varchar;

But this is also available in the relational model; for example, in MS SQL the same thing can be done with something like:

ALTER TABLE users ADD website varchar(MAX);

What I see is that the results of the first and second SELECT are the same. In Cassandra, the row key (lastname) is treated as a standalone object, but it is the same as a unique field (like an ID or a text key) in MS SQL (and all relational databases). I also see that the column type in Cassandra is static (varchar in my example), unlike what the Stack Overflow tag describes.

So my questions are:

1. Is there any misunderstanding in how I picture Cassandra?
2. So what is the difference between the two structures? As I showed, the results are the same.
3. Are there any special scenarios (JSON-like) that cannot be implemented in relational databases but that Cassandra supports? (For example, I know that nested columns are not supported in Cassandra.)

Thank you for reading.


asked Mar 24 '16 21:03

Mohammad Sina Karvandi


2 Answers

We have to look at a more complex example to see the differences :)

For a start:

The term column family was used in the older Thrift API.
In the newer CQL API, the term table is used.

A table is defined as a "two-dimensional view of a multi-dimensional column family".

The term "wide row" was related mainly to the Thrift API. In CQL it is defined a bit differently, but underneath the storage looks the same.

Comparing SQL and CQL: in SQL, a table is a set of rows. In a simple example it looks as if a CQL table is the same, but it is not. A CQL table is a set of partitions, where each partition can be just a single row (e.g. when you don't have a clustering key) or multiple rows. A partition containing multiple rows is called a "wide row" in Thrift terminology. To see how it is stored underneath, please read, e.g., the part about composite keys from here.
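To make "a table is a set of partitions" concrete, here is a minimal Python sketch. It is a toy model, not Cassandra's storage engine: it only groups rows by partition key and sorts each partition by its clustering columns. The helper build_table and the second table, with PRIMARY KEY ((city), lastname), are hypothetical; the rows are the question's data.

```python
from collections import defaultdict

# The question's rows.
ROWS = [
    {"lastname": "Doe", "age": 36, "city": "Beverly Hills", "email": "janedoe@email.com"},
    {"lastname": "Jones", "age": 35, "city": "Austin", "email": "bob@example.com"},
    {"lastname": "Byrne", "age": 24, "city": "San Diego", "email": "robbyrne@email.com"},
    {"lastname": "Smith", "age": 46, "city": "Sacramento", "email": None},
    {"lastname": "Jones2", "age": None, "city": "Austin", "email": "bob@example.com"},
]


def build_table(rows, partition_key, clustering_keys=()):
    """Toy model of a CQL table: {partition key value: rows sorted by clustering key}."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[partition_key]].append(row)
    for part_rows in partitions.values():
        # Rows inside one partition are kept ordered by the clustering columns.
        part_rows.sort(key=lambda r: tuple(r[c] for c in clustering_keys))
    return dict(partitions)


# PRIMARY KEY (lastname): one row per partition, so it looks just like SQL.
users = build_table(ROWS, "lastname")
# Hypothetical PRIMARY KEY ((city), lastname): Austin becomes a two-row "wide row".
users_by_city = build_table(ROWS, "city", ("lastname",))

print(len(users), max(len(p) for p in users.values()))
print(len(users_by_city), [r["lastname"] for r in users_by_city["Austin"]])
```

With only a partition key, every partition holds exactly one row, which is why the question's SELECT output matches SQL. The difference shows up once a clustering key puts several rows into one partition.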

There are more differences:

CQL can have static columns, which are stored at the partition level. It looks as if every row in the partition has a common value, but really it is a single value stored one level up. This can also be used to model 1:N relations.
In CQL you can have collection type columns: set, list, map.
A column can contain a user-defined type (you can define e.g. address as a type and reuse it in many places), and a collection can be a collection of user-defined types.
But CQL also does not support JOINs, which are available in SQL, and you have to structure your tables very carefully, since they have to be strictly query-oriented (in Cassandra you can't query data by an arbitrary column value, and secondary indexes also have many limitations). It is usually said that in the relational model you design tables based on the data, whereas in Cassandra you design them based on the queries.
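A rough Python sketch of the first two points, assuming a hypothetical table orders_by_user with a static column user_email and a set<text> column tags (the CREATE TABLE in the comment is illustrative only). The point is that the static value is stored once per partition, yet a SELECT shows it on every row. User-defined types are not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """Toy model of one partition that has a static column and clustered rows."""
    static: dict = field(default_factory=dict)  # stored once per partition
    rows: dict = field(default_factory=dict)    # clustering key -> regular columns

    def select(self):
        """Return rows the way CQL displays them: static values repeated on each row."""
        return [{**self.static, "order_id": key, **cols}
                for key, cols in sorted(self.rows.items())]


# Hypothetical table, roughly:
#   CREATE TABLE orders_by_user (user text, order_id int, user_email text STATIC,
#                                tags set<text>, PRIMARY KEY ((user), order_id));
doe = Partition()
doe.static["user_email"] = "janedoe@email.com"
doe.rows[2] = {"tags": {"gift"}}            # set collection column
doe.rows[1] = {"tags": {"books", "sale"}}

# Updating the static column changes one stored value, yet every row "sees" it.
doe.static["user_email"] = "doe@example.com"  # hypothetical new address
for row in doe.select():
    print(row["order_id"], row["user_email"], sorted(row["tags"]))
```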

I hope I was able to make it a bit clearer for you. I recommend watching some videos (or reading the slides) from the DataStax Core Concepts course as a solid introduction to Cassandra.


answered Nov 04 '22 23:11

mmatloka

In my experience, CQL misleads a lot of people. First of all, you would never want to run:

SELECT * FROM a_table_here;

on a production Cassandra cluster, since you would be putting a huge load on your coordinator node, which has to aggregate the data from all of the other nodes. Also, by default you will be given back a maximum of 10000 "rows".
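Why the unrestricted SELECT is expensive can be sketched with a toy Python model. It assumes a hypothetical 4-node cluster with replication factor 1 and uses "MD5 modulo node count" placement as a simplification; real Cassandra assigns token ranges with a partitioner such as Murmur3.

```python
import hashlib

# Hypothetical 4-node cluster, replication factor 1.
NODES = ["node1", "node2", "node3", "node4"]


def owner(partition_key: str) -> str:
    """Toy placement: hash the partition key and take it modulo the node count."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return NODES[int.from_bytes(digest[:8], "big") % len(NODES)]


def nodes_contacted(partition_key=None) -> set:
    """Nodes the coordinator must ask for data.
    No partition key (SELECT * FROM users): every node's data has to be read.
    With a partition key (WHERE lastname = 'Doe'): only the owning node."""
    if partition_key is None:
        return set(NODES)
    return {owner(partition_key)}


print(len(nodes_contacted()))       # full-table scan -> 4
print(len(nodes_contacted("Doe")))  # single-partition read -> 1
```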

To understand how Cassandra stores your data, we first need to establish a few terms:

There's the primary key (more precisely, its partition key), in your case lastname. It is hashed to determine which node in the cluster owns that token range, and the row is stored there (plus on any replica nodes).

Next there are the clustering columns. I don't know if you have any in your example, but you would define them like PRIMARY KEY ((lastname), age, city). In that example you are clustering by age first and then by city, and the rows within a partition are stored in that order.
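A slightly more faithful sketch of "hashed to determine which node owns this range" is a token ring. Assumptions, all hypothetical: MD5 truncated to a signed 64-bit value stands in for the Murmur3 partitioner, the four node tokens are made up, and replicas are placed by walking clockwise as SimpleStrategy does.

```python
import bisect
import hashlib


def token(partition_key: str) -> int:
    """Stand-in for the Murmur3 partitioner: a signed 64-bit token from MD5."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


# Hypothetical 4-node ring; each node owns (previous node's token, its own token].
RING = [
    (-6_000_000_000_000_000_000, "node1"),
    (-1_000_000_000_000_000_000, "node2"),
    (3_000_000_000_000_000_000, "node3"),
    (8_000_000_000_000_000_000, "node4"),
]
TOKENS = [t for t, _ in RING]


def replicas_for_token(tok: int, rf: int = 2) -> list:
    """First node whose token >= tok (wrapping past the end of the ring),
    then the next rf - 1 nodes clockwise, in the style of SimpleStrategy."""
    start = bisect.bisect_left(TOKENS, tok) % len(RING)
    return [RING[(start + i) % len(RING)][1] for i in range(rf)]


def replicas(partition_key: str, rf: int = 2) -> list:
    return replicas_for_token(token(partition_key), rf)


print(token("Doe"), replicas("Doe"))
```

Every node can compute the same token for 'Doe', so any coordinator knows where that partition lives without a lookup table.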
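Because rows in a partition are kept sorted by (age, city), a range restriction on the first clustering column becomes one contiguous slice. A small Python sketch of that idea follows. Assumptions: a hypothetical partition lastname = 'Doe' under PRIMARY KEY ((lastname), age, city), and every row except the question's Beverly Hills one is made up for illustration.

```python
import bisect

# Hypothetical rows of the single partition lastname = 'Doe'.
rows = [
    ((41, "Austin"), "doe41@example.com"),
    ((36, "Beverly Hills"), "janedoe@email.com"),
    ((36, "Austin"), "doe36@example.com"),
]
rows.sort()  # clustering order: age first, then city
keys = [k for k, _ in rows]


def slice_by_age(lo, hi):
    """Like WHERE lastname = 'Doe' AND age >= lo AND age < hi:
    find the two boundaries with binary search and read the rows in between."""
    start = bisect.bisect_left(keys, (lo,))
    end = bisect.bisect_left(keys, (hi,))
    return rows[start:end]


print(keys)
print(slice_by_age(30, 40))
```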

Now, for a simplistic high-level view of Cassandra in your use case: it stores the data as a map to an ordered multimap:

Doe -> 36:Beverly Hills -> janedoe@email.com

Here 'Doe' is the partition key (the whole primary key in your table), which tells you which node(s) hold that row of data. 36:Beverly Hills is the ordered clustering key (the key of the ordered multimap). Lastly, janedoe@email.com is the final value (there can be several, mind you) of the map to a multimap.

There are a lot of nuances that I left out to keep the example simple; for a more in-depth treatment I would highly suggest reading: http://www.planetcassandra.org/making-the-change-from-thrift-to-cql/
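The "map to an ordered multimap" view can be written out directly as nested Python dicts. This sketch assumes PRIMARY KEY ((lastname), age, city), as in the answer. It uses only the first three of the question's rows, because a clustering column such as age cannot be null. Partitions appear here in insertion order, whereas Cassandra orders them by token.

```python
from collections import defaultdict

# (lastname, age, city, email) from the question.
ROWS = [
    ("Doe", 36, "Beverly Hills", "janedoe@email.com"),
    ("Jones", 35, "Austin", "bob@example.com"),
    ("Byrne", 24, "San Diego", "robbyrne@email.com"),
]


def to_nested_map(rows):
    """partition key -> {(age, city) clustering key -> remaining columns}."""
    table = defaultdict(dict)
    for lastname, age, city, email in rows:
        table[lastname][(age, city)] = {"email": email}
    # Sort each partition's clustering keys, as an ordered multimap would.
    return {pk: dict(sorted(cells.items())) for pk, cells in table.items()}


def render(table):
    """Yield lines in the answer's 'Doe -> 36:Beverly Hills -> ...' notation."""
    for pk, cells in table.items():
        for (age, city), values in cells.items():
            yield f"{pk} -> {age}:{city} -> {values['email']}"


table = to_nested_map(ROWS)
for line in render(table):
    print(line)
```

A lookup by full primary key is then just table["Doe"][(36, "Beverly Hills")]["email"].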

answered Nov 04 '22 23:11

fromanator
