Confused on Cassandra terminology (row vs partition)

I am hoping someone can clear up my confusion about the difference between a row and a partition in Cassandra. I thought a row would be a set of columns (like in a SQL DB), as specified in the schema, distributed across nodes by partition key and ordered by the clustering key within each partition.

But then I ran into this tutorial: https://academy.datastax.com/demos/getting-started-time-series-data-modeling

Under "Time series Pattern 1", it states:

Since each column is dynamic, our row will grow as needed to accommodate the data.

Why would a row grow? I can see a partition growing, but why a row? The picture in that example also makes no sense to me -- I imagine the partition as a set of rows, each having (WeatherStationID | event) columns, where WeatherStationID would be the same repeated value for every row in a partition.

I also tried looking at this tutorial: http://www.slideshare.net/yukim/cql3-in-depth, slide 15.

My reading is that this shows a single partition with two rows. It seems to me that no matter how much new data you insert, the partition will grow but not the row (short of running "alter table" of course)?

Yana K. asked Nov 12 '15 01:11


1 Answer

You are correct. The article you posted is really talking about partitions, not rows.

The two terms are still used loosely and interchangeably in many situations. Back in the Thrift days the correct term was "row", but that changed with the introduction of CQL, where what Thrift called a row is now called a partition.
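To make the CQL terminology concrete, here is a minimal Python sketch of how a partition relates to its rows. It is only a toy model, not how Cassandra is implemented. The table layout, the `insert` helper, and the sample values are assumptions made up for illustration, loosely following the weather station example from the question.

```python
from collections import defaultdict

# Hypothetical table, modelled on the weather station example:
#   CREATE TABLE temperature (
#       weatherstation_id text, event_time text, temperature text,
#       PRIMARY KEY (weatherstation_id, event_time));
#
# Toy model: partition key -> {clustering key -> row with a fixed set of columns}
table = defaultdict(dict)

def insert(station_id, event_time, temperature):
    """Add (or overwrite) one CQL row inside the partition for station_id."""
    table[station_id][event_time] = {"temperature": temperature}

insert("1234ABCD", "2013-04-03 07:01:00", "72F")
insert("1234ABCD", "2013-04-03 07:02:00", "73F")
insert("5678EFGH", "2013-04-03 07:01:00", "68F")

partition = table["1234ABCD"]

# The partition grew to two rows...
rows_in_partition = len(partition)

# ...but every row still has the same number of non-key columns.
columns_per_row = {len(row) for row in partition.values()}
```

In this model each insert for the same `weatherstation_id` adds a row to the partition, while the shape of each individual row stays fixed by the schema, which matches your reading of the slide.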

On an unrelated note, in Thrift you were able to grow a row (not a partition), since every row could have its own schema, that is, its own set of columns. You can find more information on that here: http://www.datastax.com/dev/blog/does-cql-support-dynamic-columns-wide-rows
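As a rough illustration of why the older material says a "row" grows, the sketch below flattens the CQL rows of one partition into a single Thrift-style wide row, where each cell name combines the clustering value with the column name. This is a simplified model of the idea described in the linked blog post, not the actual storage format; the `to_wide_row` helper and the sample data are hypothetical.

```python
def to_wide_row(cql_rows):
    """Flatten {clustering_value: {column: value}} into one wide row.

    Each cell is keyed by (clustering_value, column_name), a simplified
    stand-in for a Thrift-era composite column name.
    """
    wide = {}
    for clustering_value in sorted(cql_rows):
        for column, value in cql_rows[clustering_value].items():
            wide[(clustering_value, column)] = value
    return wide

# One partition (a single weather station) holding two CQL rows.
cql_rows = {
    "2013-04-03 07:01:00": {"temperature": "72F"},
    "2013-04-03 07:02:00": {"temperature": "73F"},
}
wide_row = to_wide_row(cql_rows)

# Adding a third CQL row adds another cell to the same wide row.
more_rows = dict(cql_rows)
more_rows["2013-04-03 07:03:00"] = {"temperature": "73F"}
grown_wide_row = to_wide_row(more_rows)
```

Seen this way, the "row" that grows in the tutorial is the wide row (today's partition), and each new CQL row simply shows up as additional cells in it.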

JayK answered Oct 05 '22 07:10