I am hoping someone can clear my confusion on the difference between a row and partition in Cassandra. I thought a row would be a set of columns(like in a SQL DB), as specified in the schema, distributed across nodes by partition keys and ordered by the clustering key within each partition.
But then I ran into this tutorial: https://academy.datastax.com/demos/getting-started-time-series-data-modeling
Under "Time series Pattern 1", it states:
Since each column is dynamic, our row will grow as needed to accommodate the data.
Why would a row grow? I can see a partition growing but why a row? The picture in that example also makes no sense to me -- I imagine the partition as being a set of rows each having a (WeatherStation|event) columns, where WeatherStationID would be same repeated value for each row in a partition.
I also tried looking at this tutorial: http://www.slideshare.net/yukim/cql3-in-depth, slide 15.
My reading is that this shows a single partition with two rows. It seems to me that no matter how much new data you insert, the partition will grow but not the row (short of running "alter table" of course)?
Partition key, clustering key, together they make up the primary key and that is, if you will, a key part of table design in Cassandra. Cassandra uses the first column name as the partition key. CREATE TABLE users ( user_name varchar PRIMARY KEY, password vachar, gender varchar, state varchar,Age int); Given below is the example mentioned:
A single column Primary Key is also called a Partition Key. When Cassandra is deciding where in the cluster to store this particular piece of data, it will hash the partition key. The value of that hash dictates where the data will reside and which replicas will be responsible for it. Partition Key
It’s common in Cassandra when you see a primary key that has the clustering column component to kind of tack on that UUID on the end just to individuate the row and make sure it refers uniquely to one thing. Clustering keys are things we add to the primary key. That gives the order to that partition of rows.
When Cassandra is deciding where in the cluster to store this particular piece of data, it will hash the partition key. The value of that hash dictates where the data will reside and which replicas will be responsible for it. Partition Key The Partition Key is responsible for the distribution of data amongst the nodes.
You are correct. In the article that you posted he talks of partitions, not rows.
The two terms are still used very loosely to mean each other in many situations. Back in the thrift days, the correct term was row but that changed with the new CQL.
On a unrelated note in Thrift you are able grow a row(not partition) since every row had their own Schema. You can find more information on that here: http://www.datastax.com/dev/blog/does-cql-support-dynamic-columns-wide-rows
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With