I just watched this YouTube video of Patrick McFadin on Cassandra data modeling.
There was one table, as follows:
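create table user_activity_history (
username varchar,
interaction_date varchar,
interaction_time timeuuid, -- missing from the listing as posted; the timeuuid type is an assumption
activity_code varchar,
detail varchar,
PRIMARY KEY((username,interaction_date),interaction_time)
);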
Why is the primary key ((username,interaction_date),interaction_time)?
How is that different from (username,interaction_date,interaction_time)?
Cassandra allows you to use multiple columns as the partition key for a table; this is called a composite partition key. Unlike a simple partition key, a composite partition key uses several columns together to determine where the data will reside. It is typically used when the data would otherwise be too large to reside in a single partition.
The clustering key provides the sort order of the data stored within a partition. Together, the partition key and the clustering key form the primary key, which uniquely identifies each row. For more information on Cassandra, visit the DataStax and Apache Cassandra documentation.
The Partition Key is responsible for data distribution across your nodes. The Clustering Key is responsible for data sorting within the partition. In a table with a single-column (simple) primary key, the Primary Key and the Partition Key are the same.
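To make the two roles concrete, here is a toy in-memory model in Python. It is only a sketch: `ToyTable` and its methods are made-up names, a plain dict stands in for partitions, and nothing here reflects how Cassandra actually stores data on disk or across nodes.

```python
from collections import defaultdict


class ToyTable:
    """Toy model: the partition key picks the bucket, the clustering key orders rows in it."""

    def __init__(self, partition_cols, clustering_cols):
        self.partition_cols = partition_cols
        self.clustering_cols = clustering_cols
        # partition key tuple -> {clustering key tuple -> row}
        self.partitions = defaultdict(dict)

    def insert(self, row):
        pk = tuple(row[c] for c in self.partition_cols)
        ck = tuple(row[c] for c in self.clustering_cols)
        # A row with the same full primary key replaces the old one (an upsert).
        self.partitions[pk][ck] = row

    def read_partition(self, *pk):
        # Rows come back ordered by clustering key, never by insertion order.
        rows = self.partitions.get(tuple(pk), {})
        return [rows[ck] for ck in sorted(rows)]


table = ToyTable(["username", "interaction_date"], ["interaction_time"])
table.insert({"username": "alice", "interaction_date": "2013-05-01",
              "interaction_time": 2, "detail": "second"})
table.insert({"username": "alice", "interaction_date": "2013-05-01",
              "interaction_time": 1, "detail": "first"})
table.insert({"username": "alice", "interaction_date": "2013-05-02",
              "interaction_time": 3, "detail": "next day"})

same_day = table.read_partition("alice", "2013-05-01")
```

Here `same_day` holds the two rows for that day sorted by interaction_time, and the row for the following day sits in a separate partition. Integers are used for interaction_time only to keep the example short.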
The difference is related to the table's partition key. Typically the first element in a PRIMARY KEY is also the partition key; this defines the physical location of the data in the cluster. For example, by using the following:
PRIMARY KEY(username,interaction_date,interaction_time)
data inserted into the table will be partitioned (located physically) according to username, whereas by using the following:
PRIMARY KEY((username,interaction_date),interaction_time)
it will be partitioned according to the username,interaction_date combination. The advantage of the latter scheme is that data relating to a single username can be stored across nodes in the cluster.
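A rough way to see the difference is to group some sample rows by each candidate partition key and count the rows per partition. This is a Python sketch with invented sample data, and `rows_per_partition` is a hypothetical helper, not anything from Cassandra or its drivers.

```python
from collections import Counter

# Invented sample rows; integers stand in for interaction_time to keep things short.
ROWS = [
    {"username": "alice", "interaction_date": "2013-05-01", "interaction_time": 1},
    {"username": "alice", "interaction_date": "2013-05-01", "interaction_time": 2},
    {"username": "alice", "interaction_date": "2013-05-02", "interaction_time": 3},
    {"username": "bob", "interaction_date": "2013-05-01", "interaction_time": 4},
]


def rows_per_partition(rows, partition_cols):
    """Count how many rows share each partition key value."""
    return Counter(tuple(row[c] for c in partition_cols) for row in rows)


# PRIMARY KEY(username, interaction_date, interaction_time)
by_user = rows_per_partition(ROWS, ["username"])

# PRIMARY KEY((username, interaction_date), interaction_time)
by_user_and_day = rows_per_partition(ROWS, ["username", "interaction_date"])
```

With the first layout all three of alice's rows share one partition, which keeps growing for as long as she is active. With the second layout her rows are split into one partition per day, so each partition stays bounded. The trade-off is that a read then has to name both username and interaction_date to locate a partition.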
There are more details on partition keys in DataStax's CQL documentation on CREATE TABLE:
When you use a compound PRIMARY KEY Cassandra treats the first column declared in a definition as the partition key and stores all columns of the row on the same physical node. When you use a composite partition key, Cassandra treats the columns in nested parentheses as partition keys and stores columns of a row on more than one node. You declare a composite partition key using an extra set of parentheses to define which columns partition the data.
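The placement described in that passage can be sketched by hashing the partition key and mapping the result to a node. This is a deliberate simplification in Python: MD5 and the modulo step are stand-ins chosen for brevity (Cassandra's partitioner and token ranges work differently), and `node_for` is a hypothetical helper.

```python
import hashlib


def node_for(partition_key, node_count):
    """Map a partition key tuple to a node index in [0, node_count).

    Simplified stand-in for a partitioner: hash the key, then take it modulo
    the number of nodes. Real clusters assign token ranges to nodes instead.
    """
    encoded = "\x00".join(str(part) for part in partition_key).encode("utf-8")
    token = int.from_bytes(hashlib.md5(encoded).digest()[:8], "big")
    return token % node_count


NODES = 4

# Partition key = username: every row for alice hashes to the same node.
alice_node = node_for(("alice",), NODES)

# Partition key = (username, interaction_date): each day is hashed separately,
# so alice's days may land on different nodes.
alice_day_nodes = {
    day: node_for(("alice", day), NODES)
    for day in ["2013-05-01", "2013-05-02", "2013-05-03"]
}
```

The key point is that only the partition key columns go into the hash. Clustering columns such as interaction_time never affect which node a row lands on.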