
Understanding Cassandra Composite keys

Tags: cassandra, cql

I just watched this YouTube video of Patrick McFadin on Cassandra data modelling.

There was one table, as follows:

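CREATE TABLE user_activity_history (
  username varchar,
  interaction_date varchar,
  -- interaction_time is referenced by the key but was missing from the listing; its type is assumed here
  interaction_time timeuuid,
  activity_code varchar,
  detail varchar,
  PRIMARY KEY ((username, interaction_date), interaction_time)
);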

Why is the primary key ((username,interaction_date),interaction_time)? How is that different from (username,interaction_date,interaction_time)?

S Kr asked Oct 24 '13 04:10



1 Answer

The difference lies in the table's partition key. By default, the first element of a PRIMARY KEY is the partition key, which determines where the data is physically located in the cluster. For example, by using the following:

PRIMARY KEY(username,interaction_date,interaction_time)

data inserted into the table will be partitioned (located physically) according to username, whereas by using the following:

PRIMARY KEY((username,interaction_date),interaction_time)

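it will be partitioned according to the (username, interaction_date) combination. The advantage of the latter scheme is that data relating to a single username is split into one partition per interaction_date, so it can be spread across nodes in the cluster instead of accumulating in one ever-growing partition. The trade-off is that a query must supply both username and interaction_date to locate a partition.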
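To make the difference concrete, here is a minimal Python sketch that groups a few rows the way each key definition would. It is only a toy model: the sample rows and the group_by_partition helper are made up for illustration, and it does not reproduce Cassandra's real partitioner, which hashes the partition key into a token to pick the nodes.

```python
from collections import defaultdict

# Hypothetical sample rows: (username, interaction_date, interaction_time)
ROWS = [
    ("alice", "2013-10-23", "09:15"),
    ("alice", "2013-10-23", "17:40"),
    ("alice", "2013-10-24", "08:05"),
    ("bob", "2013-10-24", "12:30"),
]


def group_by_partition(rows, partition_cols):
    """Group rows by their first `partition_cols` columns (the partition key)."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[:partition_cols]].append(row)
    # Within a partition, rows are ordered by the remaining (clustering) columns.
    return {key: sorted(members) for key, members in partitions.items()}


# PRIMARY KEY(username, interaction_date, interaction_time)
compound = group_by_partition(ROWS, 1)
# PRIMARY KEY((username, interaction_date), interaction_time)
composite = group_by_partition(ROWS, 2)

for name, result in (("compound", compound), ("composite", composite)):
    print(name)
    for key, members in result.items():
        print("  partition", key, "->", len(members), "row(s)")
```

With the first key, all three of alice's rows share one partition. With the second, they are split into one partition per date, and each partition can be placed on a different node.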

There are more details on partition keys in DataStax's CQL documentation on CREATE TABLE:

When you use a compound PRIMARY KEY Cassandra treats the first column declared in a definition as the partition key and stores all columns of the row on the same physical node. When you use a composite partition key, Cassandra treats the columns in nested parentheses as partition keys and stores columns of a row on more than one node. You declare a composite partition key using an extra set of parentheses to define which columns partition the data.

lorcan answered Oct 04 '22 02:10