I have been reading articles around the net to understand the differences between the following key
types. But it just seems hard for me to grasp. Examples will definitely help make understanding better.
primary key, partition key, composite key clustering key
A partition key can have a partition key defined with multiple table columns which determines which node stores the data. For a table with a composite partition key, Cassandra uses multiple columns as the partition key. These columns form logical sets inside a partition to facilitate retrieval.
The partition key is responsible for distributing data among nodes. A partition key is the same as the primary key when the primary key consists of a single column. Partition keys belong to a node. Cassandra is organized into a cluster of nodes, with each node having an equal part of the partition key hashes.
The clustering key provides the sort order of the data stored within a partition. All of these keys also uniquely identify the data. We also touched upon the Cassandra architecture and data modeling topics. For more information on Cassandra, visit the DataStax and Apache Cassandra documentation.
A cluster key is a column that is specified as the key for storing rows in ascending or descending order of the specified column values. If a cluster key is specified for one or more columns in a table, the table rows can be stored in ascending or descending order of the values in the cluster key column(s).
There is a lot of confusion around this, I will try to make it as simple as possible.
The primary key is a general concept to indicate one or more columns used to retrieve data from a Table.
The primary key may be SIMPLE and even declared inline:
create table stackoverflow_simple ( key text PRIMARY KEY, data text );
That means that it is made by a single column.
But the primary key can also be COMPOSITE (aka COMPOUND), generated from more columns.
create table stackoverflow_composite ( key_part_one text, key_part_two int, data text, PRIMARY KEY(key_part_one, key_part_two) );
In a situation of COMPOSITE primary key, the "first part" of the key is called PARTITION KEY (in this example key_part_one is the partition key) and the second part of the key is the CLUSTERING KEY (in this example key_part_two)
Please note that both partition and clustering key can be made by more columns, here's how:
create table stackoverflow_multiple ( k_part_one text, k_part_two int, k_clust_one text, k_clust_two int, k_clust_three uuid, data text, PRIMARY KEY((k_part_one, k_part_two), k_clust_one, k_clust_two, k_clust_three) );
Behind these names ...
Further usage information: DATASTAX DOCUMENTATION
insert into stackoverflow_simple (key, data) VALUES ('han', 'solo'); select * from stackoverflow_simple where key='han';
table content
key | data ----+------ han | solo
COMPOSITE/COMPOUND KEY can retrieve "wide rows" (i.e. you can query by just the partition key, even if you have clustering keys defined)
insert into stackoverflow_composite (key_part_one, key_part_two, data) VALUES ('ronaldo', 9, 'football player'); insert into stackoverflow_composite (key_part_one, key_part_two, data) VALUES ('ronaldo', 10, 'ex-football player'); select * from stackoverflow_composite where key_part_one = 'ronaldo';
table content
key_part_one | key_part_two | data --------------+--------------+-------------------- ronaldo | 9 | football player ronaldo | 10 | ex-football player
But you can query with all keys (both partition and clustering) ...
select * from stackoverflow_composite where key_part_one = 'ronaldo' and key_part_two = 10;
query output
key_part_one | key_part_two | data --------------+--------------+-------------------- ronaldo | 10 | ex-football player
Important note: the partition key is the minimum-specifier needed to perform a query using a where clause
. If you have a composite partition key, like the following
eg: PRIMARY KEY((col1, col2), col10, col4))
You can perform query only by passing at least both col1 and col2, these are the 2 columns that define the partition key. The "general" rule to make query is you must pass at least all partition key columns, then you can add optionally each clustering key in the order they're set.
so, the valid queries are (excluding secondary indexes)
Invalid:
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With