
Cassandra - querying on clustering keys

I am just getting started with Cassandra, and I have been creating tables with different partition and clustering keys to see how they can be queried differently.

I created a table with a primary key of the form ((a), b, c), where a is the partition key and b and c are the clustering keys.

When querying I noticed that the following query:

select * from tablename where b=val;

results in:

Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING

Adding ALLOW FILTERING gets me what I want (even though I've heard it's bad for performance).

But when I run the following query:

select * from tablename where c=val;

It says:

PRIMARY KEY column "c" cannot be restricted (preceding column "b" is either not restricted or by a non-EQ relation)

And there is no "ALLOW FILTERING" option at all.

My question is: why are all clustering keys not treated the same? Column b, which is adjacent to the partition key a, is given the ALLOW FILTERING option, which allows querying on it, while querying on column c alone does not seem possible at all (given the way this table is laid out).

When the partition key is missing, ALLOW FILTERING gets Cassandra to scan through all SSTables and pull the data out of them, so why can't we do the same for column c?

user3376961 asked May 27 '15 15:05



1 Answer

It's not that clustering keys are treated differently; it's that you can't skip them. Cassandra uses the clustering keys, in the order they are declared, to determine the on-disk sort order within a partition.
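To make the sort-order argument concrete, here is a toy model in Python. It is only a sketch under simplifying assumptions: a single partition is represented as an in-memory list of (b, c) tuples kept in sorted order, and the helper names (`rows_with_b`, `rows_with_c`, `positions_with_c`) are invented for illustration. It is not how Cassandra is implemented internally.

```python
from bisect import bisect_left, bisect_right

# Toy model of ONE partition (a fixed value of a): rows are kept sorted by
# (b, c), mimicking how clustering columns define order within a partition.
partition = sorted([
    (1, 10), (1, 20), (2, 10), (2, 30), (3, 20),
])

def rows_with_b(rows, b):
    """Restrict the first clustering column: matches form one contiguous run,
    so two binary searches find the start and end of the slice."""
    lo = bisect_left(rows, (b,))
    hi = bisect_right(rows, (b, float("inf")))
    return rows[lo:hi]

def rows_with_c(rows, c):
    """Restrict only the second clustering column: the sort order does not
    help, so every row has to be inspected."""
    return [row for row in rows if row[1] == c]

def positions_with_c(rows, c):
    """Indexes of the matching rows, to show that they are not adjacent."""
    return [i for i, row in enumerate(rows) if row[1] == c]
```

In this model, `rows_with_b(partition, 2)` returns a contiguous slice, while the rows matching `c == 20` sit at positions 1 and 4, with unrelated rows in between. That scattering is why a restriction on c that skips b cannot take advantage of the clustering order.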

To extend your example, assume PRIMARY KEY ((a),b,c,d). You could run your query (with ALLOW FILTERING) by specifying just b, or b and c. But Cassandra wouldn't allow you to specify c and d (skipping b) or b and d (skipping c).
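The "no skipping" rule described here can be written down as a small check. The following Python sketch is a deliberate simplification: it only considers equality restrictions on clustering columns, and the function name `forms_prefix` is hypothetical, not part of any Cassandra driver or API.

```python
def forms_prefix(clustering_columns, restricted):
    """Return True if the restricted columns are exactly the first k
    clustering columns, i.e. no clustering column is skipped."""
    k = len(restricted)
    return set(restricted) == set(clustering_columns[:k])

# Clustering columns of PRIMARY KEY ((a),b,c,d), in declaration order.
clustering = ["b", "c", "d"]

allowed = [cols for cols in (["b"], ["b", "c"], ["c", "d"], ["b", "d"])
           if forms_prefix(clustering, cols)]
```

With the four combinations from the example, only `["b"]` and `["b", "c"]` pass the check; `["c", "d"]` and `["b", "d"]` are rejected because each leaves a gap in the declared order.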

As a side note, if you really want to be able to query by only b or only c, you should support those queries with additional tables designed for them. ALLOW FILTERING is a band-aid, and is not something you should ever do in a production Cassandra deployment.
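The idea of one table per query can be sketched in Python as well. This is an illustrative model only: the two dictionaries stand in for hypothetical tables partitioned by b and by c, and the names `rows_by_b`, `rows_by_c`, and `insert` are assumptions made for this example. In a real deployment these would be separate Cassandra tables that the application writes to together.

```python
from collections import defaultdict

# Each dict stands in for a separate table whose partition key is the
# column we want to query by (hypothetical names).
rows_by_b = defaultdict(list)
rows_by_c = defaultdict(list)

def insert(a, b, c):
    """Write the same row to both query tables (denormalization)."""
    row = {"a": a, "b": b, "c": c}
    rows_by_b[b].append(row)
    rows_by_c[c].append(row)

insert(1, "x", 10)
insert(2, "y", 10)
insert(3, "x", 20)

# Each lookup now goes straight to a single "partition", with no filtering.
matches_for_c = [row["a"] for row in rows_by_c[10]]
matches_for_b = [row["a"] for row in rows_by_b["x"]]
```

The trade-off is extra storage and one more write per row, in exchange for reads that never need to scan or filter.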

Aaron answered Sep 30 '22 04:09