Cassandra - querying on clustering keys

Tags: primary-key, cassandra, cql, clustering-key

I am just getting started with Cassandra, and I was trying to create tables with different partition and clustering keys to see how they can be queried differently.

I created a table with a primary key of the form ((a), b, c), where a is the partition key and b and c are clustering keys.

When querying, I noticed that the following query:

select * from tablename where b=val;

results in:

Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING

And using ALLOW FILTERING gets me what I want (even though I've heard it's bad for performance).

But when I run the following query:

select * from tablename where c=val;

It says:

PRIMARY KEY column "c" cannot be restricted (preceding column "b" is either not restricted or by a non-EQ relation)

And there is no "ALLOW FILTERING" option at all.

My question is: why aren't all clustering keys treated the same? Column b, which is adjacent to the partition key a, gets the ALLOW FILTERING option, which lets me query on it, while querying on column c does not seem possible at all (given the way this table is laid out).

If ALLOW FILTERING makes Cassandra scan through all SSTables and pull the data out when the partition key is missing, why can't it do the same for column c?


asked May 27 '15 15:05

user3376961

1 Answer

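It's not that clustering keys aren't treated the same; it's that you can't skip them. This is because Cassandra uses the clustering keys to determine the on-disk sort order within a partition. Rows are sorted by b first and then by c, so the rows for one value of c are scattered across every value of b rather than sitting together.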
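The sort-order argument can be illustrated with a toy model in Python. This is a sketch under stated assumptions, not how Cassandra's storage engine works. A single partition is modelled as a list of (b, c, value) tuples kept sorted by (b, c). The names rows, select_where_b, select_where_b_and_c and select_where_c are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Toy model (an assumption, not Cassandra internals): one partition (a = 1)
# stored as rows of (b, c, value), kept sorted by the clustering columns (b, c).
rows = sorted([(2, 30, "w"), (1, 20, "y"), (3, 10, "v"), (1, 10, "x"), (2, 10, "z")])
b_keys = [b for b, _c, _v in rows]        # sorted, because b leads the sort order
bc_keys = [(b, c) for b, c, _v in rows]   # sorted, because rows are sorted by (b, c)


def select_where_b(b):
    """b = ?: the matches form one contiguous run, located by binary search."""
    return rows[bisect_left(b_keys, b):bisect_right(b_keys, b)]


def select_where_b_and_c(b, c):
    """b = ? AND c = ?: still one contiguous run, found by binary search on (b, c)."""
    return rows[bisect_left(bc_keys, (b, c)):bisect_right(bc_keys, (b, c))]


def select_where_c(c):
    """c = ? alone: matches are scattered across every b, so every row is checked."""
    return [row for row in rows if row[1] == c]


print(select_where_b(2))   # [(2, 10, 'z'), (2, 30, 'w')]
print(select_where_c(10))  # [(1, 10, 'x'), (2, 10, 'z'), (3, 10, 'v')]
```

The rows matching b = 2 are adjacent in the sorted list. The rows matching c = 10 sit at positions 0, 2 and 4, so no single slice covers them, and the only option in this model is to scan and filter every row.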

To extend your example, assume PRIMARY KEY ((a), b, c, d). You could run your query (with ALLOW FILTERING) by restricting just b, or b and c. But Cassandra wouldn't let you restrict c and d (skipping b), or b and d (skipping c).
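The rule in the ((a), b, c, d) example, which also matches the error message in the question, can be written as a small check: a clustering column may only be restricted if every clustering column before it is restricted with =. The function below is a hypothetical sketch of that rule as described here, not Cassandra's own validation code. It ignores the separate question of whether ALLOW FILTERING is also required, and the exact rules may differ between Cassandra versions.

```python
def clustering_restriction_ok(clustering, restricted):
    """Check the prefix rule described above (a sketch, not Cassandra's validator).

    clustering: clustering columns in key order, e.g. ["b", "c", "d"].
    restricted: maps each restricted column to its operator, e.g. {"b": "=", "c": ">"}.
    Returns True if every restricted column's predecessors are all restricted with "=".
    """
    for i, col in enumerate(clustering):
        if col not in restricted:
            continue
        # Every earlier clustering column must be pinned to a single value.
        if any(restricted.get(prev) != "=" for prev in clustering[:i]):
            return False
    return True


cols = ["b", "c", "d"]  # clustering columns of PRIMARY KEY ((a), b, c, d)
print(clustering_restriction_ok(cols, {"b": "="}))            # True
print(clustering_restriction_ok(cols, {"b": "=", "c": "="}))  # True
print(clustering_restriction_ok(cols, {"c": "=", "d": "="}))  # False: skips b
print(clustering_restriction_ok(cols, {"b": "=", "d": "="}))  # False: skips c
```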

As a side note, if you really need to query by only b or only c, you should support those queries with additional tables designed for them. ALLOW FILTERING is a band-aid, and not something you should ever use in a production Cassandra deployment.
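The "additional tables" advice can also be sketched as a toy model in Python. Two dictionaries stand in for the original table and a hypothetical second table partitioned by c (think PRIMARY KEY ((c), a, b)). Every write goes to both, and keeping them in sync is the application's job. The names by_a, by_c, insert and select_where_c are illustrative only, not a Cassandra driver API.

```python
from collections import defaultdict

# Toy stand-ins for two tables (assumptions, not a Cassandra API):
by_a = defaultdict(dict)  # a -> {(b, c): value}, like PRIMARY KEY ((a), b, c)
by_c = defaultdict(dict)  # c -> {(a, b): value}, like a hypothetical PRIMARY KEY ((c), a, b)


def insert(a, b, c, value):
    """Write the same row to both tables; a repeated key overwrites, like an upsert."""
    by_a[a][(b, c)] = value
    by_c[c][(a, b)] = value


def select_where_c(c):
    """Read a single partition of the c-keyed table, in clustering order (a, b)."""
    return sorted(by_c.get(c, {}).items())


insert(1, 1, 10, "x")
insert(1, 2, 20, "y")
insert(2, 1, 10, "z")
print(select_where_c(10))  # [((1, 1), 'x'), ((2, 1), 'z')]
```

The trade-off is extra storage and a second write per row. In return, a lookup by c reads one partition instead of filtering every row.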

answered Sep 30 '22 04:09

Aaron
