
Cassandra asks for ALLOW FILTERING even though column is clustering key

Tags:

cassandra

Very new to Cassandra so apologies if the question is simple.

I created a table:

create table ApiLog (
    LogId uuid,
    DateCreated timestamp,
    ClientIpAddress varchar,
    primary key (LogId, DateCreated)
);

This works fine:

select * from apilog

If I try to add a where clause with the DateCreated like this:

select * from apilog where datecreated <= '2016-07-14'

I get this:

Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING

From other questions here on SO and from the DataStax tutorials, my understanding is that because the datecreated column is a clustering key, it can be used to filter data.

I also tried creating an index, but I get the same message back. I then tried removing DateCreated from the primary key and keeping it only as an index, and I still get the same message:

create index ApiLog_DateCreated on dotnetdemo.apilog (datecreated);
Jason asked Jul 13 '16 11:07


1 Answer

The partition key LogId determines on which node each partition will be stored. So if you don't specify the partition key, then Cassandra has to filter all the partitions of this table on all the nodes to find matching data. That's why you have to say ALLOW FILTERING, since that operation is very inefficient and is discouraged.

If you restrict the query to a specific LogId, then Cassandra can find the partition on a single node and efficiently do a range query by the clustering key.
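
As a rough illustration, a query that names the partition key and then ranges over the clustering key might look like the following. The uuid value is made up for the example; substitute a LogId that actually exists in your table.

```cql
-- Hypothetical LogId value, used purely for illustration.
-- The equality on logid pins the query to one partition;
-- the range on datecreated then reads a contiguous slice of it.
select * from apilog
where logid = 123e4567-e89b-12d3-a456-426614174000
  and datecreated <= '2016-07-14';
```

Because the partition key is restricted with an equality, this form should not trigger the ALLOW FILTERING error.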

So you need to plan your schema such that you can do your range queries within a single partition and not have to do a full table scan like you're trying to do.
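
One possible way to do that, sketched under assumptions, is to partition the log by day so that a date range falls inside a known partition. The table name apilog_by_day and the logday column are hypothetical, and the sketch assumes a Cassandra version that supports the date type (2.2 or later). Whether a day is the right bucket size depends on how many rows you write per day.

```cql
-- Sketch only: apilog_by_day and logday are hypothetical names.
create table apilog_by_day (
    logday date,               -- partition key: one partition per calendar day
    datecreated timestamp,     -- clustering key: rows sorted by time within the day
    logid uuid,                -- clustering key: keeps rows with equal timestamps distinct
    clientipaddress varchar,
    primary key ((logday), datecreated, logid)
);

-- Range query confined to a single partition.
select * from apilog_by_day
where logday = '2016-07-13'
  and datecreated <= '2016-07-13 12:00:00';
```

With this layout the application has to supply logday on every write, and a range spanning several days means one query per day partition rather than one query over the whole table.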

Jim Meyer answered Sep 30 '22 05:09