
Cassandra: Select on indexed columns and with IN clause for the PRIMARY KEY are not supported

In Cassandra, I'm running the following CQL query:

select msg from log where id in ('A', 'B') and filter1 = 'filter' 

(Here id is the partition key and filter1 has a secondary index; filter1 cannot be used as a clustering column.)

This gives the response:

Select on indexed columns and with IN clause for the PRIMARY KEY are not supported

How can I change the CQL query to avoid this error?

Dev Zhou asked Sep 14 '15 06:09



2 Answers

You would need to split that into separate queries:

select msg from log where id = 'A' and filter1 = 'filter';

and

select msg from log where id = 'B' and filter1 = 'filter';

Due to the way data is partitioned in Cassandra, CQL has a lot of seemingly arbitrary restrictions, both to discourage inefficient queries and because supporting those queries is complex to implement.

Over time I think these restrictions will slowly be removed, but for now we have to work around them. For more details on the restrictions, see "A deep look at the CQL where clause".
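On the client side, the split into one query per partition key could be sketched roughly as follows. This is only an illustration: `build_queries` and `fetch_messages` are hypothetical helper names, `execute` is an assumed callable standing in for whatever your driver's session exposes, and the `%s` placeholder style is an assumption you may need to adapt to your driver.

```python
from typing import Callable, Iterable, List, Tuple

# Table and column names are taken from the question; the placeholder
# style (%s) is an assumption and may differ for your driver.
PER_ID_QUERY = "SELECT msg FROM log WHERE id = %s AND filter1 = %s"

Params = Tuple[str, str]


def build_queries(ids: Iterable[str], filter_value: str) -> List[Tuple[str, Params]]:
    """Return one (statement, parameters) pair per partition key."""
    return [(PER_ID_QUERY, (partition_id, filter_value)) for partition_id in ids]


def fetch_messages(
    execute: Callable[[str, Params], Iterable],
    ids: Iterable[str],
    filter_value: str,
) -> list:
    """Run each per-partition query in turn and concatenate the rows.

    `execute` is a hypothetical callable (statement, params) -> rows,
    not a specific driver API.
    """
    rows = []
    for statement, params in build_queries(ids, filter_value):
        rows.extend(execute(statement, params))
    return rows
```

Each generated statement restricts id with an equality, so the IN clause on the partition key never reaches Cassandra; the merge happens in the application instead.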

Jim Meyer answered Sep 26 '22 02:09


Another option is to build a table specifically for this query (a query table), with filter1 as the partition key and id as a clustering key. That way, your query works and you avoid having a secondary index altogether.

aploetz@cqlsh:stackoverflow> CREATE TABLE log 
    (filter1 text, 
          id text, 
         msg text, 
     PRIMARY KEY (filter1, id));
aploetz@cqlsh:stackoverflow> INSERT INTO log (filter1, id, msg) 
                             VALUES ('filter','A','message A');
aploetz@cqlsh:stackoverflow> INSERT INTO log (filter1, id, msg)
                             VALUES ('filter','B','message B');
aploetz@cqlsh:stackoverflow> INSERT INTO log (filter1, id, msg) 
                             VALUES ('filter','C','message C');
aploetz@cqlsh:stackoverflow> SELECT msg FROM log 
                             WHERE filter1='filter' AND id IN ('A','B');

 msg
-----------
 message A
 message B

(2 rows)

You would still be using IN, which isn't known to perform well either. But because you would also be specifying the partition key, it might perform better than expected.
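As a rough mental model of why the query table works, the toy sketch below treats the table as a dictionary keyed by partition key (filter1), where each partition maps clustering key (id) to msg. It is a conceptual illustration only, not how Cassandra stores data; the names `insert` and `select_msgs` are hypothetical, and sorting by id is just a stand-in for clustering order.

```python
from typing import Dict, Iterable, List

# partition key (filter1) -> {clustering key (id) -> msg}
QueryTable = Dict[str, Dict[str, str]]


def insert(table: QueryTable, filter1: str, id_: str, msg: str) -> None:
    """Place a row in the partition selected by filter1, keyed by id."""
    table.setdefault(filter1, {})[id_] = msg


def select_msgs(table: QueryTable, filter1: str, ids: Iterable[str]) -> List[str]:
    """Model: WHERE filter1 = ? AND id IN (...).

    Only one partition is consulted; the IN list just picks rows inside it.
    """
    partition = table.get(filter1, {})
    wanted = set(ids)
    # Sorting by id mimics clustering order in this toy model.
    return [partition[key] for key in sorted(partition) if key in wanted]
```

With the three rows from the session above inserted, `select_msgs(table, "filter", ["A", "B"])` only touches the single "filter" partition, which is the property that makes the IN on a clustering column cheaper than an IN across partition keys.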

Aaron answered Sep 25 '22 02:09