
Cassandra - WHERE clause with non primary key disadvantages

I am new to Cassandra and I am using it for analytics tasks (so good indexing is needed).

I read in this post (and others), "cassandra, select via a non primary key", that I can't query my database with a WHERE clause on non-primary-key columns.

To do so, there seem to be three possibilities (all with major disadvantages):

  • Create a secondary index (not recommended, because of performance issues).
  • Create a new table (I don't want redundant data, even if that is acceptable in Cassandra).
  • Put the column I want to query by in the primary key. In this case I need to specify all the parts of the primary key in my WHERE clause, and I can't use any operator other than IN or =.
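The three options listed above can be sketched in CQL roughly as follows. This is a minimal illustration, not a recommended schema: the tables `events`, `events_by_type`, and `events_by_user`, their columns, and the index name are all hypothetical, and the statements assume a keyspace is already selected with `USE`.

```cql
-- Hypothetical base table; event_type is NOT part of the primary key.
CREATE TABLE events (
    event_id   uuid,
    user_id    int,
    event_type text,
    PRIMARY KEY (event_id)
);

-- Option 1: secondary index on the non-key column.
CREATE INDEX events_by_type_idx ON events (event_type);
SELECT * FROM events WHERE event_type = 'click';

-- Option 2: a second table keyed by the column you filter on.
-- The application has to write every event to both tables.
CREATE TABLE events_by_type (
    event_type text,
    event_id   uuid,
    user_id    int,
    PRIMARY KEY (event_type, event_id)
);
SELECT * FROM events_by_type WHERE event_type = 'click';

-- Option 3: make the column part of the primary key.
-- user_id is the partition key; event_type and event_id are clustering columns.
CREATE TABLE events_by_user (
    user_id    int,
    event_type text,
    event_id   uuid,
    PRIMARY KEY ((user_id), event_type, event_id)
);

-- The partition key has to be restricted with = or IN.
SELECT * FROM events_by_user WHERE user_id = 42 AND event_type = 'click';

-- Clustering columns are restricted in declaration order, and the last
-- restricted one may use a range operator.
SELECT * FROM events_by_user
WHERE user_id = 42 AND event_type >= 'a' AND event_type < 'd';
```

Note that in the third case only the partition key is strictly required in the WHERE clause; the clustering columns are optional and can take range operators, so the restriction is somewhat looser than "all parts, with only IN or =".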

Is there another way to do what I am trying to do (a WHERE clause on a non-primary-key column) without the three constraints above?

farhawa asked Feb 20 '16 14:02




2 Answers

From within Cassandra itself, you are limited to the options you listed above. If you want to know why, take a look here:

A Deep Look to the CQL Where Clause

However, if you are trying to run analytics on data stored in Cassandra, have you looked at using Spark? Spark is built for large-scale data processing on distributed systems. In fact, you could look at DataStax (see here), which has some nice integration features between Spark and Cassandra, specifically for loading and saving data. It comes in both free (Community) and paid (Enterprise) editions.

bechbd answered Sep 17 '22 14:09


Please try using IF in your query:

UPDATE [keyspace_name.] table_name
[USING TTL time_value | USING TIMESTAMP timestamp_value]
SET assignment [, assignment] . . . 
WHERE row_specification
[IF EXISTS | IF condition [AND condition] . . .] ;

See https://docs.datastax.com/en/archived/cql/3.3/cql/cql_reference/cqlUpdate.html
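As a concrete instance of the UPDATE syntax above, here is a small sketch. The `users` table, its columns, and the values are hypothetical and only serve to show where the IF clause goes.

```cql
-- Hypothetical table for illustration only.
CREATE TABLE users (
    user_id int PRIMARY KEY,
    email   text,
    status  text
);

-- Conditional update: the row is still located by its primary key in WHERE;
-- the IF condition on the non-key column is checked before the write is applied.
UPDATE users USING TTL 86400
SET status = 'active'
WHERE user_id = 42
IF status = 'pending';

-- Update only if the row already exists.
UPDATE users
SET email = 'someone@example.com'
WHERE user_id = 42
IF EXISTS;
```

Keep in mind that IF here is a lightweight-transaction condition on a write. The WHERE clause still has to identify the row by its primary key, so this does not by itself let a SELECT filter on a non-primary-key column.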

Matthew I. answered Sep 17 '22 14:09