I am using Cassandra for the first time in a web app and I have run into a query problem. Here is my table:
CREATE TABLE vote ( doodle_id uuid, user_id uuid, schedule_id uuid, vote int, PRIMARY KEY ((doodle_id), user_id, schedule_id) );
On every request, I specify my partition key, doodle_id. For example, this query runs without any problems:
select * from vote where doodle_id = c4778a27-f2ca-4c96-8669-15dcbd5d34a7 and user_id = 97a7378a-e1bb-4586-ada1-177016405142;
But with the last query I ran:
select * from vote where doodle_id = c4778a27-f2ca-4c96-8669-15dcbd5d34a7 and schedule_id = c37df0ad-f61d-463e-bdcc-a97586bea633;
I got the following error:
Bad Request: PRIMARY KEY column "schedule_id" cannot be restricted (preceding column "user_id" is either not restricted or by a non-EQ relation)
I'm new to Cassandra, but correct me if I'm wrong: in a composite primary key, the first part is the PARTITION KEY, which is mandatory so that Cassandra knows where to look for the data. The other parts are CLUSTERING KEYS, which sort the data.
But I still don't understand why my first query works and the second one does not.
Any help would be greatly appreciated.
There is no way to change a primary key, as it defines how your data is physically stored. You can create a new table with the new primary key, copy data from the old one, and then drop the old table.
You cannot update any column in the primary key because that would change the primary key for the record.
Cassandra will request ALLOW FILTERING as it will have to first find and load the rows containing Jonathan as author, and then to filter out the ones which do not have a time2 column equal to the specified value. Adding an index on time2 might improve the query performance.
Yes, the primary key has to be unique. Otherwise there would be no way to know which row to return when you query with a duplicate key. In your case you can have 2 rows with the same name or with the same surname, but not with both the same.
In Cassandra, you should design your data model to suit your queries. Therefore the proper way to support your second query (queries by doodle_id and schedule_id, but not necessarily with user_id) is to create a new table to handle that specific query. This table will be pretty much the same, except that the clustering columns in the PRIMARY KEY are in a different order, with schedule_id ahead of user_id:
CREATE TABLE votebydoodleandschedule ( doodle_id uuid, user_id uuid, schedule_id uuid, vote int, PRIMARY KEY ((doodle_id), schedule_id, user_id) );
Now this query will work:
SELECT * FROM votebydoodleandschedule WHERE doodle_id = c4778a27-f2ca-4c96-8669-15dcbd5d34a7 AND schedule_id = c37df0ad-f61d-463e-bdcc-a97586bea633;
This gets you around having to specify ALLOW FILTERING. Relying on ALLOW FILTERING is never a good idea, and is certainly not something that you should do in a production cluster.
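As for why the first query works and the second does not, a rough way to picture it is that the rows inside one partition are kept sorted by the clustering columns, in the order they are declared. The Python sketch below is only a toy model of that idea, not Cassandra's actual storage engine; the short ids ("u1", "s1") and the helper names prefix_range and positions are made up for illustration.

```python
from bisect import bisect_left, bisect_right

# Toy model of ONE partition (a single doodle_id). Rows are kept sorted by
# the clustering columns in declared order: (user_id, schedule_id).
# Ids are shortened placeholders, not real UUIDs.
rows_by_user = sorted([
    ("u1", "s1", 1),
    ("u1", "s2", 0),
    ("u2", "s1", 1),
    ("u2", "s2", 1),
    ("u3", "s1", 0),
])


def prefix_range(rows, first_key):
    """Find the contiguous [lo, hi) range of rows whose FIRST clustering
    column equals first_key, using binary search on the sorted rows."""
    keys = [row[0] for row in rows]
    return bisect_left(keys, first_key), bisect_right(keys, first_key)


def positions(rows, column, value):
    """Positions of matching rows, found by scanning every row."""
    return [i for i, row in enumerate(rows) if row[column] == value]


# Restricting the first clustering column (user_id) hits one contiguous run.
lo, hi = prefix_range(rows_by_user, "u2")
print(rows_by_user[lo:hi])

# Restricting only the second column (schedule_id) matches rows that are
# scattered through the partition, so every row has to be inspected.
print(positions(rows_by_user, 1, "s1"))

# Same data with the clustering order swapped to (schedule_id, user_id),
# as in votebydoodleandschedule: schedule_id is now the sorted prefix.
rows_by_schedule = sorted((s, u, v) for (u, s, v) in rows_by_user)
print(positions(rows_by_schedule, 0, "s1"))
```

In this model, the rows for one user_id sit next to each other, while the rows for one schedule_id are spread out unless schedule_id comes first in the sort order. That matches the error message: schedule_id can only be restricted once the preceding clustering column, user_id, is restricted by equality.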