I use the following code to create a table:
CREATE KEYSPACE mykeyspace
WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
USE mykeyspace;
CREATE TABLE users (
user_id int PRIMARY KEY,
fname text,
lname text
);
INSERT INTO users (user_id, fname, lname)
VALUES (1745, 'john', 'smith');
INSERT INTO users (user_id, fname, lname)
VALUES (1744, 'john', 'doe');
INSERT INTO users (user_id, fname, lname)
VALUES (1746, 'john', 'smith');
I would like to find the distinct values of the lname column (which is not a PRIMARY KEY). I would like to get the following result:
lname
-------
smith
doe
By using SELECT DISTINCT lname FROM users;
However, since lname is not a PRIMARY KEY, I get the following error:
cqlsh:mykeyspace> SELECT DISTINCT lname FROM users;
InvalidRequest: code=2200 [Invalid query] message="SELECT DISTINCT queries must only request partition key columns and/or static columns (not lname)"
How can I get the distinct values from lname?
Use the DISTINCT keyword to return only distinct (different) values of partition keys. The FROM clause specifies the table to query. You may want to precede the table name with the name of the keyspace followed by a period (.). If you do not specify a keyspace, Cassandra queries the current keyspace.
In Cassandra, you can only select distinct values of the partition key column or columns. If the partition key consists of multiple columns, you have to request all of them; otherwise you will get an error.
The primary key uniquely identifies each record: if you insert a record whose primary key already exists, Cassandra performs an upsert instead of rejecting the write. When a table has multiple columns as its primary key, we call it a composite primary key. A table can also have a single column as its primary key.
The SELECT clause is used to read data from a table in Cassandra. Using this clause, you can read a whole table, a single column, or a particular cell.
User Undefined_variable makes two good points:
DISTINCT only works on partition keys. So, one way to get this to work would be to build a specific table to support that query:
CREATE TABLE users_by_lname (
lname text,
fname text,
user_id int,
PRIMARY KEY (lname, fname, user_id)
);
Now, after I run your INSERTs against this new query table, this works:
aploetz@cqlsh:stackoverflow> SELECT DISTINCT lname FROM users_by_lname;
lname
-------
smith
doe
(2 rows)
Notes: In this table, all rows with the same partition key (lname) will be sorted by fname, as fname is a clustering key. I added user_id as an additional clustering key, just to ensure uniqueness.
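One consequence of a dedicated query table is that every new user has to be written to both users and users_by_lname, or the DISTINCT result drifts out of date. Here is a minimal Python sketch of that dual write. It assumes a session object that behaves like the DataStax cassandra-driver Session, with an execute(query, parameters) method and %s placeholders. The helper name insert_user is hypothetical, and batching and failure handling are left out.

```python
# Hypothetical helper: write one user to both the base table and the query table.
USERS_INSERT = "INSERT INTO users (user_id, fname, lname) VALUES (%s, %s, %s)"
USERS_BY_LNAME_INSERT = (
    "INSERT INTO users_by_lname (lname, fname, user_id) VALUES (%s, %s, %s)"
)


def insert_user(session, user_id, fname, lname):
    """Insert the same user into users and users_by_lname.

    `session` is assumed to behave like cassandra.cluster.Session:
    it only needs an execute(query, parameters) method.
    """
    # Base table, keyed by user_id.
    session.execute(USERS_INSERT, (user_id, fname, lname))
    # Query table, partitioned by lname so SELECT DISTINCT lname is allowed.
    session.execute(USERS_BY_LNAME_INSERT, (lname, fname, user_id))
```

With a connected session, calling insert_user(session, 1745, 'john', 'smith') would issue both INSERT statements. The two writes are not atomic in this sketch.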
There is no such functionality in Cassandra. DISTINCT is possible on the partition key only. You should design your data model based on your requirements. Otherwise, you have to process the data in application logic (Spark may be useful).
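As an illustration of the application-logic route, here is a small Python sketch that deduplicates lname on the client side. It assumes each row exposes an lname attribute, which is what the cassandra-driver's default named-tuple rows would provide. The function name distinct_lnames and the sample Row type are hypothetical, and the sample data mirrors the INSERTs from the question.

```python
from collections import namedtuple


def distinct_lnames(rows):
    """Return the distinct lname values from an iterable of rows, in first-seen order."""
    seen = set()
    result = []
    for row in rows:
        if row.lname not in seen:
            seen.add(row.lname)
            result.append(row.lname)
    return result


# Stand-in for rows returned by a query such as "SELECT lname FROM users".
Row = namedtuple("Row", ["user_id", "fname", "lname"])
sample = [
    Row(1745, "john", "smith"),
    Row(1744, "john", "doe"),
    Row(1746, "john", "smith"),
]
print(distinct_lnames(sample))  # prints ['smith', 'doe']
```

Against a real cluster the rows would come from a full table scan, so this only suits small tables. For large ones, the dedicated query table or Spark is the more realistic option.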