I want to SELECT DISTINCT
in Cassandra, but I think Cassandra doesn't support these operations. How can I do SELECT DISTINCT
in Cassandra? Is it possible?
Yes the primary key has to be unique. Otherwise there would be no way to know which row to return when you query with a duplicate key. In your case you can have 2 rows with the same name or with the same surname but not both.
SELECT * FROM blogs WHERE author='Jonathan Ellis' and time2 = 1418306451235; Cassandra will request ALLOW FILTERING as it will have to first find and load the rows containing Jonathan as author, and then to filter out the ones which do not have a time2 column equal to the specified value.
The GROUP BY option can condense all selected rows that share the same values for a set of columns into a single row. Using the GROUP BY option, rows can be grouped at the partition key or clustering column level. Consequently, the GROUP BY option only accepts primary key columns in defined order as arguments.
CQL 3.1.1 and onwards support DISTINCT operation only for partition keys.
SELECT statement now allows listing the partition keys (using the DISTINCT modifier). See CASSANDRA-4536.
Select Syntax
select_statement ::= SELECT [ JSON | DISTINCT ] ( select_clause | '*' )
FROM table_name
[ WHERE where_clause ]
[ GROUP BY group_by_clause ]
[ ORDER BY ordering_clause ]
[ PER PARTITION LIMIT (integer | bind_marker) ]
[ LIMIT (integer | bind_marker) ]
[ ALLOW FILTERING ]
select_clause ::= selector [ AS identifier ] ( ',' selector [ AS identifier ] )
selector ::= column_name
| term
| CAST '(' selector AS cql_type ')'
| function_name '(' [ selector ( ',' selector )* ] ')'
| COUNT '(' '*' ')'
where_clause ::= relation ( AND relation )*
relation ::= column_name operator term
'(' column_name ( ',' column_name )* ')' operator tuple_literal
TOKEN '(' column_name ( ',' column_name )* ')' operator term
operator ::= '=' | '<' | '>' | '<=' | '>=' | '!=' | IN | CONTAINS | CONTAINS KEY
group_by_clause ::= column_name ( ',' column_name )*
ordering_clause ::= column_name [ ASC | DESC ] ( ',' column_name [ ASC | DESC ] )*
As others wrote, there is no support for distinct on columns in Cassandra, only on primary key. Two options to do this with cassandra:
The decision between the two depends on your actual data structure/size and needs, if the table size is small or you do this operation very little, option 1 will be enough and fast, if the table is large, and/or you do this query a lot of times go with #2.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With