
How to filter a Cassandra query by a field in a user-defined type

How do I filter a Cassandra query by a field of a user-defined type? I want to create a people table in my Cassandra database, so I created this user-defined type:

    create type fullname ( firstname text, lastname text );

I also have this table:

    create table people ( id UUID primary key, name frozen <fullname> );

I need to find all people whose lastname is jolie. How can I query this from the table, and more generally, how do filtering and querying work in Cassandra? I know I could drop the fullname type and add firstname and lastname columns to the main table, but this is only a sample of what I want to do. I must keep the fullname type.

reihaneh asked Nov 21 '15 05:11



1 Answer

Short answer: you can use a secondary index to query by the whole fullname UDT, but you cannot query by only a part of the UDT.

    // create type, table and index
    create type fullname ( firstname text, lastname text );
    create table people ( id UUID primary key, name frozen <fullname> );
    create index fname_index on your_keyspace.people (name);

    // insert some data into it
    insert into people (id, name) values (now(), {firstname: 'foo', lastname: 'bar'});
    insert into people (id, name) values (now(), {firstname: 'baz', lastname: 'qux'});

    // query it by fullname
    select * from people where name = { firstname: 'baz', lastname: 'qux' };

    // the following will NOT work:
    select * from people where name = { firstname: 'baz'};

The reason for this behaviour is the way C* secondary indexes are implemented. Roughly speaking, an index is just another hidden table maintained by C*, which in your case would be defined as:

    create table fname_index (name frozen <fullname> primary key, id uuid);

In this hidden table the indexed column and the primary key have swapped roles: name is the key and id is the value. So your case reduces to a more general question, 'why can't I query by only a part of the PK?':

  • the whole PK value (firstname+lastname) is hashed, and the resulting number determines the partition that stores your row.
  • within that partition, your row is appended to a memtable (and later flushed to disk as an SSTable, a file sorted by key).
  • when you query by only part of the PK (for example by firstname only), C* is not able to work out which partition to look in: it cannot compute the hash of the whole fullname because lastname is unknown. Your match could be in any partition, which would require a full-table scan. C* explicitly forbids these scans, so you have no choice :)
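
The same restriction can be reproduced on an ordinary table, with no index involved. A minimal sketch, assuming a hypothetical table people_by_name (not part of the original schema) whose partition key is the pair (firstname, lastname):

```cql
// hypothetical table: the partition key is (firstname, lastname) as a whole
create table people_by_name (
    firstname text,
    lastname text,
    id uuid,
    primary key ((firstname, lastname), id)
);

insert into people_by_name (firstname, lastname, id) values ('baz', 'qux', uuid());

// both partition key columns are given, so the partition can be located
select * from people_by_name where firstname = 'baz' and lastname = 'qux';

// only part of the partition key is given: rejected as written,
// because the hash of the full key cannot be computed
select * from people_by_name where firstname = 'baz';
```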

Suggested solutions:

  • split your UDT into its essential parts, such as separate firstname and lastname columns, and create secondary indexes on them.
  • use Cassandra 3.0 with its materialized views feature (in effect, forcing Cassandra to maintain a custom index for part of your data).
  • revisit your data model to be less strict (nobody forces you to use UDTs where they are not helpful).
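
The first two suggestions could look roughly like this. It is only a sketch: people_flat, people_flat_lastname_idx and people_by_lastname are hypothetical names, and it assumes the name fields are stored as plain columns instead of the fullname UDT. The materialized view needs Cassandra 3.0 or later and may have to be enabled in the server configuration on newer releases.

```cql
// hypothetical table with the UDT split into plain columns
create table people_flat ( id uuid primary key, firstname text, lastname text );

// option 1: a secondary index on the column you want to filter by
create index people_flat_lastname_idx on people_flat (lastname);
select * from people_flat where lastname = 'jolie';

// option 2: a materialized view partitioned by lastname
create materialized view people_by_lastname as
    select id, firstname, lastname from people_flat
    where lastname is not null and id is not null
    primary key (lastname, id);
select * from people_by_lastname where lastname = 'jolie';
```

You would normally pick one of the two options: the index keeps a single table, while the view stores a second copy of the data partitioned by lastname.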
shutty answered Sep 23 '22 06:09