Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

how to avoid secondary indexes in cassandra?

I have heard repeatedly that secondary indexes (in cassandra) is only for convenience but not for better performance. The only case where it is recommended to use secondary indexes when you have low cardinality (such as gender column which has two values male or female)

consider this example:

CREATE TABLE users ( 
userID uuid, 
firstname text, 
lastname text, 
state text, 
zip int, 
PRIMARY KEY (userID) 
);

right now I cannot do this query unless I create a secondary index on users on firstname index

select * from users where firstname='john'

How do I denormalize this table such that I can have this query: Is this the only efficient way by using composite keys? Any other alternatives or suggestions?

CREATE TABLE users ( 
    userID uuid, 
    firstname text, 
    lastname text, 
    state text, 
    zip int, 
    PRIMARY KEY (firstname,userID) 
    );
like image 661
brain storm Avatar asked Aug 04 '14 18:08

brain storm


People also ask

Does Cassandra support secondary indexes?

4. Secondary Indexes. Secondary Indexes in Cassandra solve the need for querying columns that are not part of the primary key. When we insert data, Cassandra uses an append-only file called commitlog for storing the changes, so writes are quick.

How do I drop a secondary index in Cassandra?

Cassandra Drop IndexCommand 'Drop index' drops the specified index. If index name was not given during index creation, then index name is TableName_ColumnName_idx. If the index does not exist, it will return an error unless IF EXISTS is used that will return no-op.

What is a secondary index in Cassandra?

What are Cassandra secondary indexes? Secondary indexes are indexes built over column values. In other words, let's say you have a user table, which contains a user's email. The primary index would be the user ID, so if you wanted to access a particular user's email, you could look them up by their ID.

How do I enable Sasi index in Cassandra?

To create SASI indexes use CQLs CREATE CUSTOM INDEX statement: cqlsh:demo> CREATE CUSTOM INDEX ON sasi (first_name) USING 'org. apache. cassandra.


1 Answers

In order to come up with a good data model, you need to identify first ALL queries you would like to perform. If you only need to look up users by their firstname (or firstname and userID), then your second design is fine...

If you also need to look up users by their last name, then you could create another table having the same fields but a primary key on (lastname, userID). Obviously you will need to update both tables in the same time. Data duplication is fine in Cassandra.

Still, if you are concerned about the space needed for the two or more tables, you could create a single users table partitioned by user id, and additional tables for the fields you want to query by:

CREATE TABLE users ( 
    userID uuid, 
    firstname text, 
    lastname text, 
    state text, 
    zip int, 
    PRIMARY KEY (userID) 
);

CREATE TABLE users_by_firstname (
    firstname text,
    userid uuid,
    PRIMARY KEY (firstname, userid)
);

The disadvantage of this solution is that you will need two queries to retrieve users by their first name:

SELECT userid FROM users_by_firstname WHERE firstname = 'Joe';
SELECT * FROM users WHERE userid IN (...);

Hope this helps

like image 83
medvekoma Avatar answered Sep 19 '22 13:09

medvekoma