
Cassandra secondary index vs materialized view

Tags: cassandra

I'm modeling a table for Cassandra 3.0+. The goal is to store users' activities. Here is what I've done so far (userid comes from another database, MySQL):

CREATE TABLE activity (
    userid int,
    type int,
    remoteid text,
    time timestamp,
    imported timestamp,
    visibility int,
    title text,
    description text,
    img text,
    customfields map<text, text>,
    PRIMARY KEY (userid, type, remoteid, time, imported)
);

These are the main queries I use:

SELECT * FROM activity WHERE userid = ? AND remoteid = ?;
SELECT * FROM activity WHERE userid = ? AND type = ? LIMIT 10;

Now I need to filter on the visibility column in the second query. From what I've learned, I can choose between a secondary index and a materialized view. These are the facts:

  • There is one partition per user, and each partition holds thousands of rows (activities).
  • I always use the partition key (userid) in all my queries to access the data.
  • The total number of activities is 30 million and growing.
  • The visibility column has low cardinality (just 3 values) and can be updated, but rarely.

So which should I choose, a materialized view or an index? I know that indexes on low-cardinality columns are a bad choice, but my queries always include the partition key and a limit, so maybe it is not that bad.

gorgonauta asked Feb 10 '17 12:02



1 Answer

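If you are always going to use the partition key, I recommend using a secondary index. Cassandra's native secondary indexes are stored locally on each node, so a query that also specifies the partition key only has to contact the replicas that own that partition instead of every node in the cluster.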
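As a rough sketch of this option, the statements below add an index on visibility to the activity table from the question and extend the second query with the new filter. The index name activity_visibility_idx is made up for illustration, and it is worth confirming on your own Cassandra version that the query is accepted without ALLOW FILTERING.

```cql
-- Hypothetical index name; the table and columns come from the question.
CREATE INDEX activity_visibility_idx ON activity (visibility);

-- The partition key (userid) keeps the index lookup on the replicas
-- that own this user's partition.
SELECT * FROM activity
WHERE userid = ? AND type = ? AND visibility = ?
LIMIT 10;
```

Because the query is confined to a single partition, the low cardinality of visibility matters much less than it would for a cluster-wide index scan.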

Materialized views are better when you do not know the partition key.
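For comparison, the materialized-view alternative from the question might look roughly like the sketch below. The view name activity_by_visibility is an assumption, and the view keeps userid as its partition key because every query in the question supplies it. A view's primary key has to contain all primary key columns of the base table and may add at most one other column, here visibility.

```cql
-- Hypothetical view name; every primary key column of the view
-- needs an IS NOT NULL restriction.
CREATE MATERIALIZED VIEW activity_by_visibility AS
    SELECT * FROM activity
    WHERE userid IS NOT NULL AND visibility IS NOT NULL
      AND type IS NOT NULL AND remoteid IS NOT NULL
      AND time IS NOT NULL AND imported IS NOT NULL
    PRIMARY KEY (userid, visibility, type, remoteid, time, imported);

SELECT * FROM activity_by_visibility
WHERE userid = ? AND visibility = ? AND type = ?
LIMIT 10;
```

This stores a second copy of every activity, and each change to visibility removes the old view row and writes a new one. That cost is easier to justify when the view is keyed by a column other than the base table's partition key, which is the case the answer has in mind.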

References:

Main article:

• Cassandra Secondary Index Preview #1

A comparison of materialized views and secondary indexes:

• Materialized View Performance in Cassandra 3.x

An explanation of why an index is more effective when the partition key is known:

• Cassandra Native Secondary Index Deep Dive

HerberthObregon answered Nov 11 '22 02:11