high and low cardinality in Cassandra

1 Answer

The cardinality of X is nothing more than the number of distinct elements that compose X. In Cassandra, the cardinality of the partition key is very important for partitioning data.

Since the partition key is responsible for the distribution of the data across the cluster, choosing a low-cardinality key might lead to a situation in which your data is not evenly distributed across the nodes.

Imagine you have a cluster of 20 nodes storing comments, with a replication factor (RF) of 2. Each comment has its own vote, ranging from 1 to 5. Now, since you want to easily retrieve comments by vote, you might be tempted to choose vote as the partition key.

CREATE TABLE comments(vote int, content text, id uuid, PRIMARY KEY(vote, id));

In this situation the only key responsible for data distribution is vote, which has a very low cardinality since it can contain only 5 values (1, 2, 3, 4, 5). This means that, in the best case, 5 different nodes will be the owners of the 5 different partitions (which are "all comments with vote 1" ... "all comments with vote 5") and, again in the best case, with an RF of 2, 10 different nodes will hold your data. As you can see, even in the best case no more than 50% of your 20-node cluster is used.
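The arithmetic above can be checked with a small simulation. This is only a rough sketch, not how Cassandra actually places data: it assumes one equal slice of the ring per node (no vnodes), uses MD5 as a stand-in for the real partitioner's hash, maps tokens to nodes with a simple modulo, and puts the second replica on the next node in the ring. The function names are made up for the illustration.

```python
import hashlib

NODES = 20  # cluster size from the example
RF = 2      # replication factor from the example


def primary_node(partition_key, nodes=NODES):
    """Map a partition key to the node that owns it (simplified ring)."""
    # Assumption: MD5 stands in for the partitioner's hash function.
    digest = hashlib.md5(str(partition_key).encode()).digest()
    token = int.from_bytes(digest[:8], "big")
    # Assumption: modulo instead of real token ranges.
    return token % nodes


def replica_nodes(partition_key, nodes=NODES, rf=RF):
    """Owner plus the next rf - 1 nodes walking around the ring."""
    first = primary_node(partition_key, nodes)
    return [(first + i) % nodes for i in range(rf)]


def nodes_holding_data(keys, nodes=NODES, rf=RF):
    """Set of nodes that store at least one replica of any given key."""
    used = set()
    for key in keys:
        used.update(replica_nodes(key, nodes, rf))
    return used


# Low cardinality: vote can only be 1..5, so at most 5 * RF nodes are used.
low = nodes_holding_data(range(1, 6))
# High cardinality: many distinct keys spread over the whole ring.
high = nodes_holding_data(range(1, 10001))

print(f"vote as key: {len(low)} of {NODES} nodes hold data")
print(f"10000 distinct keys: {len(high)} of {NODES} nodes hold data")
```

With vote as the key, the first count can never exceed 10, and it gets lower whenever two votes hash to the same or to neighbouring nodes. With many distinct keys every node ends up holding data.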

Data distribution is very important; that's why partition key cardinality matters a lot.

HTH, Carlo

answered Nov 04 '22 16:11

Carlo Bertuccini


Tags:

cardinality

cassandra-2.0

eagertoLearn
