Is it possible to read data only from a single node in a Cassandra cluster with a replication factor of 3?

Tags: cassandra, replication, consistency, cluster-computing, database-replication

I know that Cassandra has different read consistency levels, but I haven't seen a consistency level that allows us to read data by key from only one node. I mean, if we have a cluster with a replication factor of 3, then we will always ask all nodes when we read. Even if we choose a consistency level of ONE, we will ask all nodes but only wait for the first response. That is why a read will load not just one node but 3 (4 with a coordinator node). I think we can't really improve read performance even if we set a bigger replication factor.

Is it possible to read really only from a single node?

272

asked Apr 08 '16 17:04

Oleksandr

2 Answers

Are you using a Token-Aware Load Balancing Policy?

If you are, and you are querying with a consistency of LOCAL_ONE/ONE, a read query should only contact a single node.

Give the article Ideology and Testing of a Resilient Driver a read. In it, you'll notice that using the TokenAwarePolicy has this effect:

"For cases with a single datacenter, the TokenAwarePolicy chooses the primary replica to be the chosen coordinator in hopes of cutting down latency by avoiding the typical coordinator-replica hop."

So here's what happens. Let's say that I have a table for keeping track of Kerbalnauts, and I want to get all data for "Bill." I would use a query like this:

SELECT * FROM kerbalnauts WHERE name='Bill';

The driver hashes my partition key value (name) to the token of 4639906948852899531 (SELECT token(name) FROM kerbalnauts WHERE name='Bill'; returns that value). If I am working with a 6-node cluster, then my primary token ranges will look like this:

node   start range              end range
1)     9223372036854775808 to  -9223372036854775808
2)    -9223372036854775807 to  -5534023222112865485
3)    -5534023222112865484 to  -1844674407370955162
4)    -1844674407370955161 to   1844674407370955161
5)     1844674407370955162 to   5534023222112865484
6)     5534023222112865485 to   9223372036854775807

Because node 5 is responsible for the token range containing the token of the partition key "Bill," my query will be sent to node 5. As I am reading at a consistency of LOCAL_ONE, no other node needs to be contacted, and the result is returned to the client...having hit only a single node.
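To make the routing step concrete, here is a minimal, stdlib-only Python sketch of the lookup. It rebuilds the six node tokens with the same formula as the token-range note, finds the node whose range contains Bill's token, and lists the RF=3 replica set. Assumptions: SimpleStrategy-style placement (owning node plus the next two nodes clockwise), the token value quoted above is used directly instead of hashing the key, and the coordinator serves a ONE/LOCAL_ONE read from its own copy. `owning_node`, `replicas_for` and `nodes_contacted` are hypothetical helpers, not driver APIs.

```python
from bisect import bisect_left

MIN_TOKEN = -2**63  # smallest Murmur3Partitioner token

# Node tokens matching the answer's table: each node owns (previous token, own token].
NODE_TOKENS = [(2**64 // 5) * i + MIN_TOKEN for i in range(6)]


def owning_node(token: int) -> int:
    """Return the 1-based node number whose range contains `token`."""
    idx = bisect_left(NODE_TOKENS, token)
    # Tokens above the last node token wrap around to the first node.
    return (idx % len(NODE_TOKENS)) + 1


def replicas_for(token: int, rf: int = 3) -> list[int]:
    """Owning node plus the next rf-1 nodes clockwise (assumed SimpleStrategy)."""
    first = owning_node(token) - 1
    return [(first + k) % len(NODE_TOKENS) + 1 for k in range(rf)]


def nodes_contacted(token: int, replicas_needed: int = 1) -> list[int]:
    """Nodes touched when a token-aware client sends the query straight to a
    replica, assuming that replica (acting as coordinator) answers from its own
    copy and only asks others when the consistency level needs more replies."""
    return replicas_for(token)[:replicas_needed]


bill_token = 4639906948852899531
print(owning_node(bill_token))      # 5
print(replicas_for(bill_token))     # [5, 6, 1]
print(nodes_contacted(bill_token))  # [5]
```

With `replicas_needed=2` (QUORUM at RF=3) the same helper would return `[5, 6]`, which is why only ONE/LOCAL_ONE keeps the read on a single node.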

Note: Token ranges computed with:

python3 -c 'print([str((2**64 // 5) * i - 2**63) for i in range(6)])'

187

answered Nov 02 '22 04:11

Aaron

I mean if we have a cluster with a replication factor of 3 then we will always ask all nodes when we read

Wrong. With consistency level ONE, the coordinator picks the fastest replica (the one with the lowest latency) to ask for the data.

How does it know which replica is the fastest? By keeping internal latency stats for each node.
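As a rough illustration of "internal latency stats", here is a minimal Python sketch of a coordinator that keeps an exponentially weighted moving average (EWMA) of each replica's response time and, at consistency level ONE, asks only the fastest one. This is a simplified stand-in for Cassandra's dynamic snitch, whose real scoring is more involved; `LatencyTracker`, the `alpha` weight and the node names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LatencyTracker:
    """Per-replica latency stats kept by a hypothetical coordinator (EWMA)."""
    alpha: float = 0.3  # weight given to the newest latency sample
    averages: dict[str, float] = field(default_factory=dict)

    def record(self, replica: str, latency_ms: float) -> None:
        old = self.averages.get(replica)
        if old is None:
            self.averages[replica] = latency_ms
        else:
            self.averages[replica] = self.alpha * latency_ms + (1 - self.alpha) * old

    def fastest(self, replicas: list[str]) -> str:
        # Replicas with no samples yet are treated as infinitely slow in this sketch.
        return min(replicas, key=lambda r: self.averages.get(r, float("inf")))


tracker = LatencyTracker()
for replica, ms in [("node5", 2.0), ("node6", 9.0), ("node1", 4.0), ("node5", 3.0)]:
    tracker.record(replica, ms)

# At consistency level ONE, only this replica would be asked for the data.
print(tracker.fastest(["node5", "node6", "node1"]))  # node5
```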

With consistency level >= QUORUM, the coordinator asks the fastest replica for the data and also asks the other replicas for digests.
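The QUORUM path can be sketched the same way. In this minimal, stdlib-only Python simulation (assumptions: RF=3, replicas already ordered fastest first, and MD5 over raw bytes standing in for Cassandra's own digest of the response), the coordinator takes one full data response from the fastest replica and only a digest from enough other replicas to reach a quorum. A mismatch is merely flagged here; in Cassandra it leads to fetching full data and reconciling the copies (read repair), which this sketch does not model. `quorum_read` and the sample values are hypothetical.

```python
import hashlib


def digest(value: bytes) -> bytes:
    # Stand-in for the digest a replica computes over its response.
    return hashlib.md5(value).digest()


def quorum(rf: int) -> int:
    return rf // 2 + 1


def quorum_read(replica_data: dict[str, bytes], by_speed: list[str],
                rf: int = 3) -> tuple[bytes, bool]:
    """Simulated coordinator read at QUORUM.

    replica_data: what each replica currently stores for the requested key.
    by_speed: replicas ordered fastest first (e.g. from latency stats).
    Returns (value, digest_mismatch)."""
    needed = quorum(rf)
    data_replica, digest_replicas = by_speed[0], by_speed[1:needed]
    value = replica_data[data_replica]  # the single full data response
    expected = digest(value)
    # The other replicas send only a hash of their copy, not the data itself.
    mismatch = any(digest(replica_data[r]) != expected for r in digest_replicas)
    return value, mismatch


replicas = {"node5": b"Bill|pilot", "node6": b"Bill|pilot", "node1": b"Bill|engineer"}
print(quorum_read(replicas, ["node5", "node6", "node1"]))  # (b'Bill|pilot', False)
print(quorum_read(replicas, ["node5", "node1", "node6"]))  # (b'Bill|pilot', True)
```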

From the client side, if you choose the appropriate load balancing policy (e.g. TokenAwarePolicy), the client will always contact the primary replica when using consistency level ONE.

answered Nov 02 '22 03:11

doanduyhai
