 

What are virtual nodes, and how do they help with partitioning in Cassandra?

Tags:

cassandra

I know that Cassandra's virtual nodes feature removes the overhead of manually assigning a token (start token) to each node in the cluster. Instead, we set num_tokens, whose default value is 256.
How do virtual nodes make a difference in partitioning? That is, does Cassandra set the token range (minimum and maximum token) for each node?

Sarkar asked Aug 19 '14 09:08





1 Answer

What are virtual nodes?

Prior to Cassandra 1.2, each node was assigned a single, contiguous token range. Now each node can support multiple, non-contiguous token ranges. Instead of being responsible for one large range of tokens, a node is responsible for many smaller ranges. In this way, one physical node essentially hosts many smaller "virtual" nodes.
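As a rough illustration of the difference, here is a minimal Python sketch of a token ring. It is a toy model, not Cassandra's implementation: the token space size, the node names, and the helpers `build_ring` and `owner` are all assumptions made up for this example, and tokens are simply drawn at random.

```python
import bisect
import random

RING_SIZE = 2**64  # toy token space (an assumption of this sketch)

def build_ring(tokens_per_node, seed=42):
    """Return a sorted list of (token, node) pairs with random tokens per node."""
    rng = random.Random(seed)
    ring = [(rng.randrange(RING_SIZE), node)
            for node, count in tokens_per_node.items()
            for _ in range(count)]
    return sorted(ring)

def owner(ring, token):
    """A token belongs to the node holding the first ring token >= it (wrapping)."""
    tokens = [t for t, _ in ring]
    idx = bisect.bisect_left(tokens, token)
    return ring[idx % len(ring)][1]

# One token per node: each node owns one large contiguous range.
single = build_ring({"A": 1, "B": 1, "C": 1})

# Eight tokens per node: each node owns eight smaller ranges scattered
# around the ring, i.e. eight "virtual nodes" per physical node.
vnodes = build_ring({"A": 8, "B": 8, "C": 8})

print("single-token ring:", [node for _, node in single])
print("vnode ring:       ", [node for _, node in vnodes])
```

Reading the second printed list from left to right walks the ring in token order, so the interleaving of A, B and C shows how each physical node ends up with several non-contiguous ranges.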

In what way do these virtual nodes make a difference in partitioning?

Consider the image in this blog: Virtual nodes in Cassandra 1.2.

[Image: virtual nodes]

Having many smaller token ranges (virtual nodes) on each physical node allows for a more even distribution of data. This becomes evident when you add a physical node to the cluster: rebalancing (manually reassigning token ranges) is no longer necessary. As the Virtual Node documentation states, the new node "assumes responsibility for an even portion of data from the other nodes in the cluster."
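The effect of adding a node can be sketched with a small simulation. This is only a toy model under stated assumptions: tokens are picked uniformly at random, the token space is 2**64 wide, and `add_node` and `ownership` are hypothetical helpers written for this example rather than anything from Cassandra itself.

```python
import random

RING_SIZE = 2**64  # toy token space (an assumption of this sketch)

def add_node(ring, node, num_tokens, rng):
    """Return a new sorted ring with num_tokens random tokens added for node."""
    new_tokens = [(rng.randrange(RING_SIZE), node) for _ in range(num_tokens)]
    return sorted(ring + new_tokens)

def ownership(ring):
    """Fraction of the token space owned by each node.

    Each token owns the range (previous token, token], wrapping around the
    ring. Assumes the ring holds at least two distinct tokens.
    """
    owned = {}
    for i, (token, node) in enumerate(ring):
        prev_token = ring[i - 1][0]  # i == 0 wraps around to the last token
        owned[node] = owned.get(node, 0) + (token - prev_token) % RING_SIZE
    return {node: size / RING_SIZE for node, size in owned.items()}

rng = random.Random(7)
ring = []
for name in ("A", "B", "C"):
    ring = add_node(ring, name, 256, rng)

before = ownership(ring)
after = ownership(add_node(ring, "D", 256, rng))  # existing tokens are untouched

for name in sorted(after):
    print(name, round(before.get(name, 0.0), 3), "->", round(after[name], 3))
```

In this model the new node D ends up with roughly a quarter of the ring, and every existing node gives up a slice of its share, without any token on A, B or C being moved by hand.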

Is Cassandra setting/assigning the token range (minimum and maximum token) for a particular node?

Yes. Cassandra assigns the token ranges for each virtual node itself, so you do not set them by hand. However, you can control the number of virtual nodes assigned to each physical node through the num_tokens setting. Assume that your physical nodes are all configured for the default of 256 virtual nodes. If you add a new machine with more resources than your current nodes, and you want that machine to handle more load, you could configure it to allow 384 virtual nodes instead. Likewise, a machine with fewer resources could be configured to support a smaller number of virtual nodes.
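To put numbers on the 256 versus 384 example, here is a short worked calculation in Python. It assumes tokens are spread randomly over the ring, so that a node's expected share of the data is proportional to its number of virtual nodes; the node names and the `expected_shares` helper are hypothetical.

```python
from fractions import Fraction

def expected_shares(num_tokens_per_node):
    """Expected share of the ring per node, proportional to its token count."""
    total = sum(num_tokens_per_node.values())
    return {node: Fraction(count, total)
            for node, count in num_tokens_per_node.items()}

# Two existing machines at the default of 256, plus a larger one set to 384.
shares = expected_shares({"node1": 256, "node2": 256, "bigger_node": 384})

for node, share in shares.items():
    print(f"{node}: {share} (about {float(share):.1%})")
```

Under that assumption the two default nodes each expect 2/7 of the data (about 28.6%) and the larger node expects 3/7 (about 42.9%). Actual ownership on a real cluster will vary around these figures because the token ranges are not all the same size.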

Aaron answered Nov 22 '22 15:11
