I am using DataStax Cassandra 1.2.3 on a 6-node cluster, each node having a quad-core 3 GHz processor and 8 GB of RAM. Recently I started using the vnodes feature, setting num_tokens first to 256 and then to 128. I observe a decline in performance (number of write requests/sec) for the schema I am using. I mostly have a normalized schema with a mix of wide tables and counter column families.
Has anyone observed a decline in performance using the VNodes? Are there any known optimization techniques to better utilize VNodes?
Is there an optimum value for num_tokens that can be derived for a given hardware configuration/node?
Also, I see that the cluster is not evenly balanced: one node automatically takes a higher share of the load, even though the cluster is homogeneous. Before using vnodes I would manually balance the cluster for the Murmur3Partitioner, and performance was good.
Thanks, VS
Virtual nodes, known as vnodes, distribute data across nodes at a finer granularity than can easily be achieved with manually calculated tokens. Vnodes simplify many tasks in Cassandra: tokens are automatically calculated and assigned to each node.
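In cassandra.yaml, the difference between the two modes comes down to two settings (this excerpt is illustrative, not a complete configuration):

```yaml
# cassandra.yaml (illustrative excerpt)
# With vnodes: set num_tokens and leave initial_token blank; the node
# picks its tokens automatically at bootstrap.
num_tokens: 256
# initial_token:    # only used for single-token (non-vnode) setups
```

All nodes in a homogeneous cluster should use the same num_tokens value; the setting takes effect when a node first bootstraps.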
In Cassandra, data distribution and replication go together. Data is organized by table and identified by a primary key, which determines which node the data is stored on. Replicas are copies of rows. When data is first written, it is also referred to as a replica.
The num_tokens setting influences the way Cassandra allocates data amongst the nodes, how that data is retrieved, and how that data is moved between nodes. Under the hood Cassandra uses a partitioner to decide where data is stored in the cluster.
The partitioner determines how data is distributed across the nodes in a Cassandra cluster. Essentially, a partitioner is a hash function that computes a token from the partition key of a row. That token then determines which node in the ring stores the row's data.
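The placement logic above can be sketched as a toy token ring. This is a minimal illustration, not Cassandra's implementation: the hash here is an MD5-based stand-in for Murmur3, and all node/key names are made up.

```python
import bisect
import hashlib

# Hypothetical stand-in for Murmur3: derive a signed 64-bit token from the
# partition key. (Cassandra's Murmur3Partitioner uses MurmurHash3 instead.)
def token_for(partition_key: str) -> int:
    digest = hashlib.md5(partition_key.encode()).digest()
    return int.from_bytes(digest[:8], "big", signed=True)

class TokenRing:
    """Toy token ring: each node owns one or more tokens; a row belongs to
    the node owning the first token >= the row's token, wrapping around."""

    def __init__(self, node_tokens):
        # node_tokens: {node_name: [token, ...]}; with vnodes each node
        # contributes num_tokens entries instead of just one.
        self._ring = sorted(
            (t, node) for node, tokens in node_tokens.items() for t in tokens
        )
        self._tokens = [t for t, _ in self._ring]

    def node_for_token(self, token: int) -> str:
        idx = bisect.bisect_left(self._tokens, token) % len(self._tokens)
        return self._ring[idx][1]

    def node_for(self, partition_key: str) -> str:
        return self.node_for_token(token_for(partition_key))

# Two nodes, two tokens each (num_tokens=2 in miniature).
ring = TokenRing({"node1": [-2**60, 2**60], "node2": [0, 2**62]})
```

With vnodes, each physical node simply appears many times on this ring, so its ownership is scattered into many small ranges instead of one contiguous arc.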
(This is a modified version of my post: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Why-so-many-vnodes-td7588267.html)
The default number of tokens per node (call it T; call the number of nodes N) is 256, chosen to give good load balancing for random token assignments at most cluster sizes. For small T, a random choice of initial tokens will in most cases give a poor distribution of data. The larger T is, the closer to uniform the distribution will be, with increasing probability.
Also, for small T, when a new node is added there aren't many existing ranges to split, so the new node can't take an even slice of the data.
For this reason T should be large. But if it is too large, there are too many ranges to keep track of and performance suffers. The function that finds which keys live where becomes more expensive, and operations that deal with individual vnodes, e.g. repair, become slow. (An extreme example is SELECT * LIMIT 1, which, when there is no data, has to scan each vnode in turn in search of a single row. This is O(NT) and even for fairly small T takes seconds to complete.)
So 256 was chosen to be a reasonable balance. I don't think most users will find it too slow; users with extremely large clusters may need to increase it.
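The load-balancing effect of T can be checked with a small simulation (this is my own illustrative sketch, not anything from Cassandra's codebase): assign random tokens, compute each node's share of the ring, and compare the most-loaded node against the ideal 1/N share.

```python
import random

def avg_max_ownership(num_nodes: int, tokens_per_node: int, trials: int = 20) -> float:
    """Assign random tokens, compute each node's share of the ring, and
    return (max share) / (fair share), averaged over trials. 1.0 is perfect."""
    ring_size = 2**64
    total = 0.0
    for _ in range(trials):
        # One (token, node) pair per vnode, sorted into ring order.
        tokens = sorted(
            (random.randrange(ring_size), node)
            for node in range(num_nodes)
            for _ in range(tokens_per_node)
        )
        owned = [0] * num_nodes
        prev = tokens[-1][0] - ring_size  # the last range wraps around
        for tok, node in tokens:
            owned[node] += tok - prev  # node owns the range (prev, tok]
            prev = tok
        total += max(owned) / (ring_size / num_nodes)
    return total / trials
```

On a 6-node cluster, T=1 typically leaves the busiest node with well over its fair share, while T=256 keeps ownership close to uniform, which matches the reasoning above.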