So I see in "What's The Best Practice In Designing A Cassandra Data Model?" that Cassandra does not have automatic load balancing, which becomes a problem when using the ordered partitioner: a common range of row keys ends up stored on relatively few machines, and those machines then serve most of the queries.
I'm still new to Cassandra and how it works. How would one go about avoiding this issue so that range queries are still possible? I didn't really get the linked answers' idea about appending a hash to keys.
A partitioner determines how data is distributed across the nodes in the cluster (including replicas). Basically, a partitioner is a function for deriving a token representing a row from its partition key, typically by hashing. Each row of data is then distributed across the cluster by the value of the token.
The Murmur3Partitioner uses the MurmurHash function. This hashing function creates a 64-bit hash value of the partition key. The possible range of hash values is from -2^63 to +2^63 - 1. When using the Murmur3Partitioner, you can page through all rows using the token function in a CQL query.
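For example, token-based paging might look like this (a sketch: the events table and its app_name partition key are hypothetical, but token() is standard CQL):

-- First page, in token order:
SELECT app_name, token(app_name) FROM events LIMIT 100;

-- Next page: resume after the last token seen on the previous page.
SELECT app_name, token(app_name) FROM events
WHERE token(app_name) > <last-token-seen> LIMIT 100;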
The Murmur3Partitioner is the default partitioning strategy for new Cassandra clusters and the right choice for new clusters in almost all cases.
As we learned earlier, Cassandra uses a consistent hashing technique to generate the hash value of the partition key (app_name) and assigns the row to the node whose token range contains that value.
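You can check which node(s) a given partition key actually lands on with nodetool (the keyspace, table, and key names here are hypothetical, following the example above):

nodetool getendpoints my_keyspace events my_app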
As mentioned in the other post, Cassandra 0.5 supports semi-automatic load balancing: all you have to do is tell a node to loadbalance and it will automatically move to a busier spot on the token ring.
This is covered in http://wiki.apache.org/cassandra/Operations
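In that era the operation was exposed through nodetool; the invocation looked roughly like this (flags varied between versions, so treat this as a sketch):

bin/nodetool -host <node-ip> loadbalance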
I think this issue is best taken up on the cassandra-user mailing list; that is where people are.
Cassandra does not have automatic load balancing yet, but it may in the not-too-distant future; the 0.5 branch may be capable of this now.
Essentially, when you bootstrap a node into an already-running system, it should find the spot in the ring that balances load best and put itself there. Provided you add nodes one at a time (i.e. wait for each node to finish bootstrapping before adding another), that should work pretty well, as long as your key distribution doesn't change too much over time.
However, your keys may change over time (especially if they are time-based) so you might want a workaround.
It depends on what you want to range-scan. If you only need to range scan PART of the key, you could hash the bit that you don't want to range scan, and use that as the first part of the key.
I'll use the term "partition" here to refer to the part of the key you don't want to range scan:
const crypto = require('crypto'); // Node's built-in hash functions

function makeWholeKey(partition, key) {
  // Prefixing with a hash of the non-scanned part spreads keys evenly
  // around the ring while keeping "key" range-scannable within a partition.
  return crypto.createHash('md5').update(partition).digest('hex') + partition + key;
}
Now if you want to range scan the keys within a given partition, you can range scan between makeWholeKey(p, start) and makeWholeKey(p, end).
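For instance (rangeScan here is a hypothetical stand-in for whatever range-query call your client library exposes):

// Scan everything in partition "user42" between two dates.
const start = makeWholeKey('user42', '2009-01-01');
const end = makeWholeKey('user42', '2009-12-31');
rangeScan(start, end); // hypothetical client call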
But if you want to scan the partitions, you're out of luck.
But you can give your nodes tokens that are evenly distributed around the range of make_hash()'s output, and you'll get evenly distributed data (assuming you have ENOUGH partitions that it doesn't all clump up on one or two hash values).
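For instance, if make_hash() is the md5-based hash above (32 hex characters), evenly spaced node tokens can be computed by dividing the 128-bit space; a sketch:

// Sketch: evenly spaced tokens for N nodes across md5's 128-bit output space.
const N = 4n;
for (let i = 0n; i < N; i++) {
  const token = ((i * 2n ** 128n) / N).toString(16).padStart(32, '0');
  console.log(`node ${i}: ${token}`);
}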