 

Cassandra: Batch write optimisation

I get a bulk write request for, say, some 20 keys from a client. I can either write them to C* in one batch, or write them individually in an async way and wait on the futures for them to complete.

Writing in a batch does not seem to be a good option, as per the documentation, because my insertion rate will be high and, if the keys belong to different partitions, the coordinators will have to do extra work.

Is there a way in the DataStax Java driver to group keys that belong to the same partition, club them into small batches, and then do individual unlogged batch writes asynchronously? That way I make fewer RPC calls to the server, and at the same time the coordinator will write locally. I will be using a token-aware policy.

asked Aug 13 '16 10:08 by Peter

People also ask

How does Cassandra batch work?

In Cassandra, batch allows the client to group related updates into a single statement. If some of the replicas for the batch fail mid-operation, the coordinator will hint those rows automatically.

What is batch statement in Cassandra?

The batch statement combines multiple data modification language statements (such as INSERT, UPDATE, and DELETE) to achieve atomicity and isolation when targeting a single partition or only atomicity when targeting multiple partitions.

What is atomic batch operation?

An atomic transaction is an indivisible and irreducible series of operations such that either all occur, or nothing occurs. Single partition batch operations are atomic automatically, while multiple partition batch operations require the use of a batchlog to ensure atomicity.

What is the main use case for single partition batches?

Single-partition batches should be used when atomicity and isolation are required. Even if you only need atomicity (and no isolation), you should model your data so that you can use single-partition batches instead of multi-partition batches.


1 Answer

Your idea is right, but there is no built-in way; you usually do this manually.

The main rule here is to use TokenAwarePolicy, so that some coordination happens on the driver side. Then you can group your requests by equality of partition key; that will probably be enough, depending on your workload.

What I mean by "grouping by equality of partition key" is the following: say you have some data that looks like

MyData { partitioningKey, clusteringKey, otherValue, andAnotherOne }

Then, when inserting several such objects, you group them by MyData.partitioningKey. That is, for every existing partitioningKey value, you take all objects with the same partitioningKey and wrap them in a BatchStatement. Now you have several BatchStatements, so just execute them.
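The grouping step above can be sketched in plain Java. `MyData` and its field names are hypothetical stand-ins from the example above; in real code each resulting group would be wrapped in an unlogged `BatchStatement` and executed with the driver's `executeAsync`, which is only referenced in comments here so the sketch stays self-contained.

```java
import java.util.*;
import java.util.stream.*;

public class PartitionGrouper {

    // Minimal stand-in for the MyData shape from the answer.
    public static final class MyData {
        final String partitioningKey;
        final String clusteringKey;

        public MyData(String partitioningKey, String clusteringKey) {
            this.partitioningKey = partitioningKey;
            this.clusteringKey = clusteringKey;
        }
    }

    // Group rows by partition key. Each resulting list is one candidate
    // single-partition batch; in real code you would build one unlogged
    // BatchStatement per list and call session.executeAsync(batch) on it.
    public static Map<String, List<MyData>> groupByPartition(List<MyData> rows) {
        return rows.stream()
                   .collect(Collectors.groupingBy(d -> d.partitioningKey));
    }

    public static void main(String[] args) {
        List<MyData> rows = Arrays.asList(
            new MyData("p1", "a"),
            new MyData("p2", "b"),
            new MyData("p1", "c"));
        Map<String, List<MyData>> groups = groupByPartition(rows);
        System.out.println(groups.size() + " batches to execute");
    }
}
```

With a token-aware policy, each of these single-partition batches should be routed to a replica for that partition, which is what makes the grouping pay off.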

If you wish to go further and mimic Cassandra's hashing, you should look at the cluster metadata via the getMetadata method of com.datastax.driver.core.Cluster; it has a getTokenRanges method, whose result you can compare against Murmur3Partitioner.getToken (or whichever partitioner you configured in cassandra.yaml). I've never tried that myself, though.

So I would recommend implementing the first approach and then benchmarking your application. I'm using that approach myself, and on my workload it works far better than no batches at all, let alone batches without grouping.
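The "execute each batch asynchronously and wait on the futures" part of the question can also be sketched without the driver. Here `CompletableFuture` is a stand-in for the driver's `ResultSetFuture`; the `writeBatch` function is a hypothetical placeholder for the real `session.executeAsync(batchStatement)` call.

```java
import java.util.*;
import java.util.concurrent.*;
import java.util.function.*;

public class AsyncBatchPattern {

    // Fire one async write per batch, then wait for all of them.
    // `writeBatch` is a placeholder for session.executeAsync(batch);
    // here it just returns the number of rows the batch would write.
    public static int writeAll(List<List<String>> batches,
                               Function<List<String>, CompletableFuture<Integer>> writeBatch) {
        List<CompletableFuture<Integer>> futures = new ArrayList<>();
        for (List<String> batch : batches) {
            futures.add(writeBatch.apply(batch));   // one RPC per partition group
        }
        // Block until every batch completes, summing rows written.
        return futures.stream().mapToInt(f -> f.join()).sum();
    }

    public static void main(String[] args) {
        List<List<String>> batches = Arrays.asList(
            Arrays.asList("a", "b"),
            Arrays.asList("c"));
        int written = writeAll(batches,
            batch -> CompletableFuture.supplyAsync(batch::size));
        System.out.println(written + " rows written");
    }
}
```

The point of the pattern is that the batches for different partitions are in flight concurrently, so grouping does not cost you the parallelism of individual async writes.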

answered Sep 23 '22 01:09 by folex