I get a bulk write request from a client for, let's say, some 20 keys. I can either write them to C* in one batch or write them individually asynchronously and wait on the futures for them to complete.
Writing in a batch does not seem to be a good option according to the documentation, since my insertion rate will be high and, if the keys belong to different partitions, the coordinators will have to do extra work.
Is there a way with the DataStax Java driver to group keys that belong to the same partition, club them into small batches, and then do individual unlogged batch writes asynchronously? That way I make fewer RPC calls to the server while the coordinator only has to write locally. I will be using a token-aware policy.
In Cassandra, batch allows the client to group related updates into a single statement. If some of the replicas for the batch fail mid-operation, the coordinator will hint those rows automatically.
The batch statement combines multiple data modification language statements (such as INSERT, UPDATE, and DELETE) to achieve atomicity and isolation when targeting a single partition or only atomicity when targeting multiple partitions.
An atomic transaction is an indivisible and irreducible series of operations such that either all occur, or nothing occurs. Single partition batch operations are atomic automatically, while multiple partition batch operations require the use of a batchlog to ensure atomicity.
Single-partition batches should be used when atomicity and isolation are required. Even if you only need atomicity (and no isolation), you should model your data so that you can use single-partition instead of multi-partition batches.
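For illustration, here is a minimal sketch of a single-partition batch with the DataStax Java driver (3.x API assumed); the table and column names are placeholders, not from the original question:

```java
import com.datastax.driver.core.BatchStatement;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

// Minimal sketch: a single-partition batch. Every statement shares the
// same partition key, so the batch is applied atomically and in isolation
// without the multi-partition batchlog overhead.
public final class SinglePartitionBatchExample {

    static void writeSinglePartition(Session session) {
        PreparedStatement insert = session.prepare(
                "INSERT INTO my_table (partitioningKey, clusteringKey, otherValue) VALUES (?, ?, ?)");

        BatchStatement batch = new BatchStatement(); // Type.LOGGED by default
        batch.add(insert.bind("pk-1", "ck-1", "v1"));
        batch.add(insert.bind("pk-1", "ck-2", "v2")); // same partition key "pk-1"
        session.execute(batch);
    }
}
```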
Your idea is right, but there is no built-in way to do this; you usually do it manually.
The main rule here is to use TokenAwarePolicy, so that some of the coordination happens on the driver side. Then you can group your requests by equality of partition key; that will probably be enough, depending on your workload.
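For reference, a minimal sketch of wiring up TokenAwarePolicy when building the Cluster (3.x driver API assumed; the contact point and keyspace are placeholders):

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
import com.datastax.driver.core.policies.TokenAwarePolicy;

// Wrap the default DC-aware policy in TokenAwarePolicy so that each
// statement is routed to a replica that owns its partition.
Cluster cluster = Cluster.builder()
        .addContactPoint("127.0.0.1")   // placeholder contact point
        .withLoadBalancingPolicy(new TokenAwarePolicy(
                DCAwareRoundRobinPolicy.builder().build()))
        .build();
Session session = cluster.connect("my_keyspace"); // placeholder keyspace
```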
What I mean by "grouping by equality of partition key" is this: say you have some data that looks like

MyData { partitioningKey, clusteringKey, otherValue, andAnotherOne }

When inserting several such objects, you group them by MyData.partitioningKey. That is, for each existing partitioningKey value, you take all objects with that same partitioningKey and wrap them in a BatchStatement. You now have several BatchStatements, so just execute them.
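As an illustration of that grouping, here is a minimal sketch assuming the 3.x driver, an already-connected Session, and a MyData class with getters for the fields above; the table name my_table and the helper class are placeholders, not part of the original answer:

```java
import com.datastax.driver.core.BatchStatement;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.ResultSetFuture;
import com.datastax.driver.core.Session;
import com.google.common.util.concurrent.Futures;

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch: bucket rows by partition key, wrap each bucket in an UNLOGGED
// batch, and execute the batches asynchronously.
public final class GroupedBatchWriter {

    private final Session session;
    private final PreparedStatement insert;

    public GroupedBatchWriter(Session session) {
        this.session = session;
        this.insert = session.prepare(
                "INSERT INTO my_table (partitioningKey, clusteringKey, otherValue, andAnotherOne) "
              + "VALUES (?, ?, ?, ?)");
    }

    public void writeAll(List<MyData> rows) throws Exception {
        // One bucket per distinct partition key.
        Map<String, List<MyData>> byPartition =
                rows.stream().collect(Collectors.groupingBy(MyData::getPartitioningKey));

        List<ResultSetFuture> futures = new ArrayList<>();
        for (List<MyData> group : byPartition.values()) {
            // UNLOGGED: no batchlog needed, since every statement in this
            // batch targets the same partition.
            BatchStatement batch = new BatchStatement(BatchStatement.Type.UNLOGGED);
            for (MyData d : group) {
                batch.add(insert.bind(d.getPartitioningKey(), d.getClusteringKey(),
                                      d.getOtherValue(), d.getAndAnotherOne()));
            }
            futures.add(session.executeAsync(batch));
        }
        // Wait for all batches to complete (ResultSetFuture is a ListenableFuture).
        Futures.allAsList(futures).get();
    }
}
```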
If you wish to go further and mimic Cassandra's hashing, then you should look at the cluster metadata via the getMetadata method of the com.datastax.driver.core.Cluster class; it has a getTokenRanges method, whose results you can compare against Murmur3Partitioner.getToken (or whichever partitioner you configured in cassandra.yaml). I've never tried that myself though.
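If you do go down that road, one simpler variant (an untested sketch, not from the original answer) is to let the driver's Metadata resolve the replicas for each key instead of computing tokens yourself, via Metadata.getReplicas. It assumes a single text partition key column, whose serialized form is just its UTF-8 bytes:

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Host;
import com.datastax.driver.core.Metadata;

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Rough sketch: bucket keys by the set of hosts that own them, so that
// each bucket can be batched together as above.
static Map<Set<Host>, List<String>> groupByReplicaSet(
        Cluster cluster, String keyspace, List<String> keys) {
    Metadata metadata = cluster.getMetadata();
    return keys.stream().collect(Collectors.groupingBy(
            key -> metadata.getReplicas(
                    keyspace,
                    // assumes a single text partition key; its serialized
                    // form is simply the UTF-8 bytes of the string
                    ByteBuffer.wrap(key.getBytes(StandardCharsets.UTF_8)))));
}
```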
So I would recommend implementing the first approach and then benchmarking your application. I'm using that approach myself, and on my workload it works far better than not batching at all, let alone batching without grouping.