
Cassandra Batch statement-Multiple tables

I want to use a batch statement to delete a row from 3 tables in my database to ensure atomicity. The partition key is going to be the same in all 3 tables. All the examples of batch statements that I have read used queries against a single table. In my case, is it a good idea to use a batch statement, or should I avoid it?

I'm using Cassandra-3.11.2 and I execute my queries using the C++ driver.

asked Mar 19 '18 06:03 by Vishal Sharma




1 Answer

Yes, you can use a batch to ensure atomicity. Single-partition batches (same table and same partition key) are faster, but a multi-partition batch is acceptable when it touches only a small number of partitions (three in your case). Do not use batches as a performance optimization (for example, to reduce the number of requests). If you need atomicity, use one.

See the links below:

Cassandra batch query performance on tables having different partition keys
Cassandra batch query vs single insert performance
How single parition batch in cassandra function for multiple column update?

EDITED

In my case, the tables are different but the partition key is the same in all 3 tables. So is this a special case of a single-partition batch, or is it something entirely different?

Partitions belong to a table, so different tables mean different partitions, and this is a multi-partition batch. LOGGED batches ensure atomicity across different partitions (different tables or different partition keys). A single-partition batch is atomic and isolated without the batch log, because it is treated as a single row mutation, so UNLOGGED is enough there. If you use an UNLOGGED batch across multiple partitions, atomicity is not ensured. The default batch type is LOGGED; for a single-partition batch the batch log is not needed, so it effectively behaves as UNLOGGED. To learn more about LOGGED and UNLOGGED batches, see the document linked below.

Multi partition batches should only be used to achieve atomicity for a few writes on different tables. Apart from this they should be avoided because they’re too expensive.

Single partition batches can be used to achieve atomicity and isolation. They’re not much more expensive than normal writes.

In your case, though, a multi-partition LOGGED batch is fine because the number of partitions is small.

The following document on batches is very useful and covers all the details. Reading it should clear up any remaining confusion.
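As a rough illustration of the batch discussed above, here is a minimal C++ sketch that only builds the CQL text of such a batch. The helper name `build_delete_batch` and the table and column names are hypothetical, not taken from the question. It relies on plain `BEGIN BATCH` being a LOGGED batch by default, and it leaves the key as a bind marker so it can be bound once per statement.

```cpp
#include <string>
#include <vector>

// Builds the CQL text of a batch that deletes the same partition key
// from several tables. Plain "BEGIN BATCH" is a LOGGED batch by default.
// Table and column names passed in are hypothetical placeholders.
std::string build_delete_batch(const std::vector<std::string>& tables,
                               const std::string& key_column) {
    std::string cql = "BEGIN BATCH\n";
    for (const auto& table : tables) {
        // One DELETE per table; "?" is a bind marker for the key value.
        cql += "  DELETE FROM " + table + " WHERE " + key_column + " = ?;\n";
    }
    cql += "APPLY BATCH;";
    return cql;
}
```

Calling `build_delete_batch({"user_profile", "user_settings", "user_activity"}, "user_id")` yields one batch with three DELETE statements. With the DataStax C++ driver you would more likely create the batch with `cass_batch_new(CASS_BATCH_TYPE_LOGGED)`, add each statement with `cass_batch_add_statement`, and run it with `cass_session_execute_batch` rather than sending raw text.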

Cassandra - to BATCH or not to BATCH

Partition Key tokens vs row partition

Table partitions and partition key tokens are different things. The partition key decides which node the data resides on. For the same row key the partition tokens are the same, so the data resides on the same node. However, different partition keys, or the same key in different tables, are different row mutations. You cannot fetch data with one query for different partition keys, or from different tables even for the same key. The coordinator node has to treat these as separate requests or mutations and request the actual data from the replica nodes separately. This follows from the internal structure of how C* stores data.

Every table even has its own directory structure, making it clear that a partition from one table will never interact with the partition of another.
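The distinction can be sketched with a toy model in C++. This is not how Cassandra is implemented: `std::hash` is only a stand-in for the real partitioner, and `node_for_key`, `PartitionRef` and `same_partition` are hypothetical names. The node is derived from the key alone, while a partition is identified by table plus key, as the answer describes.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Toy placement: hash the partition key and map it onto node_count nodes.
// std::hash stands in for Cassandra's partitioner (an assumption made for
// illustration only). node_count must be greater than zero.
std::size_t node_for_key(const std::string& partition_key,
                         std::size_t node_count) {
    return std::hash<std::string>{}(partition_key) % node_count;
}

// A partition is identified by the table it belongs to plus its key.
struct PartitionRef {
    std::string table;
    std::string partition_key;
};

// Two references name the same partition only if table and key both match.
bool same_partition(const PartitionRef& a, const PartitionRef& b) {
    return a.table == b.table && a.partition_key == b.partition_key;
}
```

In this model the key `"42"` maps to the same node whichever table it is used in, yet `{"table_a", "42"}` and `{"table_b", "42"}` are still two separate partitions.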

Does the same partition key in different cassandra tables add up to cell theoretical limit?

For details on how C* maps data, see this link:

Understanding How CQL3 Maps to Cassandra's Internal Data Structure

answered Sep 28 '22 05:09 by Chaity