Cassandra batch query vs single insert performance

I use the Cassandra Java driver.

I receive 150k requests per second, which I insert into 8 tables, each with a different partition key.

My question is: which is the better approach?

  • batch inserting into these tables
  • inserting rows one by one

I am asking because, given my request rate (150k per second), batching sounds like the better option, but because all the tables have different partition keys, a batch appears expensive.

asked Mar 21 '17 14:03 by Prakash P


People also ask

Why use batch in Cassandra?

In Cassandra, BATCH is used to execute multiple modification statements (INSERT, UPDATE, DELETE) as a single operation. It is useful when, for example, you need to update some columns and delete some existing ones together.

What is the main use case for single partition batches?

Single-partition batches should be used when atomicity and isolation are required. Even if you only need atomicity (and no isolation), you should model your data so that you can use single-partition batches instead of multi-partition batches.

How does Cassandra batch work?

In Cassandra, batch allows the client to group related updates into a single statement. If some of the replicas for the batch fail mid-operation, the coordinator will hint those rows automatically.

How do I insert multiple rows in Cassandra?

There is a batch insert operation in Cassandra. You can batch inserts together, even across different column families (tables), to make insertion more efficient. In Hector, you can use HFactory.createMutator and then use the add methods on the returned Mutator to add operations to your batch.


2 Answers

Please check my answer at the link below:

Cassandra batch query performance on tables having different partition keys

Batches are not for improving performance. They are used for ensuring atomicity and isolation.

Batching can be effective for single partition write operations. But batches are often mistakenly used in an attempt to optimize performance. Depending on the batch operation, the performance may actually worsen.

https://docs.datastax.com/en/cql/3.3/cql/cql_using/useBatch.html

If you don't need consistency across those tables, use single inserts. Single requests are distributed across the nodes (how exactly depends on the load-balancing policy). If you are concerned about request handling, a batch spanning these tables puts a lot of extra work on the coordinator node, which I don't think will be efficient :)
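A minimal sketch of the "single inserts" approach, assuming you issue each write asynchronously and cap the number of requests in flight so 150k/s does not overwhelm the client or cluster. The `insertAsync` function is a hypothetical stand-in for a driver call such as `executeAsync` (adapt its result to `CompletionStage<Void>`), and `maxInFlight` is an illustrative tuning knob, not a driver setting.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public class SingleInserts {

    /**
     * Issues one asynchronous write per row, keeping at most maxInFlight
     * writes outstanding. Returns the number of writes that succeeded.
     * insertAsync is a hypothetical stand-in for the driver's async execute call.
     */
    public static <R> int insertAll(List<R> rows,
                                    Function<R, CompletionStage<Void>> insertAsync,
                                    int maxInFlight) throws InterruptedException {
        Semaphore permits = new Semaphore(maxInFlight);
        AtomicInteger succeeded = new AtomicInteger();
        CompletableFuture<?>[] futures = new CompletableFuture<?>[rows.size()];
        for (int i = 0; i < rows.size(); i++) {
            permits.acquire(); // back-pressure: wait until a slot is free
            futures[i] = insertAsync.apply(rows.get(i))
                    .toCompletableFuture()
                    .whenComplete((ignored, err) -> {
                        if (err == null) {
                            succeeded.incrementAndGet();
                        }
                        permits.release(); // free the slot whether the write failed or not
                    });
        }
        // Wait for all writes; failures were already counted (as non-successes) above.
        CompletableFuture.allOf(futures).exceptionally(e -> null).join();
        return succeeded.get();
    }
}
```

Each write goes to whichever node the load-balancing policy picks (ideally a replica), so no single coordinator has to fan out to 8 different partitions as it would for a multi-partition batch.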

answered Sep 17 '22 20:09 by Chaity


Batches can have a HUGE impact on performance. As I understand your case, the solution that best suits you is to split the inserts into separate lists per partition key and then use a batch statement for each list. You will see a huge impact on performance.
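A minimal sketch of the "split into lists per partition key" step, assuming rows are plain Java objects and you supply a function that extracts the partition key. Because the same key value in two different tables is still two different partitions, that function should combine the table name with the partition key. The name `groupIntoBatches` and the `maxBatchSize` cap are illustrative, not part of any driver API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class PartitionBatches {

    /**
     * Groups rows that share a partition key, then cuts each group into chunks
     * of at most maxBatchSize rows. Each chunk is meant to become one
     * single-partition batch.
     */
    public static <R, K> List<List<R>> groupIntoBatches(List<R> rows,
                                                        Function<R, K> partitionKeyOf,
                                                        int maxBatchSize) {
        if (maxBatchSize < 1) {
            throw new IllegalArgumentException("maxBatchSize must be >= 1");
        }
        // LinkedHashMap keeps partitions in the order they were first seen
        Map<K, List<R>> byPartition = new LinkedHashMap<>();
        for (R row : rows) {
            byPartition.computeIfAbsent(partitionKeyOf.apply(row), k -> new ArrayList<>()).add(row);
        }
        List<List<R>> batches = new ArrayList<>();
        for (List<R> group : byPartition.values()) {
            // Slice each partition's rows so no single batch grows unbounded
            for (int start = 0; start < group.size(); start += maxBatchSize) {
                int end = Math.min(start + maxBatchSize, group.size());
                batches.add(new ArrayList<>(group.subList(start, end)));
            }
        }
        return batches;
    }
}
```

Each inner list would then be turned into one unlogged batch (for example with the driver's `BatchStatement`), so every batch touches a single partition and the coordinator does not have to fan the writes out to multiple replicas sets.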

answered Sep 18 '22 20:09 by giannisapi