Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to achieve strong consistency in Kafka?

Try to understanding consistency maintenance in Kafka. Please find the scenario and help to understand.

 Number of partition  = 2 
    Replication factor = 3 
    Number of broker in the cluster = 4 

In that case, for achieving the strong consistency how many nodes should acknowledge. Either ack = all or ack = 3 or any other value. Please confirm for the same.

like image 820
Karthikeyan Rasipalay Durairaj Avatar asked Jan 11 '19 17:01

Karthikeyan Rasipalay Durairaj


2 Answers

You might be interested in seeing When it Absolutely, Positively, Has to be There talk from Kafka Summit.

Which was given by an engineer at Cloudera, and Cloudera has their own documenation on Kafka availability

To summarize, more than 1 replica and higher than 1 in-sync replica is a good start. Then on the producer, if you are okay with sacrificing throughput for data availability, meaning you must have all replicas be written before continuing, then acks=all. Otherwise, if you trust the leader broker to be highly available with unclean leader election is false, then acks=1 should be okay in most cases.

acks=3 isn't a valid config, by the way. I think you are looking for min.insync.replicas=2 and acks=all with a replication factor of 3; from above link

If min.insync.replicas is set to 2 and acks is set to all, each message must be written successfully to at least two replicas. This guarantees that the message is not lost unless both hosts crash

Also, you can enable the transactional producer, as of Kafka 0.11 to work towards exactly once processing

enable.idempotence=true
like image 85
OneCricketeer Avatar answered Oct 26 '22 16:10

OneCricketeer


In your setting, what you have is

  • 4 brokers
  • Replication factor = 3

That means each message in a given partition will be replicated to 3 out of 4 brokers, including the leader for that partition.

In-order to achieve strong consistency guarantees, you have to set min.insync.replicas to 2 and use acks=all. This way, you are guaranteed that each write goes to at-least 2 out of 3 brokers which hold the data, before which it is acknowledged.

Setting acks to all provides the highest consistency guarantee at the expense of slower writes to the cluster.

If you use older versions of Kafka where unclean leader election is true by default, you should also consider setting that to false explicitly. This way, an out of sync. broker won't be elected as the leader in case of leader crashes (effectively compromising availability).

Also, Kafka is a system where all the reads go through the leader. This is a bit different from some other distributed system such as zookeeper which supports read replicas. So you do not have a situation where a client ends up reading directly from a stale broker. Leader ensures that writes are ordered and replicated to designated number of in-sync replicas and acknowledged based on your acks setting.

like image 35
senseiwu Avatar answered Oct 26 '22 15:10

senseiwu