Try to understanding consistency maintenance in Kafka. Please find the scenario and help to understand.
Number of partition = 2
Replication factor = 3
Number of broker in the cluster = 4
In that case, for achieving the strong consistency how many nodes should acknowledge. Either ack = all
or ack = 3
or any other value. Please confirm for the same.
You might be interested in seeing When it Absolutely, Positively, Has to be There talk from Kafka Summit.
Which was given by an engineer at Cloudera, and Cloudera has their own documenation on Kafka availability
To summarize, more than 1 replica and higher than 1 in-sync replica is a good start. Then on the producer, if you are okay with sacrificing throughput for data availability, meaning you must have all replicas be written before continuing, then acks=all
. Otherwise, if you trust the leader broker to be highly available with unclean leader election is false, then acks=1
should be okay in most cases.
acks=3
isn't a valid config, by the way. I think you are looking for min.insync.replicas=2
and acks=all
with a replication factor of 3
; from above link
If
min.insync.replicas
is set to2
andacks
is set toall
, each message must be written successfully to at least two replicas. This guarantees that the message is not lost unless both hosts crash
Also, you can enable the transactional producer, as of Kafka 0.11 to work towards exactly once processing
enable.idempotence=true
In your setting, what you have is
That means each message in a given partition will be replicated to 3 out of 4 brokers, including the leader for that partition.
In-order to achieve strong consistency guarantees, you have to set min.insync.replicas
to 2 and use acks=all
. This way, you are guaranteed that each write goes to at-least 2 out of 3 brokers which hold the data, before which it is acknowledged.
Setting acks to all provides the highest consistency guarantee at the expense of slower writes to the cluster.
If you use older versions of Kafka where unclean leader election is true
by default, you should also consider setting that to false
explicitly. This way, an out of sync. broker won't be elected as the leader in case of leader crashes (effectively compromising availability).
Also, Kafka is a system where all the reads go through the leader. This is a bit different from some other distributed system such as zookeeper which supports read replicas. So you do not have a situation where a client ends up reading directly from a stale broker. Leader ensures that writes are ordered and replicated to designated number of in-sync replicas and acknowledged based on your acks
setting.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With