 

Kafka: isolation level implications

Tags:

apache-kafka

I have a use case where I need 100% reliability, idempotency (no duplicate messages), and order preservation in my Kafka partitions. I'm trying to set up a proof of concept using the transactional API to achieve this. There is a setting called 'isolation.level' that I'm struggling to understand.
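For reference, here's a minimal sketch of the kind of transactional producer setup I'm experimenting with, using the standard Java client (the broker address, topic name, and transactional.id are just placeholders):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class TransactionalPoc {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        // Setting transactional.id enables transactions and implies
        // enable.idempotence=true (no duplicates from producer retries).
        props.put("transactional.id", "poc-producer-1");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.initTransactions();
            producer.beginTransaction();
            producer.send(new ProducerRecord<>("poc-topic", "key-1", "value-1"));
            producer.commitTransaction();
        }
    }
}
```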

In this article, they talk about the difference between the two options:

There are now two new isolation levels in the Kafka consumer:

read_committed: Read both kinds of messages: those that are not part of a transaction, and those that are, after the transaction is committed. A read_committed consumer uses the end offset of a partition, instead of client-side buffering. This offset is the first message in the partition belonging to an open transaction. It is also known as the "Last Stable Offset" (LSO). A read_committed consumer will only read up to the LSO and filter out any transactional messages which have been aborted.

read_uncommitted: Read all messages in offset order without waiting for transactions to be committed. This option is similar to the current semantics of a Kafka consumer.

The performance implication here is obvious, but I'm honestly struggling to read between the lines and understand the functional implications/risks of each choice. It seems like read_committed is 'safer', but I want to understand why.

asked May 08 '19 by b15


2 Answers

First, the isolation.level setting only has an impact on the consumer if the topics it's consuming from contain records written by a transactional producer.

If they do, and isolation.level is set to read_uncommitted, the consumer will simply read everything, including records from aborted transactions. That is the default.

When set to read_committed, the consumer will only be able to read records from committed transactions (in addition to records that are not part of transactions). It also means that, in order to preserve ordering, the consumer will not be able to consume records that are part of an in-flight transaction. Basically, the broker will only allow the consumer to read up to the Last Stable Offset (LSO). When the transaction is committed (or aborted), the broker will update the LSO and the consumer will receive the new records.

If you can't tolerate duplicates or records from aborted transactions, then you should use read_committed. As you hinted, this creates a small delay in consuming, as records are only visible once transactions are committed. The impact mostly depends on the size of your transactions, i.e. how often you commit.
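Enabling this on the standard Java consumer is just a config change. A minimal sketch (broker address, group id, and topic name are placeholders):

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ReadCommittedConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "poc-group");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());
        // Only fetch records up to the Last Stable Offset (LSO);
        // records from aborted transactions are filtered out.
        props.put("isolation.level", "read_committed");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("poc-topic"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
                }
            }
        }
    }
}
```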

answered Sep 19 '22 by Mickael Maison


If you are not using transactions in your producer, the isolation level does not matter. If you are, then you must use read_committed if you want your consumers to honor the transactional semantics; a sketch of the abort path is below, followed by some additional references.
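To illustrate why this matters, here's a hedged sketch of the abort path, reusing the transactional producer from the question (doSomethingThatMightFail is a hypothetical step standing in for your own processing). Records sent in a transaction that ends up aborted are still written to the log, but only read_uncommitted consumers will see them:

```java
producer.beginTransaction();
try {
    producer.send(new ProducerRecord<>("poc-topic", "key-1", "value-1"));
    doSomethingThatMightFail(); // hypothetical step that may throw
    producer.commitTransaction();
} catch (RuntimeException e) {
    // The sent record stays in the log but is marked aborted:
    // read_committed consumers skip it, read_uncommitted consumers see it.
    producer.abortTransaction();
}
```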

Here are some additional references:

https://www.confluent.io/blog/transactions-apache-kafka/
https://docs.google.com/document/d/11Jqy_GjUGtdXJK94XGsEIK7CP1SnQGdp2eF0wSw9ra8/edit

answered Sep 19 '22 by frankgreco