Transactional Producer vs Just Idempotent Producer Java (Exception OutOfOrderSequenceException)

I use spring-kafka with an idempotent producer configuration.

These are my configuration props:

    Properties props = new Properties();
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, Joiner.on(",").join(appProps.getBrokers()));
    // configure the following three settings for SSL encryption
    props.put(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, "SSL");
    props.put(SslConfigs.SSL_TRUSTSTORE_LOCATION_CONFIG, appProps.getJksLocation());
    props.put(SslConfigs.SSL_TRUSTSTORE_PASSWORD_CONFIG, appProps.getJksPassword());
    props.put(ProducerConfig.ACKS_CONFIG, "all");
    props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);
    props.put(ProducerConfig.RETRIES_CONFIG, 5);
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");

My Kafka producer throws an OutOfOrderSequenceException:

    2019-03-06 21:25:47 Sender [ERROR] [Producer clientId=producer-1] The broker returned org.apache.kafka.common.errors.OutOfOrderSequenceException: The broker received an out of order sequence number for topic-partition topic-1 at offset -1. This indicates data loss on the broker, and should be investigated.
    2019-03-06 21:25:47 TransactionManager [INFO] [Producer clientId=producer-1] ProducerId set to -1 with epoch -1
    2019-03-06 21:25:47 ProducerKafka [ERROR] we encountered error while sending to kafka, please retry the job

I am not sure why this exception is being thrown. I couldn't find a concrete answer to this. The official javadoc for the exception states the following:

This exception indicates that the broker received an unexpected sequence number from the producer, which means that data may have been lost. If the producer is configured for idempotence only (i.e. if enable.idempotence is set and no transactional.id is configured), it is possible to continue sending with the same producer instance, but doing so risks reordering of sent records. For transactional producers, this is a fatal error and you should close the producer.
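For an async send, this exception surfaces in the send callback. A minimal sketch of detecting it there (the helper class, producer, topic, key and value below are hypothetical placeholders, not part of my code):

    import org.apache.kafka.clients.producer.Producer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.errors.OutOfOrderSequenceException;

    // Hypothetical helper, only to show where the exception appears for async sends.
    public class AsyncSendExample {
        static void sendAsync(Producer<String, String> producer, String topic, String key, String value) {
            producer.send(new ProducerRecord<>(topic, key, value), (metadata, exception) -> {
                if (exception instanceof OutOfOrderSequenceException) {
                    // Fatal for a transactional producer: close it and investigate possible data loss.
                    // An idempotence-only producer may continue, at the risk of reordered records.
                    producer.close();
                }
            });
        }
    }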

Does the quoted Javadoc mean I need to use a transactional producer to avoid this issue?

The KafkaProducer documentation states something that makes the above ambiguous: https://kafka.apache.org/0110/javadoc/index.html?org/apache/kafka/clients/producer/KafkaProducer.html

To enable idempotence, the enable.idempotence configuration must be set to true. If set, the retries config will be defaulted to Integer.MAX_VALUE, the max.in.flight.requests.per.connection config will be defaulted to 1, and acks config will be defaulted to all. There are no API changes for the idempotent producer, so existing applications will not need to be modified to take advantage of this feature.

To take advantage of the idempotent producer, it is imperative to avoid application level re-sends since these cannot be de-duplicated. As such, if an application enables idempotence, it is recommended to leave the retries config unset, as it will be defaulted to Integer.MAX_VALUE. Additionally, if a send(ProducerRecord) returns an error even with infinite retries (for instance if the message expires in the buffer before being sent), then it is recommended to shut down the producer and check the contents of the last produced message to ensure that it is not duplicated. Finally, the producer can only guarantee idempotence for messages sent within a single session.
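Put concretely, the idempotence-only setup described there looks something like this (a sketch: the broker list is a placeholder and no transactional.id is set):

    Properties idempotentProps = new Properties();
    idempotentProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9093,broker2:9093"); // placeholder brokers
    idempotentProps.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);
    // Leave retries, acks and max.in.flight.requests.per.connection unset so the
    // idempotence-friendly defaults described above are applied.
    idempotentProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
    idempotentProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");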

The quoted documentation clearly states that all I need for an idempotent producer is to set the enable.idempotence property. However, the exception's Javadoc suggests that I have to use the transactional.id property.

What is the right way to create an idempotent async producer without having to deal with the fatal OutOfOrderSequenceException?

Asked Mar 16 '19 by Rakesh Subramanian S


2 Answers

It seems quite clear to me; from your second quote...

To take advantage of the idempotent producer, it is imperative to avoid application level re-sends since these cannot be de-duplicated. As such, if an application enables idempotence, it is recommended to leave the retries config unset, as it will be defaulted to Integer.MAX_VALUE.

And you have

props.put(ProducerConfig.RETRIES_CONFIG, 5);
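A minimal sketch of the change being suggested, keeping the rest of the question's configuration as-is:

    props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);
    // props.put(ProducerConfig.RETRIES_CONFIG, 5);  // remove this line and let retries default to Integer.MAX_VALUE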
Answered Sep 27 '22 by Gary Russell


If you explicitly set retries, then you must also set

max.in.flight.requests.per.connection=1

in order to avoid the out-of-order issue.

The documentation states this very clearly:

Setting a value greater than zero will cause the client to resend any record whose send fails with a potentially transient error. Note that this retry is no different than if the client resent the record upon receiving the error. Allowing retries without setting MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION to 1 will potentially change the ordering of records because if two batches are sent to a single partition, and the first fails and is retried but the second succeeds, then the records in the second batch may appear first.
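Put concretely, if you keep the explicit retries setting from the question, the configuration could be adjusted along these lines (a sketch reusing the question's props object):

    props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);
    props.put(ProducerConfig.RETRIES_CONFIG, 5);
    // Cap in-flight requests so a retried batch cannot be reordered behind a later one.
    props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, 1);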

Answered Sep 27 '22 by LIU YUE