I have my project set up using Spring Boot and Spring Kafka, and there are three consumers. Checking the logs, I can see that from time to time the consumers get disconnected:
catalina.out:2019-04-27 02:19:57.962 INFO 18245 --- [ntainer#2-0-C-1] o.a.kafka.clients.FetchSessionHandler : [Consumer clientId=consumer-2, groupId=FalconDataRiver1] Error sending fetch request (sessionId=1338157432, epoch=205630) to node 101: org.apache.kafka.common.errors.DisconnectException.
catalina.out:2019-04-27 02:19:57.962 INFO 18245 --- [ntainer#4-0-C-1] o.a.kafka.clients.FetchSessionHandler : [Consumer clientId=consumer-6, groupId=FalconDataRiver1] Error sending fetch request (sessionId=727942178, epoch=234691) to node 101: org.apache.kafka.common.errors.DisconnectException.
catalina.out:2019-04-27 02:19:57.962 INFO 18245 --- [ntainer#0-0-C-1] o.a.kafka.clients.FetchSessionHandler : [Consumer clientId=consumer-10, groupId=FalconDataRiver1] Error sending fetch request (sessionId=836405004, epoch=234351) to node 101: org.apache.kafka.common.errors.DisconnectException.
catalina.out:2019-04-27 02:19:58.023 INFO 18245 --- [ntainer#1-0-C-1] o.a.kafka.clients.FetchSessionHandler : [Consumer clientId=consumer-12, groupId=FalconDataRiver1] Error sending fetch request (sessionId=1385585601, epoch=234394) to node 101: org.apache.kafka.common.errors.DisconnectException.
catalina.out:2019-04-27 02:19:58.023 INFO 18245 --- [ntainer#3-0-C-1] o.a.kafka.clients.FetchSessionHandler : [Consumer clientId=consumer-4, groupId=FalconDataRiver1] Error sending fetch request (sessionId=452630289, epoch=201944) to node 101: org.apache.kafka.common.errors.DisconnectException.
catalina.out:2019-04-27 02:19:58.023 INFO 18245 --- [ntainer#5-0-C-1] o.a.kafka.clients.FetchSessionHandler : [Consumer clientId=consumer-8, groupId=FalconDataRiver1] Error sending fetch request (sessionId=78802572, epoch=103) to node 101: org.apache.kafka.common.errors.DisconnectException.
catalina.out:2019-04-27 02:19:58.040 INFO 18245 --- [ntainer#2-0-C-1] o.a.kafka.clients.FetchSessionHandler : [Consumer clientId=consumer-2, groupId=FalconDataRiver1] Error sending fetch request (sessionId=1338157432, epoch=INITIAL) to node 101: org.apache.kafka.common.errors.DisconnectException.
I haven't configured anything for the consumers in terms of reconnection. I know there are two relevant properties in the Kafka documentation:
reconnect.backoff.max.ms
-- The maximum amount of time in milliseconds to wait when reconnecting to a broker that has repeatedly failed to connect. If provided, the backoff per host will increase exponentially for each consecutive connection failure, up to this maximum. After calculating the backoff increase, 20% random jitter is added to avoid connection storms. (Default value: 1000 milliseconds.)
reconnect.backoff.ms
-- The base amount of time to wait before attempting to reconnect to a given host. This avoids repeatedly connecting to a host in a tight loop. This backoff applies to all connection attempts by the client to a broker. (Default value: 50 milliseconds.)
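Both properties can be set on the consumer factory. Here is a minimal Spring Kafka sketch, assuming a local broker and String payloads (the bootstrap address, deserializers, and backoff values are illustrative assumptions, not taken from the question):

    import java.util.HashMap;
    import java.util.Map;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.common.serialization.StringDeserializer;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;
    import org.springframework.kafka.core.ConsumerFactory;
    import org.springframework.kafka.core.DefaultKafkaConsumerFactory;

    @Configuration
    public class ReconnectBackoffConfig {

        @Bean
        public ConsumerFactory<String, String> consumerFactory() {
            Map<String, Object> props = new HashMap<>();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "FalconDataRiver1");
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
            // Base wait before the first reconnect attempt (reconnect.backoff.ms, default 50 ms).
            props.put(ConsumerConfig.RECONNECT_BACKOFF_MS_CONFIG, 50);
            // Cap for the exponentially growing backoff (reconnect.backoff.max.ms, default 1000 ms).
            props.put(ConsumerConfig.RECONNECT_BACKOFF_MAX_MS_CONFIG, 10000);
            return new DefaultKafkaConsumerFactory<>(props);
        }
    }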
I can see that the three consumers are still consuming after the above log messages, so obviously they have recovered from these disconnect exceptions. What bothers me is that there is nothing in the logs recording the reconnection and recovery process.
Am I missing something here? Thanks!
If the consumer crashes or is shut down, its partitions will be re-assigned to another member, which will begin consumption from the last committed offset of each partition. If the consumer crashes before any offset has been committed, then the consumer which takes over its partitions will use the reset policy.
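To make the "last committed offset" and the "reset policy" concrete, here is a hedged plain-consumer sketch; the broker address, topic name, and process() helper are placeholders, not from the question:

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class CommittingConsumer {

        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "FalconDataRiver1");
            props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
            // The "reset policy": where to start when no committed offset exists.
            props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("some-topic")); // placeholder topic
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                    for (ConsumerRecord<String, String> record : records) {
                        process(record); // placeholder for real processing
                    }
                    // Commit only after processing succeeds; a member that takes
                    // over these partitions resumes from this committed position.
                    consumer.commitSync();
                }
            }
        }

        private static void process(ConsumerRecord<String, String> record) {
            System.out.printf("%s-%d@%d: %s%n",
                    record.topic(), record.partition(), record.offset(), record.value());
        }
    }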
Regarding processing guarantees: prior to 0.11.0.0, Kafka only provided at-least-once delivery guarantees, and hence any stream processing system that leveraged it as the backend storage could not guarantee end-to-end exactly-once semantics.
Initially, Kafka only supported at-most-once and at-least-once message delivery. However, the introduction of Transactions between Kafka brokers and client applications ensures exactly-once delivery in Kafka.
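As an illustration, the transactional producer API that underpins exactly-once delivery looks roughly like this; the broker address, topic, and transactional.id are assumptions:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.KafkaException;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class TransactionalProducerSketch {

        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed
            props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "falcon-tx-1");     // assumed id
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                producer.initTransactions();
                try {
                    producer.beginTransaction();
                    producer.send(new ProducerRecord<>("some-topic", "key", "value")); // placeholder topic
                    // All-or-nothing: consumers running with isolation.level=read_committed
                    // only ever see records from committed transactions.
                    producer.commitTransaction();
                } catch (KafkaException e) {
                    producer.abortTransaction();
                }
            }
        }
    }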
Kafka recovers from this internal error automatically, which is why the log level is INFO. Evidently, your consumers are still able to consume messages.
Switch the log level to DEBUG if you want more information about what is causing this.
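In a Spring Boot application this is a one-line logging property, for example in application.properties (the broad org.apache.kafka.clients package is an assumption; the second line targets exactly the logger seen in your messages):

    logging.level.org.apache.kafka.clients=DEBUG
    logging.level.org.apache.kafka.clients.FetchSessionHandler=DEBUG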