
What Kafka broker metrics should be monitored if producer-side ack lag is very high?

Are there broker metrics we can use to monitor a Kafka broker when acknowledgment lag is very high on the producer side?

We are using Datadog to monitor both the producers and the Kafka brokers. The producer ack lag is more than 10 seconds. On the broker side, however, I feel that message.in.rate and kafka.net.bytes_in.rate alone are not very informative. It would be better to have some lag metrics on the broker side that indicate when the broker is too loaded to acknowledge the producer promptly.

Also, we use kafka.acks = 1, so only the partition leader has to acknowledge each write.

Does anyone have experience with this? Any advice is welcome. :) Thanks in advance.

Xiaohe Dong asked Apr 30 '18 08:04



1 Answer

I'm guessing you're talking about "metrics" instead of matrix!

On the producer, you have the kafka.producer:type=producer-metrics,client-id="{client-id}" MBean. It has two interesting attributes:

  • request-latency-avg: The average request latency in ms

  • request-latency-max: The maximum request latency in ms
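As a rough illustration of how these two attributes could drive an alert, here is a minimal sketch. It assumes the attribute values have already been collected into a plain dict (through JMX, Datadog, or any other route, which is not shown). The function name `slow_ack_warnings`, the thresholds, and the sample numbers are all hypothetical.

```python
def slow_ack_warnings(attrs, avg_limit_ms=1000.0, max_limit_ms=10000.0):
    """Return warnings for producer latency attributes above the given limits.

    attrs maps attribute names from the producer-metrics MBean to values in ms.
    The default limits are arbitrary placeholders, not recommended values.
    """
    limits = {
        "request-latency-avg": avg_limit_ms,
        "request-latency-max": max_limit_ms,
    }
    warnings = []
    for name, limit in limits.items():
        value = attrs.get(name)
        # Skip attributes that were not collected.
        if value is not None and value > limit:
            warnings.append(f"{name} is {value:.0f} ms (limit {limit:.0f} ms)")
    return warnings


# Hypothetical sample resembling the 10+ second ack lag from the question.
sample = {"request-latency-avg": 12000.0, "request-latency-max": 15000.0}
for line in slow_ack_warnings(sample):
    print(line)
```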

On the broker side, there are a few metrics you want to check to investigate your issue:

  • Message conversion time: Conversion happens if the producer is using an older message format than the broker. kafka.network:type=RequestMetrics,name=MessageConversionsTimeMs,request=Produce
  • Request total time: Total time Kafka took to process the request. kafka.network:type=RequestMetrics,name=TotalTimeMs,request=Produce

    If this is high, check the breakdown metrics:

    • Time the request waits in the request queue: kafka.network:type=RequestMetrics,name=RequestQueueTimeMs,request=Produce
    • Time the request is processed at the leader: kafka.network:type=RequestMetrics,name=LocalTimeMs,request=Produce
    • Time the request waits in the response queue: kafka.network:type=RequestMetrics,name=ResponseQueueTimeMs,request=Produce
    • Time to send the response: kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request=Produce
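To make the drill-down above concrete, the following sketch picks the breakdown metric that accounts for the largest part of the total Produce time. It is only an illustration: the helper names `mbean` and `dominant_phase` are made up for this example, the sample timings are invented, and fetching the real values from the broker is assumed to happen elsewhere.

```python
# Breakdown metrics named in the list above.
BREAKDOWN = (
    "RequestQueueTimeMs",
    "LocalTimeMs",
    "ResponseQueueTimeMs",
    "ResponseSendTimeMs",
)


def mbean(name, request="Produce"):
    """Build the RequestMetrics MBean name for a metric and request type."""
    return f"kafka.network:type=RequestMetrics,name={name},request={request}"


def dominant_phase(timings_ms):
    """Return (metric_name, share) for the largest breakdown component.

    share is the component divided by TotalTimeMs, or None when
    TotalTimeMs is missing or zero.
    """
    present = {k: timings_ms[k] for k in BREAKDOWN if k in timings_ms}
    if not present:
        raise ValueError("no breakdown metrics supplied")
    name = max(present, key=present.get)
    total = timings_ms.get("TotalTimeMs")
    share = present[name] / total if total else None
    return name, share


# Invented sample values in ms, for illustration only.
sample = {
    "TotalTimeMs": 10400.0,
    "RequestQueueTimeMs": 9800.0,
    "LocalTimeMs": 350.0,
    "ResponseQueueTimeMs": 50.0,
    "ResponseSendTimeMs": 200.0,
}
name, share = dominant_phase(sample)
print(mbean(name), f"{share:.0%}")
```

Whichever phase takes the largest share is the first place to look on the broker.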

These are all listed among the metrics recommended for monitoring in the Kafka documentation: http://kafka.apache.org/documentation/#monitoring

Mickael Maison answered Nov 02 '22 01:11
