
How to monitor consumer lag in Kafka via JMX?

I have a Kafka setup that includes a JMX exporter to Prometheus. I'm looking for a metric that gives the offset lag per topic and group ID. I'm running Kafka 2.2.0.

Some resources online point to a metric called kafka.consumer, but I have no such metric in my setup.

From my jmxterminal:

$>domains
#following domains are available
JMImplementation
com.sun.management
java.lang
java.nio
java.util.logging
jdk.management.jfr
kafka
kafka.cluster
kafka.controller
kafka.coordinator.group
kafka.coordinator.transaction
kafka.log
kafka.network
kafka.server
kafka.utils

I am, however, able to see the data I need by using the following command:

root@kafka-0:/kafka# bin/kafka-consumer-groups.sh --describe --group benchmark_consumer_group --bootstrap-server localhost:9092
Consumer group 'benchmark_consumer_group' has no active members.

TOPIC               PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG             CONSUMER-ID     HOST            CLIENT-ID
benchmark_topic_10B 2          2795128         54223220        51428092        -               -               -
benchmark_topic_10B 9          4               4               0               -               -               -
benchmark_topic_10B 6          7               7               0               -               -               -
benchmark_topic_10B 7          5               5               0               -               -               -
benchmark_topic_10B 0          2834028         54224939        51390911        -               -               -
benchmark_topic_10B 1          15342331        54222342        38880011        -               -               -
benchmark_topic_10B 4          5               5               0               -               -               -
benchmark_topic_10B 5          6               6               0               -               -               -
benchmark_topic_10B 8          8               8               0               -               -               -
benchmark_topic_10B 3          4               4               0               -               -               -


But that does not help, since I need to track it from a metric. Also, this command takes about 25 seconds to execute, making it unreasonable to use as a source for metrics.

My guess is that the metric kafka.consumer does not exist in version 2.2.0 and was replaced with another. However, I can't find any resources online with up-to-date information on how and where to get that metric.

Tom Klino asked Apr 07 '19 14:04



2 Answers

You can give Kafka Minion (https://github.com/cloudworkz/kafka-minion) a try. While Kafka Minion works internally much like Burrow (it consumes the __consumer_offsets topic to obtain consumer group offsets), it has several advantages for your use case.

Advantages of Kafka Minion over Burrow for your case:

  • Has native Prometheus support (no additional deployment is necessary just to expose metrics to Prometheus)
  • Has a sample Grafana dashboard
  • Has additional metrics (such as the last commit timestamp for a consumergroup:topic:partition combination, commit rates, info about the cleanup policy, the ability to list all consumer groups for a given topic, etc.)
  • No ZooKeeper dependency included (which also means that consumers that still commit offsets to ZooKeeper are not supported)
  • High availability support (!!). Burrow has the problem that it always exposes metrics, and these are wrong while it has only just started consuming the __consumer_offsets topic. Therefore you cannot run it in an HA mode. This is a problem when you want to set up alerts based on consumer group lag
  • Kafka Minion does not support multiple clusters, which reduces complexity both in the code and for the end user. You can obviously still deploy one Kafka Minion per cluster

Disclaimer: I am the author of Kafka Minion, and I am still looking for more feedback from other users. I intend to actively maintain and develop the exporter for my projects, the company I am working for and for the community.

To answer your question regarding what you are seeing with the kafka-consumer-groups.sh shell script: this won't work, as it cannot report lag for inactive consumers, which is a bit counterproductive.

kentor answered Nov 06 '22 00:11


The kafka.consumer JMX metrics are only present on the consumer processes themselves, not on the Kafka broker processes. Note that you would not get the kafka.consumer metric from consumers using a consumer library other than the Java one.

Currently, the Kafka broker itself exposes no JMX metrics for consumer lag. Other solutions are commonly used for monitoring consumer lag, such as Burrow by LinkedIn. There are also a few open source projects, such as kafka9.offsets, that expose consumer lag metrics via JMX, but they may not have been updated to work with the latest Kafka.
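To illustrate what such external tools compute, here is a minimal Python sketch, not taken from Burrow or any other project. It assumes you have already obtained the committed offsets and log end offsets per partition by some other means (for example through a Kafka client library); fetching them is deliberately left out. The function names and the metric name kafka_consumer_group_lag are hypothetical, and the sample numbers are copied from three rows of the output in the question.

```python
from typing import Dict, Tuple

# Maps (topic, partition) -> offset
Offsets = Dict[Tuple[str, int], int]


def compute_lag(committed: Offsets, log_end: Offsets) -> Offsets:
    """Per-partition lag = log end offset - committed offset, floored at 0."""
    lag: Offsets = {}
    for tp, end in log_end.items():
        # Assumption: partitions without a committed offset are skipped.
        if tp in committed:
            lag[tp] = max(end - committed[tp], 0)
    return lag


def to_prometheus_text(group: str, lag: Offsets,
                       metric: str = "kafka_consumer_group_lag") -> str:
    """Render the lag as Prometheus text exposition lines.

    The metric name is a hypothetical choice, not one used by any exporter.
    """
    lines = [f"# TYPE {metric} gauge"]
    for (topic, partition), value in sorted(lag.items()):
        labels = f'group="{group}",topic="{topic}",partition="{partition}"'
        lines.append(f"{metric}{{{labels}}} {value}")
    return "\n".join(lines) + "\n"


# Sample values copied from partitions 0, 1 and 2 in the question's output.
committed = {
    ("benchmark_topic_10B", 0): 2834028,
    ("benchmark_topic_10B", 1): 15342331,
    ("benchmark_topic_10B", 2): 2795128,
}
log_end = {
    ("benchmark_topic_10B", 0): 54224939,
    ("benchmark_topic_10B", 1): 54222342,
    ("benchmark_topic_10B", 2): 54223220,
}

lag = compute_lag(committed, log_end)
print(to_prometheus_text("benchmark_consumer_group", lag), end="")
```

For these three partitions the computed values match the LAG column shown in the question. A real exporter would refresh the offsets periodically and serve the text over HTTP for Prometheus to scrape.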

devshawn answered Nov 06 '22 00:11