How can I find the number of commits and the current offset in each partition of a known Kafka topic? I am using Kafka v0.8.1.1.
Kafka maintains a numerical offset for each record in a partition. This offset acts as a unique identifier of a record within that partition, and also denotes the position of the consumer in the partition.
Kafka maintains two types of offsets: the current offset and the committed offset.
Kafka stores offset commits in a topic. When a consumer commits an offset, Kafka publishes a commit message to a "commit-log" topic and keeps an in-memory structure that maps group/topic/partition to the latest offset for fast retrieval.
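The commit-log-plus-cache idea described above can be sketched as a toy model. This is only an illustration in plain Python, not Kafka's actual implementation or wire format; the names OffsetCommit and OffsetStore are hypothetical.

```python
from collections import namedtuple

# Hypothetical record type for one offset commit; not Kafka's wire format.
OffsetCommit = namedtuple("OffsetCommit", ["group", "topic", "partition", "offset"])


class OffsetStore:
    """Toy model: an append-only commit log plus a latest-offset cache."""

    def __init__(self):
        self.commit_log = []  # stands in for the "commit-log" topic
        self.latest = {}      # (group, topic, partition) -> last committed offset

    def commit(self, group, topic, partition, offset):
        # Append the commit to the log first, then update the fast lookup table.
        self.commit_log.append(OffsetCommit(group, topic, partition, offset))
        self.latest[(group, topic, partition)] = offset

    def fetch(self, group, topic, partition):
        # Returns None when the group has never committed for this partition.
        return self.latest.get((group, topic, partition))

    def rebuild(self):
        # Replay the whole log to rebuild the cache, e.g. after a restart.
        self.latest = {}
        for c in self.commit_log:
            self.latest[(c.group, c.topic, c.partition)] = c.offset
```

Because later commits overwrite earlier ones in the cache, a lookup never has to scan the log; the log is only replayed when the cache has to be rebuilt.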
The Apache Kafka Offset Commit activity notifies the Kafka Consumer Trigger to commit a given offset. This is useful when you want offsets to be committed as soon as the record is processed in the flow. By default, offsets are committed only when the flow executes successfully.
It is not clear from your question what kind of offset you're interested in. There are actually three types of offsets:
In addition to the command-line utility, the offset information for #1 and #2 is also available via SimpleConsumer.earliestOrLatestOffset().
If the number of messages is not too large, you can pass a large --offsets value to GetOffsetShell and then count the number of lines returned by the tool. Otherwise, you can write a simple loop in Scala or Java that iterates over all available offsets, starting from the earliest.
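As a rough alternative to counting, the number of available messages in each partition can be estimated by subtracting the earliest offset from the latest one. The sketch below assumes GetOffsetShell prints one line per partition in the form topic:partition:offset1,offset2,... and that it was run twice, once with --time -2 and once with --time -1. The function names and the sample topic are hypothetical.

```python
def parse_offsets(output):
    """Parse lines shaped like 'topic:partition:off1,off2' into {partition: [offsets]}."""
    result = {}
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split from the right: the last two fields are partition and offsets.
        topic, partition, offsets = line.rsplit(":", 2)
        result[int(partition)] = [int(o) for o in offsets.split(",") if o]
    return result


def available_per_partition(earliest_output, latest_output):
    """Estimate messages per partition as latest offset minus earliest offset."""
    earliest = parse_offsets(earliest_output)
    latest = parse_offsets(latest_output)
    counts = {}
    for partition, latest_offsets in latest.items():
        first = earliest.get(partition)
        if not first or not latest_offsets:
            continue  # skip partitions missing from either run
        counts[partition] = latest_offsets[0] - first[0]
    return counts


# Sample output for a hypothetical topic "mytopic" with two partitions.
earliest_run = "mytopic:0:100\nmytopic:1:0\n"
latest_run = "mytopic:0:250\nmytopic:1:42\n"
print(available_per_partition(earliest_run, latest_run))
```

The difference is only an estimate of what is currently retained on the broker; messages already removed by retention are not counted.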
From Kafka documentation:
Get Offset Shell
get offsets for a topic

bin/kafka-run-class.sh kafka.tools.GetOffsetShell

required argument [broker-list], [topic]

Option                                 Description
------                                 -----------
--broker-list <hostname:port,...,      REQUIRED: The list of hostname and
  hostname:port>                         port of the server to connect to.
--max-wait-ms <Integer: ms>            The max amount of time each fetch
                                         request waits. (default: 1000)
--offsets <Integer: count>             number of offsets returned (default: 1)
--partitions <partition ids>           comma separated list of partition ids.
                                         If not specified, will find offsets
                                         for all partitions (default)
--time <Long: timestamp in             timestamp; offsets will come before
  milliseconds / -1(latest) / -2         this timestamp, as in
  (earliest)>                            getOffsetsBefore
--topic <topic>                        REQUIRED: The topic to get offsets from.
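To call the tool from a script, the options listed above can be assembled into a command line. The helper below is a minimal sketch; the function name, its defaults, and the broker address in the usage note are assumptions, and only flags from the help text above are used.

```python
def get_offset_shell_command(broker_list, topic, time=-1, partitions=None,
                             offsets=1, script="bin/kafka-run-class.sh"):
    """Build the argument list for kafka.tools.GetOffsetShell.

    time: -1 for latest, -2 for earliest, or a timestamp in milliseconds.
    partitions: optional iterable of partition ids; omitted means all partitions.
    """
    cmd = [
        script, "kafka.tools.GetOffsetShell",
        "--broker-list", broker_list,
        "--topic", topic,
        "--time", str(time),
        "--offsets", str(offsets),
    ]
    if partitions:
        cmd += ["--partitions", ",".join(str(p) for p in partitions)]
    return cmd


# Print the command for the earliest offsets of a hypothetical topic.
print(" ".join(get_offset_shell_command("localhost:9092", "mytopic", time=-2)))
```

The returned list can be passed to subprocess.check_output() from the Kafka installation directory; localhost:9092 and mytopic are placeholders for a real broker and topic.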