Number of commits and offset in each partition of a Kafka topic

How can I find the number of commits and the current offset in each partition of a known Kafka topic? I am using Kafka v0.8.1.1.

Asked Dec 16 '14 at 07:12 by zero



1 Answer

It is not clear from your question what kind of offset you're interested in. There are actually three types of offsets:

  1. The offset of the first available message in the topic's partition. Use -2 (earliest) as the --time parameter for the GetOffsetShell tool.
  2. The latest offset, i.e. the log end offset: the offset that the next message appended to the partition will receive (one past the last available message). Use -1 (latest) as the --time parameter.
  3. The offset of the last read/processed message, maintained by the Kafka consumer. The high-level consumer stores this information for every consumer group in an internal Kafka topic (older versions, including the v0.8.1.1 used in the question, keep it in ZooKeeper) and keeps it up to date when you call commit() or when auto-commit is enabled. With the simple consumer, your code has to manage offsets itself.

In addition to the command-line utility, the offset information for #1 and #2 is also available via SimpleConsumer.earliestOrLatestOffset().
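To make the relationship between these three offsets concrete, here is a minimal Python sketch with made-up numbers. The `PartitionOffsets` class and its method names are hypothetical, not part of any Kafka API. It assumes that "latest" is the log end offset and that the committed value points at the next message to read (the convention Kafka's consumers use). Under those assumptions, and only for a topic whose log has not been compacted, `latest - earliest` is the number of messages currently retained and `latest - committed` is how far the consumer group lags behind.

```python
from dataclasses import dataclass


@dataclass
class PartitionOffsets:
    """Hypothetical container for the three offsets of one partition."""
    partition: int
    earliest: int   # offset of the first retained message (--time -2)
    latest: int     # log end offset: offset the next message will receive (--time -1)
    committed: int  # offset committed by a consumer group (next message to read)

    def retained_messages(self) -> int:
        # Exact only when no compaction has removed messages from the log.
        return self.latest - self.earliest

    def consumer_lag(self) -> int:
        # Messages the group has not consumed yet.
        return self.latest - self.committed


# Sample values, invented for illustration.
partitions = [
    PartitionOffsets(partition=0, earliest=100, latest=250, committed=240),
    PartitionOffsets(partition=1, earliest=0, latest=80, committed=80),
]

for p in partitions:
    print(f"partition {p.partition}: {p.retained_messages()} messages, lag {p.consumer_lag()}")
```

With these sample values, partition 0 reports 150 retained messages and a lag of 10, while partition 1 reports 80 messages and no lag.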

If the number of messages is not too large, you can pass a large --offsets value to GetOffsetShell and then count the number of lines returned by the tool. Otherwise, you can write a simple loop in Scala/Java that iterates over all available offsets, starting from the earliest.
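If you only need a per-partition message count, another route may be to run GetOffsetShell twice (once with `--time -2`, once with `--time -1`, both with the default `--offsets 1`) and subtract the results. The sketch below assumes each output line has the form `topic:partition:offset`; check what your Kafka version actually prints before relying on it. The function names and the sample output are hypothetical, and, as with any offset arithmetic, the count is exact only for topics that are not compacted.

```python
from __future__ import annotations


def parse_offsets(output: str) -> dict[int, int]:
    """Map partition id -> offset from lines like 'my-topic:0:1234'."""
    result: dict[int, int] = {}
    for line in output.strip().splitlines():
        # Split from the right so only the last two fields are separated.
        _topic, partition, offset = line.rsplit(":", 2)
        result[int(partition)] = int(offset)
    return result


def messages_per_partition(earliest_output: str, latest_output: str) -> dict[int, int]:
    """Subtract earliest from latest offsets, partition by partition."""
    earliest = parse_offsets(earliest_output)
    latest = parse_offsets(latest_output)
    return {p: latest[p] - earliest[p] for p in sorted(latest)}


# Invented sample output of the two GetOffsetShell runs.
earliest_run = """my-topic:0:100
my-topic:1:0"""
latest_run = """my-topic:0:250
my-topic:1:80"""

print(messages_per_partition(earliest_run, latest_run))  # {0: 150, 1: 80}
```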

From the Kafka documentation:

Get Offset Shell: get offsets for a topic

    bin/kafka-run-class.sh kafka.tools.GetOffsetShell

    required argument [broker-list], [topic]

    Option                                   Description
    ------                                   -----------
    --broker-list <hostname:port,...,        REQUIRED: The list of hostname and
      hostname:port>                           port of the server to connect to.
    --max-wait-ms <Integer: ms>              The max amount of time each fetch
                                               request waits. (default: 1000)
    --offsets <Integer: count>               number of offsets returned (default: 1)
    --partitions <partition ids>             comma separated list of partition ids.
                                               If not specified, will find offsets
                                               for all partitions (default)
    --time <Long: timestamp in milliseconds  timestamp; offsets will come before
      / -1(latest) / -2 (earliest)             this timestamp, as in getOffsetsBefore
    --topic <topic>                          REQUIRED: The topic to get offsets from.
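The documented options can be assembled into a command line programmatically, for example to drive the tool from a script. This sketch only builds the argument list from the options listed above and does not execute anything. The helper name, the broker address `localhost:9092` and the topic `my-topic` are placeholders.

```python
from __future__ import annotations

EARLIEST = -2  # --time value for the first available offset
LATEST = -1    # --time value for the latest (log end) offset


def get_offset_shell_args(broker_list: str, topic: str, time: int = LATEST,
                          offsets: int = 1,
                          partitions: list[int] | None = None) -> list[str]:
    """Build an argv list for kafka.tools.GetOffsetShell (hypothetical helper)."""
    args = [
        "bin/kafka-run-class.sh", "kafka.tools.GetOffsetShell",
        "--broker-list", broker_list,
        "--topic", topic,
        "--time", str(time),
        "--offsets", str(offsets),
    ]
    if partitions:
        # Omitting --partitions makes the tool report every partition.
        args += ["--partitions", ",".join(str(p) for p in partitions)]
    return args


print(" ".join(get_offset_shell_args("localhost:9092", "my-topic",
                                     time=EARLIEST, partitions=[0, 1])))
```

The resulting list could be passed to `subprocess.run` from the Kafka installation directory. Building a list rather than a single string avoids shell quoting issues.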
Answered Nov 22 '22 at 06:11 by Denis Makarenko