Is it possible to consume messages from Kafka based on a time period in which the messages were ingested?
Example: I want all messages ingested to a topic between 0900-1000 today (and now it's 1200).
If there is only a way to specify a start time, that's fine - my consumer can stop processing messages once it reaches the end time.
I can see methods for requesting messages from a given offset, and for getting the first available offset, and for the earliest available offset, but not all messages after a given time.
Kafka Streams assigns a timestamp to every data record via so-called timestamp extractors. These per-record timestamps describe the progress of a stream with regards to time (although records may be out-of-order within the stream) and are leveraged by time-dependent operations such as joins.
Within a consumer group, each partition of a topic is assigned to at most one consumer, so two consumers in the same group cannot read the same message from a partition.
You can count the messages in a Kafka topic by consuming the entire topic and counting how many messages are read. To do this from the command line, you can use the kcat tool, which can act as a consumer (and producer) and is built around the Unix philosophy of pipelines.
You could use the offsetsForTimes method, which returns, for each partition you ask about, the earliest offset whose timestamp is greater than or equal to the given timestamp.
More information is in the official docs:
https://kafka.apache.org/0101/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html#offsetsForTimes(java.util.Map)
After getting the offset, you can seek to it and start reading from there.
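Here is a rough sketch of the lookup-then-seek approach in Python rather than Java. It assumes the kafka-python client, whose `offsets_for_times`, `seek`, `pause` and `poll` methods are the counterparts of the Java calls. The helper name `read_time_window` and its parameters are made up for illustration. It also assumes timestamps never go backwards within a partition, which should hold if the topic uses `message.timestamp.type=LogAppendTime` (broker ingestion time) but not necessarily with the default producer-set `CreateTime`. Finally, it treats an empty poll as "caught up", which is a simplification; a more careful version might compare the consumer position against `end_offsets`.

```python
def read_time_window(consumer, partitions, start_ms, end_ms, poll_timeout_ms=1000):
    """Collect records whose timestamp is in [start_ms, end_ms).

    `consumer` is expected to behave like kafka-python's KafkaConsumer and
    `partitions` like a list of TopicPartition objects (both are assumptions
    of this sketch). Timestamps are epoch milliseconds.
    """
    consumer.assign(partitions)

    # Ask the broker for the first offset at or after start_ms in each partition.
    found = consumer.offsets_for_times({tp: start_ms for tp in partitions})

    active = set()
    for tp in partitions:
        hit = found.get(tp)
        if hit is None:
            continue  # no record at or after start_ms in this partition
        consumer.seek(tp, hit.offset)
        active.add(tp)

    # Partitions with nothing to read are paused so poll() skips them.
    idle = [tp for tp in partitions if tp not in active]
    if idle:
        consumer.pause(*idle)

    results = []
    while active:
        batch = consumer.poll(timeout_ms=poll_timeout_ms)
        if not batch:
            break  # simplification: an empty poll is treated as end of data
        for tp, records in batch.items():
            if tp not in active:
                continue
            for record in records:
                if record.timestamp >= end_ms:
                    # Past the window: stop reading this partition.
                    active.discard(tp)
                    consumer.pause(tp)
                    break
                results.append(record)
    return results
```

With kafka-python you would typically build the inputs along the lines of `consumer = KafkaConsumer(bootstrap_servers=..., enable_auto_commit=False)` and `partitions = [TopicPartition(topic, p) for p in consumer.partitions_for_topic(topic)]`, then pass 09:00 and 10:00 as epoch milliseconds for `start_ms` and `end_ms`.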