Apache Kafka Consumer group and Simple Consumer

Tags:

apache-kafka

I am new to Kafka. What I've understood so far about the consumer is that there are basically two types of implementation:
1) The high-level consumer (consumer group)
2) Simple Consumer

The most important point about the high-level abstraction is that it is used when you don't want to handle offsets yourself, while the Simple Consumer provides much finer control over offset management. What confuses me is what to do if I want to run consumers in a multithreaded environment and also want control over the offset. If I use a consumer group, does that mean I must read from the last offset stored in ZooKeeper? Is that the only option I have?


asked Jul 31 '13 19:07

Hild

2 Answers

For the most part, the high-level consumer API does not let you control the offset directly.

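When the consumer group is first created, you can tell it whether to start with the oldest or newest message that Kafka has stored using the auto.offset.reset property (smallest or largest in 0.8). The same setting is used if a stored offset turns out to be out of range.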

You can also control when the high-level consumer commits new offsets to ZooKeeper: set auto.commit.enable to false and call commitOffsets() on the ConsumerConnector yourself.
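For reference, here is a minimal sketch of a 0.8-era high-level consumer configuration combining both settings. The property names are the real 0.8 ones; the helper class, connect string and group id are hypothetical placeholders, and it only builds the Properties (plain JDK, no Kafka jar needed).

```java
import java.util.Properties;

// Hypothetical helper: property names are those of the 0.8-era,
// ZooKeeper-based high-level consumer; connect string and group id are placeholders.
final class HighLevelConsumerConfig {
    private HighLevelConsumerConfig() {}

    static Properties build(String zookeeperConnect, String groupId, boolean startFromOldest) {
        Properties props = new Properties();
        props.put("zookeeper.connect", zookeeperConnect);
        props.put("group.id", groupId);
        // Used only when the group has no offset in ZooKeeper, or the stored
        // offset is out of range: "smallest" = oldest retained message, "largest" = newest.
        props.put("auto.offset.reset", startFromOldest ? "smallest" : "largest");
        // Turn off periodic commits; the application then decides when to call
        // ConsumerConnector.commitOffsets().
        props.put("auto.commit.enable", "false");
        return props;
    }
}
```

In 0.8 these Properties would be wrapped in new kafka.consumer.ConsumerConfig(props) and passed to kafka.consumer.Consumer.createJavaConsumerConnector(...); that API was removed in later Kafka releases.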

Since the high-level consumer stores the offsets in ZooKeeper, your app could access ZooKeeper directly and manipulate the offsets, but that would be outside the high-level consumer API.
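As a sketch of what "access ZooKeeper directly" means: in 0.8 the high-level consumer keeps each partition's offset as a decimal string in the znode /consumers/<group.id>/offsets/<topic>/<partition>. The helper below (hypothetical names, plain JDK) only builds that path and encodes/decodes the value; it does not talk to ZooKeeper itself.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for the 0.8-era ZooKeeper layout, in which the high-level
// consumer keeps one offset per partition as a decimal string at
//   /consumers/<group.id>/offsets/<topic>/<partition>
final class ZkOffsetPaths {
    private ZkOffsetPaths() {}

    static String offsetPath(String groupId, String topic, int partition) {
        return "/consumers/" + groupId + "/offsets/" + topic + "/" + partition;
    }

    // Encodes an offset as it is stored in the znode: plain decimal text.
    static byte[] encode(long offset) {
        return Long.toString(offset).getBytes(StandardCharsets.UTF_8);
    }

    // Parses the znode contents back into an offset.
    static long decode(byte[] data) {
        return Long.parseLong(new String(data, StandardCharsets.UTF_8).trim());
    }
}
```

With the ZooKeeper Java client you could then call zk.setData(path, ZkOffsetPaths.encode(offset), -1), but only while every consumer in the group is stopped: a running consumer keeps its position in memory and would overwrite the znode on its next commit.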

Your question was a little confusing, but you can use the simple consumer in a multi-threaded environment. That's what the high-level consumer does: it runs SimpleConsumer instances in its own fetcher threads.

answered Nov 03 '22 00:11

Paul M

In Apache Kafka 0.9 and 0.10, consumer group management is handled entirely within Kafka: a broker acts as the group coordinator, and committed offsets are stored in an internal topic (__consumer_offsets) instead of ZooKeeper.

When a consumer group first subscribes to a topic, the auto.offset.reset setting (earliest, latest or none) determines where its consumers begin consuming messages (http://kafka.apache.org/documentation.html#newconsumerconfigs).
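For comparison with the 0.8 settings, a minimal sketch of the equivalent configuration for the 0.9+ consumer. Note the different names (bootstrap.servers instead of zookeeper.connect, enable.auto.commit instead of auto.commit.enable) and values (earliest/latest instead of smallest/largest). The helper, broker address, group id and choice of String deserializers are hypothetical; it only builds Properties (plain JDK).

```java
import java.util.Properties;
import java.util.Set;

// Hypothetical helper for the 0.9+/0.10 consumer. Broker address and group id
// are placeholders; String deserializers are just an example choice.
final class NewConsumerConfig {
    private NewConsumerConfig() {}

    private static final Set<String> RESET_POLICIES = Set.of("earliest", "latest", "none");

    static Properties build(String bootstrapServers, String groupId, String resetPolicy) {
        if (!RESET_POLICIES.contains(resetPolicy)) {
            // The 0.8 values "smallest"/"largest" are not valid here.
            throw new IllegalArgumentException("unsupported auto.offset.reset: " + resetPolicy);
        }
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers); // brokers, not ZooKeeper
        props.put("group.id", groupId);
        // Consulted only when the group has no committed offset for a partition
        // (or it is out of range); "none" makes the consumer throw instead.
        props.put("auto.offset.reset", resetPolicy);
        // Commit explicitly with commitSync()/commitAsync().
        props.put("enable.auto.commit", "false");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }
}
```

The result can be passed straight to new KafkaConsumer<String, String>(props).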

You can register a ConsumerRebalanceListener (passed to subscribe(...)) to be notified when partitions are assigned to or revoked from a particular consumer.
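A minimal sketch of such a listener, assuming a recent kafka-clients jar on the classpath. The class name is hypothetical; it simply tracks which partitions this group member currently owns.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.common.TopicPartition;

// Hypothetical listener that records which partitions this member owns.
final class AssignmentTracker implements ConsumerRebalanceListener {
    private final List<TopicPartition> owned = new ArrayList<>();

    @Override
    public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
        // Called before a rebalance takes partitions away; a good place to
        // commit or flush work for them.
        owned.removeAll(partitions);
    }

    @Override
    public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
        // Called after the rebalance with the partitions now owned by this consumer.
        owned.addAll(partitions);
    }

    List<TopicPartition> owned() {
        return List.copyOf(owned);
    }
}
```

It would be registered with consumer.subscribe(Collections.singletonList("my-topic"), new AssignmentTracker()) (topic name is a placeholder). The callbacks run inside poll(), on the thread that owns the consumer.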

Once the consumer is running, you can use seek, seekToBeginning and seekToEnd to read messages from a specific offset. seek affects the next poll for that consumer, and the new position is stored on the next commit (e.g. commitSync, commitAsync, or when auto.commit.interval.ms elapses, if auto-commit is enabled).
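A minimal sketch of repositioning an assigned partition and committing the new position immediately, assuming kafka-clients 0.10.1 or later (for the Collection-based seekToBeginning). The class and method names are hypothetical; the commit uses an explicit offset map rather than relying on the next automatic commit.

```java
import java.util.Collections;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

// Hypothetical helpers; both assume the partitions are already assigned to
// this consumer (via subscribe(...) + poll(), or via assign(...)).
final class Rewind {
    private Rewind() {}

    // Moves one partition to an explicit offset and commits it right away,
    // so the group resumes from there even if this consumer stops before
    // its next regular commit.
    static void rewindAndCommit(Consumer<?, ?> consumer, TopicPartition tp, long offset) {
        consumer.seek(tp, offset); // the next poll() fetches from here
        consumer.commitSync(Collections.singletonMap(tp, new OffsetAndMetadata(offset)));
    }

    // Replays everything the brokers still retain for the owned partitions.
    static void replayFromStart(Consumer<?, ?> consumer) {
        consumer.seekToBeginning(consumer.assignment());
    }
}
```

Taking the Consumer interface rather than KafkaConsumer keeps the helpers usable with MockConsumer in tests.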

The consumer javadocs mention more specific situations: http://kafka.apache.org/0100/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html

You can combine the group management provided by Kafka with manual management of offsets via seek(..) once partitions are assigned.
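A minimal sketch of that combination, assuming a recent kafka-clients jar: Kafka's group management decides which partitions a consumer owns, and a listener seeks each newly assigned partition to an offset kept in an application-owned store. The store here is a hypothetical Map standing in for something like a database; the processing loop is assumed to write record.offset() + 1 into it after handling each record.

```java
import java.util.Collection;
import java.util.Map;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.common.TopicPartition;

// Hypothetical listener: offsets live in an application-owned store, and Kafka's
// group management only decides who owns which partition.
final class SeekToStoredOffsets implements ConsumerRebalanceListener {
    private final Consumer<?, ?> consumer;
    private final Map<TopicPartition, Long> store; // next offset to read, per partition

    SeekToStoredOffsets(Consumer<?, ?> consumer, Map<TopicPartition, Long> store) {
        this.consumer = consumer;
        this.store = store;
    }

    @Override
    public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
        // Nothing to do: the processing loop is assumed to have written
        // record.offset() + 1 to the store after handling each record.
    }

    @Override
    public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
        for (TopicPartition tp : partitions) {
            Long next = store.get(tp);
            if (next != null) {
                // Overrides the group's committed offset for this partition;
                // takes effect on the next poll().
                consumer.seek(tp, next);
            }
            // No stored value: the committed offset or auto.offset.reset applies.
        }
    }
}
```

It would be registered with consumer.subscribe(Collections.singletonList("orders"), new SeekToStoredOffsets(consumer, store)), where the topic name is a placeholder.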

answered Nov 03 '22 02:11

phaas
