Ensuring that all messages have been read from Kafka topic using REST Proxy

Tags: apache-kafka, microservices, kafka-rest

I'm new to Kafka, and our team is investigating patterns for inter-service communication.

The goal

We have two services, P (Producer) and C (Consumer). P is the source of truth for a set of data that C needs. When C starts up it needs to load all of the current data from P into its cache, and then subscribe to change notifications. (In other words, we want to synchronize data between the services.)

The total amount of data is relatively low, and changes are infrequent. A brief delay in synchronization is acceptable (eventual consistency).

We want to decouple the services so that P and C do not need to know about each other.

The proposal

When P starts up, it publishes all of its data to a Kafka topic that has log compaction enabled. Each message is an aggregate with a key of its ID.

When C starts up, it reads all of the messages from the beginning of the topic and populates its cache. It then keeps reading from its offset to be notified of updates.

When P updates its data, it publishes a message for the aggregate that changed. (This message has the same schema as the original messages.)

When C receives a new message, it updates the corresponding data in its cache.
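A minimal sketch of how C's cache could apply keyed messages from a compacted topic. The `apply_records` name and the record shape are hypothetical. Treating a `None` value as a delete (Kafka's tombstone convention for compacted topics) is an assumption, since the question does not say whether P ever deletes aggregates.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical record shape: (aggregate ID, payload); a None payload is a tombstone.
Record = Tuple[str, Optional[dict]]


def apply_records(cache: Dict[str, dict], records: Iterable[Record]) -> Dict[str, dict]:
    """Apply records in log order so the latest value per key wins."""
    for key, value in records:
        if value is None:
            # Assumption: a null value means the aggregate was deleted.
            cache.pop(key, None)
        else:
            # Initial load and later updates share one schema, so both are upserts.
            cache[key] = value
    return cache
```

Because the initial snapshot and later change messages use the same schema, the same function can serve both the start-up load and steady-state updates.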

Constraints

We are using the Confluent REST Proxy to communicate with Kafka.

The issue

When C starts up, how does it know when it's read all of the messages from the topic so that it can safely start processing?

It's acceptable if C does not immediately notice a message that P sent a second ago. It's not acceptable if C starts processing before consuming a message that P sent an hour ago. Note that we don't know when updates to P's data will occur.

We do not want C to have to wait for the REST Proxy's poll interval after consuming each message.

asked Jul 26 '19 14:07

TrueWill

1 Answer

If you would like to find the end offsets of the partitions assigned to a consumer instance, in order to know when you've read all of the data that existed at a point in time, you can use

POST /consumers/(string: group_name)/instances/(string: instance)/positions/end

Note that you must do a poll (GET /consumers/.../records) before that seek, but you don't need to commit.
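The poll-then-seek order can be sketched as two request descriptors. This assumes the REST Proxy v2 API with the JSON embedded format. The helper names, the descriptor dict, and the `send` function are hypothetical, and the base URL you pass (for example `http://localhost:8082`) depends on your deployment.

```python
import json
import urllib.request
from typing import Dict, List, Optional

V2_JSON = "application/vnd.kafka.v2+json"


def records_request(group: str, instance: str) -> Dict[str, Optional[object]]:
    """Describe the poll that must precede the seek."""
    return {
        "method": "GET",
        "path": f"/consumers/{group}/instances/{instance}/records",
        # Assumption: the consumer instance was created with format "json".
        "headers": {"Accept": "application/vnd.kafka.json.v2+json"},
        "body": None,
    }


def seek_to_end_request(group: str, instance: str, topic: str,
                        partitions: List[int]) -> Dict[str, Optional[object]]:
    """Describe the seek-to-end call for the given partitions."""
    body = {"partitions": [{"topic": topic, "partition": p} for p in partitions]}
    return {
        "method": "POST",
        "path": f"/consumers/{group}/instances/{instance}/positions/end",
        "headers": {"Content-Type": V2_JSON},
        "body": json.dumps(body),
    }


def send(base_url: str, req: Dict[str, Optional[object]]) -> bytes:
    """Send one descriptor to the proxy (not exercised here; needs a live proxy)."""
    data = req["body"].encode("utf-8") if req["body"] is not None else None
    http_req = urllib.request.Request(
        base_url + req["path"], data=data, headers=req["headers"], method=req["method"]
    )
    with urllib.request.urlopen(http_req) as resp:
        return resp.read()
```

A caller would send `records_request(...)` first and `seek_to_end_request(...)` second, matching the order described above.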

If you don't want to affect the offsets of your existing consumer group, you would have to create a separate consumer group for this.

You can then query offsets with

GET /consumers/(string: group_name)/instances/(string: instance)/offsets

Note that data might be written to the topic between calculating the end offsets and actually reaching them, so you might want to poll a few more times once you finally do reach the recorded end.
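A rough sketch of that catch-up loop, under stated assumptions: `poll` is a hypothetical callable wrapping `GET .../records` that returns a list of dicts with `partition` and `offset` fields, and `end_offsets` is a snapshot mapping each partition to its end offset (the offset the next written message would get), obtained separately. The `extra_polls` and `max_polls` parameters are illustrative, not REST Proxy settings.

```python
from typing import Callable, Dict, List


def drain_until_caught_up(poll: Callable[[], List[dict]],
                          end_offsets: Dict[int, int],
                          extra_polls: int = 2,
                          max_polls: int = 1000) -> List[dict]:
    """Poll until every partition reaches its snapshotted end offset, then poll a bit more."""
    # Next offset we expect to read per partition; empty partitions (end 0) count as done.
    position = {p: 0 for p in end_offsets}
    collected: List[dict] = []

    def caught_up() -> bool:
        return all(position[p] >= end for p, end in end_offsets.items())

    polls = 0
    while not caught_up() and polls < max_polls:
        polls += 1
        for rec in poll():
            collected.append(rec)
            p = rec["partition"]
            # Compaction leaves gaps, so track the highest offset seen, not a count.
            position[p] = max(position.get(p, 0), rec["offset"] + 1)

    if not caught_up():
        raise TimeoutError("did not reach the snapshotted end offsets")

    # Pick up anything written while we were catching up.
    for _ in range(extra_polls):
        collected.extend(poll())
    return collected
```

Tracking the highest offset rather than a message count matters on a compacted topic, where offsets are not contiguous. The sketch also assumes the record at `end - 1` is still readable; if it has been removed (for example an expired tombstone), the loop would stop only at `max_polls`.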

answered Sep 27 '22 17:09

OneCricketeer
