Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Ensuring that all messages have been read from Kafka topic using REST Proxy

I'm new to Kafka, and our team is investigating patterns for inter-service communication.

The goal

We have two services, P (Producer) and C (Consumer). P is the source of truth for a set of data that C needs. When C starts up it needs to load all of the current data from P into its cache, and then subscribe to change notifications. (In other words, we want to synchronize data between the services.)

The total amount of data is relatively low, and changes are infrequent. A brief delay in synchronization is acceptable (eventual consistency).

We want to decouple the services so that P and C do not need to know about each other.

The proposal

When P starts up, it publishes all of its data to a Kafka topic that has log compaction enabled. Each message is an aggregate with a key of its ID.

When C starts up, it reads all of the messages from the beginning of the topic and populates its cache. It then keeps reading from its offset to be notified of updates.

When P updates its data, it publishes a message for the aggregate that changed. (This message has the same schema as the original messages.)

When C receives a new message, it updates the corresponding data in its cache.

enter image description here

Constraints

We are using the Confluent REST Proxy to communicate with Kafka.

The issue

When C starts up, how does it know when it's read all of the messages from the topic so that it can safely start processing?

It's acceptable if C does not immediately notice a message that P sent a second ago. It's not acceptable if C starts processing before consuming a message that P sent an hour ago. Note that we don't know when updates to P's data will occur.

We do not want C to have to wait for the REST Proxy's poll interval after consuming each message.

like image 734
TrueWill Avatar asked Jul 26 '19 14:07

TrueWill


People also ask

What does Kafka rest proxy do?

The Kafka REST Proxy provides a RESTful interface to a Kafka cluster. It makes it easy to produce and consume messages, view the state of the cluster, and perform administrative actions without using the native Kafka protocol or clients.

Does Kafka have REST API?

The Kafka REST Proxy is a RESTful web API that allows your application to send and receive messages using HTTP rather than TCP. It can be used to produce data to and consume data from Kafka or for executing queries on cluster configuration.

Is Kafka rest proxy open source?

Of course it's open source and Apache 2.0 licensed. Show activity on this post. You can use the Confluent REST Proxy with no software/licensing costs.


1 Answers

If you would like to find the end partitions of a consumer group, in order to know when you've gotten all data at a point in time, you can use

POST /consumers/(string: group_name)/instances/(string: instance)/positions/end

Note that you must do a poll (GET /consumers/.../records) before that seek, but you don't need to commit.

If you don't want to affect the offsets of your existing consumer group, you would have to post a separate one.

You can then query offsets with

GET /consumers/(string: group_name)/instances/(string: instance)/offsets

Note that there might be data being written to the topic between calculating the end offsets and actually reaching the end, so you might want to have some additional settings to do a few more consumptions once you finally do reach the end.

like image 88
OneCricketeer Avatar answered Sep 27 '22 17:09

OneCricketeer