
How does Kinesis achieve Kafka style Consumer Groups?

In Kafka, I can split a topic into many partitions. Within a consumer group, I cannot usefully have more consumers than partitions, because the partition is the unit used to scale out a topic. If I have more load, I can increase the number of partitions, which lets me increase the number of consumers, which in turn lets me have more threads or processes working on a given topic.

In Kafka, there is a concept of a Consumer Group. If we have 10 consumer groups on a single topic, each consumer group will have the opportunity to process every message in the topic. Each consumer group still takes advantage of the scalability from the partitions (i.e. each consumer group can have up to 'n' consumers, where 'n' is the number of partitions on the topic). This is the beauty of Kafka: scalability and multi-channel reading are two separate concepts with two separate knobs to turn.
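The two knobs described above can be sketched in a few lines of Python. This is only a toy model, not Kafka's actual assignor: the function `assign_partitions`, the group names, and the consumer names are all made up for illustration, and a plain round-robin stands in for whatever assignment strategy a real cluster uses.

```python
from collections import defaultdict


def assign_partitions(partitions, consumers):
    """Spread the partitions of one topic over the consumers of ONE group.

    Simplified round-robin; the name and behavior are illustrative only.
    """
    if not consumers or not partitions:
        return {}
    # At most len(partitions) consumers can receive work; the rest stay idle.
    active = consumers[:len(partitions)]
    assignment = defaultdict(list)
    for i, partition in enumerate(partitions):
        assignment[active[i % len(active)]].append(partition)
    return dict(assignment)


partitions = [0, 1, 2]
groups = {
    "billing": ["b1", "b2", "b3", "b4"],  # 4 consumers, 3 partitions: b4 idles
    "analytics": ["a1"],                  # 1 consumer reads all 3 partitions
}

# Every group gets its own full assignment, so every group sees every partition.
plan = {name: assign_partitions(partitions, members)
        for name, members in groups.items()}

for name, assignment in plan.items():
    print(name, assignment)
```

Adding a partition changes how far one group can scale; adding a group adds another independent reader of the whole topic.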

In Kinesis, we are told that if you use the Kinesis Client Library (KCL), you can get the same functionality as consumer groups by defining different Kinesis Applications. In other words, we can have different Kinesis Applications independently streaming all records from the same stream at different times.

We are also told that "Amazon Kinesis Client Library (KCL) automatically creates an Amazon DynamoDB table for each Amazon Kinesis Application to track and maintain state information such as resharding events and sequence number checkpoints."

OK, so I'm getting ready to start reading through the KCL code here, but I'm hoping someone can answer these questions to save me some time.

  1. How does the KCL actually do this?
  2. Are there diagrams somewhere explaining the process?
  3. If I started a new Kinesis Application (MyKinesisApp1) after a record was already produced and consumed by all prior Kinesis Applications, will the new Kinesis Application (MyKinesisApp1) still have an opportunity to consume that record? In other words, does Kinesis remove the record from its stream after it has been processed, or does it leave it there for the 7 days no matter what?

I have seen this question here, but it doesn't answer my questions, especially the third one. Also, this question makes a direct comparison between two similar technologies, which should help people who know Kafka learn Kinesis more quickly.

CBP asked May 05 '18 14:05



1 Answer

  1. The KCL configuration has a setting "appName" (the "Application Name"), which plays the same role as a consumer group in Kafka. For each consumer group (i.e. each Kinesis Streams consumer application) there is one DynamoDB table. You can see an example DynamoDB table here (the KCL appName is 'quickstats-development'): AWS Kinesis leaseOwner confusion

  2. No, not as far as I know. "Kinesis Streams" is similar to Kafka, but beyond that I have not found much in the way of graphical representation.

  3. Yes. Kinesis does not remove a record once it has been read; records stay in the stream until the retention period expires, so a new application can still consume them. Each Kinesis consumer application (the counterpart of a Kafka consumer group) has its own DynamoDB table, so different applications can consume the same record independently. A checkpoint in Kinesis plays the role of an offset in Kafka: the checkpoint stored in DynamoDB is the cursor marking the read position in a Kinesis shard. Read this answer for a similar example: https://stackoverflow.com/a/42833193/1622134
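As a rough illustration of the third point, here is a small in-memory Python sketch. It does not use the KCL or DynamoDB: `Shard` and `Application` are hypothetical stand-ins invented for this example, and a plain integer attribute plays the role of the checkpoint that the KCL would keep in the application's DynamoDB table. It also assumes the new application starts from the oldest retained record (the TRIM_HORIZON starting position) rather than from the latest one.

```python
class Shard:
    """Toy stand-in for one Kinesis shard: reading never deletes records."""

    def __init__(self):
        self.records = []

    def put(self, data):
        self.records.append(data)
        return len(self.records) - 1  # position used as a fake sequence number

    def read_after(self, checkpoint):
        # Return (sequence_number, data) pairs newer than the checkpoint.
        return list(enumerate(self.records))[checkpoint + 1:]


class Application:
    """Toy stand-in for a KCL application; one private checkpoint per app."""

    def __init__(self, app_name, shard):
        self.app_name = app_name
        self.shard = shard
        self.checkpoint = -1  # assumption: start from the oldest record
        self.seen = []

    def poll(self):
        for seq, data in self.shard.read_after(self.checkpoint):
            self.seen.append(data)
            self.checkpoint = seq  # advance only this application's cursor


shard = Shard()
shard.put("order-1")

old_app = Application("quickstats-development", shard)
old_app.poll()                      # consumes order-1

shard.put("order-2")
new_app = Application("MyKinesisApp1", shard)
new_app.poll()                      # started later, still gets both records
old_app.poll()                      # resumes after its own checkpoint

print(old_app.seen, new_app.seen, len(shard.records))
```

Because each application advances only its own cursor, a late starter is unaffected by what earlier applications have already processed, as long as the record is still within the retention period.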

az3 answered Nov 15 '22 11:11