
Kafka-like offset on a Kinesis Stream?

I have worked a bit with Kafka in the past, and lately there is a requirement to port part of our data pipeline to AWS Kinesis Streams. I have read that Kinesis is effectively a fork of Kafka and shares many similarities with it.

However, I have failed to see how we can have multiple consumers reading from the same stream, each with its own offset. Each data record is given a sequence number, but I couldn't find anything specific to a consumer (like the Kafka group ID?).

Is it really possible to have different consumers with different ingestion rates over the same AWS Kinesis Stream?

Mangat Rai Modi asked Mar 16 '17 04:03



1 Answer

Yes.

You can have multiple Kinesis Consumer Applications. Let's say you have 2.

  1. The first consumer application (I think this corresponds to a "consumer group" in Kafka) can be named "first-app" and store its positions in the DynamoDB table "first-app-table". It can run on as many nodes (EC2 instances) as you want.
  2. The second consumer application can read from the same stream and store its positions in another DynamoDB table, say "second-app-table".

Each table records "what is the last processed position on shard X for app Y". The two applications therefore store checkpoints for the same shards in different places, which makes them independent of each other.
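To make the independence concrete, here is a minimal in-memory sketch in Python. It does not use the KCL or DynamoDB: the `Shard` and `ConsumerApp` classes are hypothetical stand-ins, a plain dict plays the role of each application's checkpoint table, and a list index plays the role of the sequence number (real Kinesis sequence numbers are large and not contiguous).

```python
from collections import defaultdict


class Shard:
    """In-memory stand-in for one Kinesis shard: an append-only list of records."""

    def __init__(self):
        self.records = []  # the list index plays the role of the sequence number

    def put(self, data):
        self.records.append(data)

    def read_after(self, checkpoint, limit):
        """Return up to `limit` (sequence_number, data) pairs after `checkpoint`."""
        start = checkpoint + 1
        return list(enumerate(self.records))[start:start + limit]


class ConsumerApp:
    """One consumer application with its own checkpoint table.

    The dict stands in for the per-application DynamoDB table.
    """

    def __init__(self, name):
        self.name = name
        self.checkpoints = defaultdict(lambda: -1)  # shard id -> last processed seq
        self.processed = []

    def poll(self, shard_id, shard, limit):
        batch = shard.read_after(self.checkpoints[shard_id], limit)
        for seq, data in batch:
            self.processed.append(data)
            self.checkpoints[shard_id] = seq  # checkpoint after processing
        return len(batch)


shard = Shard()
for i in range(5):
    shard.put(f"record-{i}")

first_app = ConsumerApp("first-app")
second_app = ConsumerApp("second-app")

first_app.poll("shard-0", shard, limit=5)   # fast consumer reads everything
second_app.poll("shard-0", shard, limit=2)  # slow consumer reads two records

print(first_app.checkpoints["shard-0"], second_app.checkpoints["shard-0"])
```

Because each application only ever reads and writes its own checkpoint table, the slower one can fall behind or catch up later without affecting the other.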

As for the ingestion rate, consumer applications built with the KCL have an "idleTimeBetweenReadsInMillis" setting, which is the polling interval for Get operations against the Amazon Kinesis API. For example, the first application can set it to "2000", so it polls the stream's shards every 2 seconds to check whether any new records have arrived.
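The KCL runs this polling loop for you, so the following is only a rough Python sketch of the idea, not the library's implementation. The function `poll_shard` and its parameters are hypothetical names chosen for illustration, and `sleep` is injectable so the example runs without actually waiting.

```python
import time


def poll_shard(fetch_records, handle, idle_time_between_reads_ms, max_polls,
               sleep=time.sleep):
    """Fetch records repeatedly, pausing between reads.

    fetch_records: hypothetical callable returning a list of new records
                   (possibly empty) for one shard.
    handle:        callable invoked once per record.
    """
    total = 0
    for poll_number in range(max_polls):
        for record in fetch_records():
            handle(record)
            total += 1
        if poll_number < max_polls - 1:
            # Idle between reads, like a 2000 ms polling interval.
            sleep(idle_time_between_reads_ms / 1000.0)
    return total


# Usage with a fake record source and a recorded (not real) sleep.
batches = iter([["a", "b"], [], ["c"]])
seen, pauses = [], []
count = poll_shard(lambda: next(batches), seen.append, 2000,
                   max_polls=3, sleep=pauses.append)
print(count, seen, pauses)
```

Two applications can run the same kind of loop with different intervals against the same shards, which is how they end up with different ingestion rates.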

I don't know Kafka well, but as far as I remember, a Kafka "partition" corresponds to a "shard" in Kinesis, and a Kafka "offset" corresponds to a "sequence number" in Kinesis. The Kinesis Client Library (KCL) uses the term "checkpoint" for the stored sequence numbers. As you said, the concepts are similar.

az3 answered Oct 28 '22 23:10