I have worked a bit with Kafka in the past, and lately there is a requirement to port part of the data pipeline to AWS Kinesis Streams. I have read that Kinesis is effectively a fork of Kafka and shares many similarities with it.
However, I have failed to see how we can have multiple consumers reading from the same stream, each with its own offset. Each data record is given a sequence number, but I couldn't find anything specific to a consumer (like the Kafka group ID?).
Is it really possible to have different consumers with different ingestion rates over the same AWS Kinesis stream?
Like Apache Kafka, Amazon Kinesis is also a publish and subscribe (pub/sub) messaging solution. However, it is offered as a managed service in the AWS cloud, and unlike Kafka cannot be run on-premises. The Kinesis Producer continuously pushes data to Kinesis Streams.
AWS offers Amazon Kinesis Data Streams, a Kafka alternative that is fully managed. Running your Kafka deployment on Amazon EC2 provides a high performance, scalable solution for ingesting streaming data.
Performance-wise, Kafka has a clear advantage over Kinesis: it consistently achieves higher throughput. Kafka can reach a throughput of 30k messages per second, whereas the throughput of Kinesis is much lower, but still solidly in the thousands.
Amazon Kinesis is a highly scalable platform that can process data from various sources with low latency. Known for its speed, ease of use, reliability, and capability of cross-platform replication, it is one of the most popular Kafka alternatives.
Among the Kafka alternatives that can be used to manage real-time data feeds while maintaining low latency and high throughput, Amazon Kinesis streams, collects, processes, and analyzes video and data streams in real time. It provides timely information and allows for complete flexibility and scalability.
A streaming platform has three major roles: publishing and subscribing to streams of records as they occur, storing them, and processing them in real time. Kafka has a basic structure of Producers, Kafka Clusters (Stream Processors and Connectors), and Consumers.
You can scale effectively with Kafka without experiencing downtime. The retention policy for Kafka is seven days by default and can be changed by the user. Using Kafka, you can store your data for short periods before the oldest values are erased. Kafka MirrorMaker supports cluster replication.
Yes.
You can have multiple Kinesis Consumer Applications. Let's say you have 2.
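With the Kinesis Client Library (KCL), each application keeps its state in its own DynamoDB table, named after the application. Each table contains the "what is the last processed position on shard X for app Y" information. So the two applications store checkpoints for the same shards in different places, which makes them independent.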
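To make the idea concrete, here is a minimal in-memory sketch in Python. It does not use the real KCL or DynamoDB; the `Shard` and `ConsumerApp` classes, the application names "first-app" and "second-app", and the shard ID "shard-0" are all hypothetical names made up for illustration. It only shows that two applications reading the same shard stay independent as long as each one keeps its own checkpoint table.

```python
from collections import defaultdict


class Shard:
    """In-memory stand-in for a Kinesis shard: an append-only list of records."""

    def __init__(self):
        self.records = []

    def put(self, data):
        self.records.append(data)
        # The list index plays the role of a (toy) sequence number.
        return len(self.records) - 1

    def get_after(self, last_seq, limit):
        """Return up to `limit` (sequence_number, data) pairs after `last_seq`."""
        start = last_seq + 1
        return list(enumerate(self.records[start:start + limit], start=start))


class ConsumerApp:
    """Hypothetical consumer application with its own checkpoint table."""

    def __init__(self, name, checkpoint_tables):
        self.name = name
        # One table per application: {shard_id: last processed sequence number}
        self.table = checkpoint_tables[name]

    def poll(self, shard_id, shard, limit):
        last = self.table.get(shard_id, -1)
        batch = shard.get_after(last, limit)
        if batch:
            # Checkpoint: remember the last sequence number we processed.
            self.table[shard_id] = batch[-1][0]
        return [data for _, data in batch]


# Stand-in for the per-application DynamoDB tables.
checkpoint_tables = defaultdict(dict)

shard = Shard()
for i in range(10):
    shard.put(f"record-{i}")

first_app = ConsumerApp("first-app", checkpoint_tables)
second_app = ConsumerApp("second-app", checkpoint_tables)

first_batch = first_app.poll("shard-0", shard, limit=8)    # reads a large batch
second_batch = second_app.poll("shard-0", shard, limit=3)  # reads a small batch

print(dict(checkpoint_tables))
```

After these two polls, the two applications hold different positions on the same shard, and neither poll affected the other's checkpoint.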
As for the ingestion rate, consumer applications using the KCL have an "idleTimeBetweenReadsInMillis" setting, which is the polling interval for Get operations against the Amazon Kinesis API. For example, the first application can use a poll interval of "2000", so it polls the stream's shards every 2 seconds to check whether any new records have arrived.
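The effect of different poll intervals can be sketched with a small simulation in Python. This is not KCL code: the `simulate` function, the 5000 ms interval for a second application, and the assumption that one record arrives every 500 ms are all made up for illustration, and time is simulated so nothing actually sleeps.

```python
def simulate(idle_ms, duration_ms, produce_every_ms=500):
    """Return (poll_time_ms, records_fetched) for each poll in the window.

    Assumes one record is written every `produce_every_ms`, starting at t=0,
    and that each poll fetches everything written since the previous poll.
    """
    checkpoint = 0  # number of records already consumed by this application
    polls = []
    for now in range(idle_ms, duration_ms + 1, idle_ms):
        available = now // produce_every_ms + 1  # records written up to `now`
        polls.append((now, available - checkpoint))
        checkpoint = available
    return polls


# First application polls every 2 seconds, a hypothetical second one every 5.
first_app = simulate(idle_ms=2000, duration_ms=10000)
second_app = simulate(idle_ms=5000, duration_ms=10000)

print(first_app)
print(second_app)
```

Under these assumptions both applications end up consuming the same records, but the first does so in five smaller batches and the second in two larger ones, each at its own pace.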
I don't know Kafka well, but as far as I remember, a Kafka "partition" corresponds to a "shard" in Kinesis, and likewise a Kafka "offset" corresponds to a "sequence number" in Kinesis. The Kinesis Client Library uses the term "checkpoint" for the stored sequence numbers. Like you said, the concepts are similar.