
Equivalent for Kafka / AWS Kinesis Stream on Google Cloud Platform


I'm building an app that is constantly appending to a buffer while many readers consume from this buffer independently (write-once-read-many / WORM). At first I thought of using Apache Kafka, but as I prefer an as-a-service option I started investigating AWS Kinesis Streams + KCL and it seems I can accomplish this task with them.

Basically I need 2 features: ordering (the events must be read in the same order by all readers) and the ability to choose the offset in the buffer from where the reader starts consuming onwards.
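To make these two requirements concrete, here is a minimal in-memory sketch. The `AppendOnlyLog` class and its method names are hypothetical, invented only for illustration; it models the behavior wanted from the service and is not any product's API.

```python
from typing import Iterator, List


class AppendOnlyLog:
    """Toy in-memory buffer: one writer appends, many readers scan independently."""

    def __init__(self) -> None:
        self._events: List[str] = []

    def append(self, event: str) -> int:
        """Store the event and return its offset (its position in the log)."""
        self._events.append(event)
        return len(self._events) - 1

    def read_from(self, offset: int) -> Iterator[str]:
        """Yield events starting at `offset`, in the order they were appended."""
        if offset < 0:
            raise ValueError("offset must be non-negative")
        for event in self._events[offset:]:
            yield event


log = AppendOnlyLog()
for name in ["e0", "e1", "e2", "e3"]:
    log.append(name)

reader_a = list(log.read_from(0))  # replays the whole buffer
reader_b = list(log.read_from(2))  # starts later, sees the same relative order
```

Each reader keeps its own position, so reading never removes anything from the buffer and every reader observes the same sequence.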

Now I'm also evaluating Google Cloud Platform. From the documentation, it seems that Google Pub/Sub is suggested as the equivalent of AWS Kinesis Streams, but at a more detailed level these products seem quite different:

  • Kinesis guarantees ordering inside a shard, while on Pub/Sub ordering is on a best-effort basis;
  • Kinesis keeps the whole buffer (limited to max 7 days) available to readers, which can use an offset to select the starting read position, while on Pub/Sub only the messages published after the subscription was created are available for consumption.

If I got it right, PubSub cannot be considered a Kinesis equivalent. Perhaps if used together with Google Dataflow? I must confess that I still can't see how.

So, is PubSub an alternative to Kinesis? If not, is there a Google Cloud Product that would fulfill my requirements?

Thanks!

Renan asked Sep 11 '17 20:09



2 Answers

A rather convoluted solution, but it might help:

  • Push your events using Pub/Sub to a single topic. At this point they will be unordered.
  • Create a Cloud Dataflow streaming pipeline that reads from the Pub/Sub topic. Have it do streaming writes to BigQuery, adding a timestamp to each table entry.
  • Have your readers query the BigQuery table, ordering by timestamp to get a consistent order. You can use ROW_NUMBER as your offset.
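The reader side of this idea can be sketched in plain Python, without touching BigQuery. The `Row` fields (`ts`, `event_id`, `payload`) are hypothetical column names; the sort plus 1-based numbering emulates what a `ROW_NUMBER() OVER (ORDER BY ts, event_id)` query would give. The `event_id` tie-breaker is an assumption added here so that rows with equal timestamps still come out in one fixed order.

```python
from typing import List, NamedTuple


class Row(NamedTuple):
    ts: float        # timestamp added by the pipeline (hypothetical column)
    event_id: str    # hypothetical unique id, used only to break timestamp ties
    payload: str


def read_from(rows: List[Row], offset: int) -> List[str]:
    """Return payloads whose 1-based row number is >= offset."""
    # Sorting on (ts, event_id) gives every reader the same total order.
    ordered = sorted(rows, key=lambda r: (r.ts, r.event_id))
    # enumerate(..., start=1) plays the role of ROW_NUMBER().
    return [r.payload for n, r in enumerate(ordered, start=1) if n >= offset]


# Rows arrive unordered, as they would from the topic.
table = [
    Row(3.0, "c", "third"),
    Row(1.0, "a", "first"),
    Row(2.0, "b", "second"),
    Row(3.0, "d", "fourth"),
]

from_start = read_from(table, 1)
from_third = read_from(table, 3)
```

Note that a row number computed at query time is only a stable offset if no row shows up later with an earlier timestamp; if that can happen, the numbers of all following rows shift.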

Hope that helps.

HJED answered Sep 21 '22 13:09


Pub/Sub now supports ordering natively. As for the requirement that a subscription (~consumer group in Kafka) exist before you consume, it's very rarely a problem for users. If nothing else, you can create snapshots which allow you to reset a new subscription to the state of any other existing subscription.
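As a rough mental model of the snapshot idea, here is a toy simulation. It is not the Pub/Sub client library: `ToyTopic` and all of its methods are hypothetical, and it assumes each subscription can be reduced to a single cursor, whereas the real service tracks acknowledgments per message and applies retention limits.

```python
from typing import Dict, List


class ToyTopic:
    """Simplified model: each subscription is a cursor into one shared message list."""

    def __init__(self) -> None:
        self.messages: List[str] = []
        self.cursors: Dict[str, int] = {}    # subscription name -> next message index
        self.snapshots: Dict[str, int] = {}  # snapshot name -> saved cursor

    def publish(self, message: str) -> None:
        self.messages.append(message)

    def subscribe(self, name: str) -> None:
        # A new subscription only sees messages published after it is created.
        self.cursors[name] = len(self.messages)

    def pull(self, name: str) -> List[str]:
        # Return everything past the cursor and advance it (pull + acknowledge).
        start = self.cursors[name]
        self.cursors[name] = len(self.messages)
        return self.messages[start:]

    def snapshot(self, snapshot_name: str, subscription: str) -> None:
        # Capture the current state of an existing subscription.
        self.snapshots[snapshot_name] = self.cursors[subscription]

    def seek(self, subscription: str, snapshot_name: str) -> None:
        # Reset a subscription to the state saved in the snapshot.
        self.cursors[subscription] = self.snapshots[snapshot_name]


topic = ToyTopic()
topic.subscribe("first")
topic.publish("m1")
topic.publish("m2")
topic.snapshot("snap", "first")   # "first" has not consumed m1 or m2 yet
topic.subscribe("late")           # created after m1 and m2 were published
late_before = topic.pull("late")  # nothing to read yet
topic.seek("late", "snap")
late_after = topic.pull("late")   # now replays m1 and m2
```

In this model the late subscription gets the earlier messages only because some subscription already existed to snapshot, which matches the caveat above that a subscription must exist before messages are retained for it.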

This is a bit late, but @Renan, if you are still watching, I would love to hear how you ended up building your system.

Kir Titievsky answered Sep 24 '22 13:09