 

I am evaluating Google Pub/Sub vs Kafka. What are the differences? [closed]

I have not worked with Kafka much, but I want to build a data pipeline on GCE, so I'd like to compare Kafka and Pub/Sub. Specifically, I want to know how message consistency, message availability, and message reliability are maintained in Kafka and in Pub/Sub.

Thanks

asked by Naresh, Jul 25 '16 15:07


People also ask

What is the difference between Pub/Sub and Kafka?

Pub/Sub is a managed service, so it scales up and down with demand, whereas a Kafka cluster is typically self-managed, so its capacity stays fixed until you add or remove brokers yourself.

What protocol does Google Pub/Sub use?

The Pub/Sub API is available over both REST (HTTP/JSON) and gRPC. Before your application can make requests to it, the application must be authorized using the OAuth 2.0 protocol. If you are using a Google Cloud Pub/Sub client library, you also need to create a client object (for example, an instance of the `PubSub` class in the Node.js library).

Is Google Pub/Sub a message queue?

Google Cloud Pub/Sub is a scalable, durable event-ingestion and message-delivery system that you can use instead of running your own message-queue infrastructure. It delivers low-latency, durable messaging through two core components: topics and subscriptions.


1 Answer

In addition to Google Pub/Sub being managed by Google and Kafka being open source, the other difference is that Google Pub/Sub is a message queue (e.g. RabbitMQ), whereas Kafka is more of a streaming log. You can't "re-read" or "replay" messages with Pub/Sub. (EDIT: as of February 2019, you CAN replay messages and seek backwards in time to a certain timestamp, per a comment on this answer.)
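
The replay feature mentioned in the edit can be pictured with a small in-memory model. This is a toy sketch, not the Pub/Sub client API: `RetainedMessage`, `SeekableSubscription`, and their methods are hypothetical names. It assumes the subscription is configured to keep acknowledged messages (the real subscription setting is `retain_acked_messages`); without that, an acked message is discarded, matching the "once ACKed, it's gone" behaviour, and a seek can only replay what is still retained.

```python
from dataclasses import dataclass, field


@dataclass
class RetainedMessage:
    message_id: int
    publish_time: float  # seconds; stands in for the real publish timestamp
    data: bytes
    acked: bool = False


@dataclass
class SeekableSubscription:
    """Toy subscription (hypothetical) that can optionally keep acked messages."""
    retain_acked: bool = False  # stands in for the retain_acked_messages setting
    messages: list = field(default_factory=list)

    def deliver(self, message_id, publish_time, data):
        self.messages.append(RetainedMessage(message_id, publish_time, data))

    def pull(self):
        # Only unacknowledged messages are handed to subscribers.
        return [m for m in self.messages if not m.acked]

    def ack(self, message_id):
        for m in self.messages:
            if m.message_id == message_id:
                m.acked = True
        if not self.retain_acked:
            # Default behaviour: an acked message is dropped and cannot be replayed.
            self.messages = [m for m in self.messages if not m.acked]

    def seek(self, timestamp):
        # Mark everything published before `timestamp` as acked and everything
        # else as unacked, so retained messages from that point are redelivered.
        for m in self.messages:
            m.acked = m.publish_time < timestamp


sub = SeekableSubscription(retain_acked=True)
sub.deliver(1, 100.0, b"a")
sub.deliver(2, 200.0, b"b")
sub.ack(1)
sub.ack(2)
print(sub.pull())                    # []
sub.seek(150.0)
print([m.data for m in sub.pull()])  # [b'b']
```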

With Google Pub/Sub, once a message is read out of a subscription and ACKed, it's gone. To have more copies of a message read by different readers, you "fan out" the topic by creating multiple subscriptions for it; each subscription receives its own copy of every message published to the topic after that subscription was created. But this also increases cost, because Google charges for Pub/Sub usage by the amount of data read out of it, and each subscription reads its own copy.
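
The fan-out model can be sketched as a toy in-memory topic. `Topic` and `Subscription` here are hypothetical stand-ins, not the google-cloud-pubsub API, and the sketch ignores ack deadlines, redelivery, and ordering. It shows that acking removes a message only from the subscription that acked it, and that a subscription created after a publish never sees that message.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Subscription:
    """Toy subscription (hypothetical): its own copy of each message until acked."""
    name: str
    pending: dict = field(default_factory=dict)  # message_id -> data, in publish order

    def pull(self, max_messages=10):
        # Hand out messages without removing them; real Pub/Sub would also
        # start an ack deadline and redeliver if it expires.
        return list(self.pending.items())[:max_messages]

    def ack(self, message_id):
        # Acking removes the message from *this* subscription only.
        self.pending.pop(message_id, None)


class Topic:
    """Toy topic (hypothetical) that fans each message out to every subscription."""

    def __init__(self):
        self._ids = count(1)
        self.subscriptions = {}

    def create_subscription(self, name):
        sub = Subscription(name)
        self.subscriptions[name] = sub
        return sub

    def publish(self, data):
        message_id = next(self._ids)
        # Only subscriptions that already exist get a copy.
        for sub in self.subscriptions.values():
            sub.pending[message_id] = data
        return message_id


topic = Topic()
billing = topic.create_subscription("billing")
analytics = topic.create_subscription("analytics")
msg_id = topic.publish(b"order-42")
billing.ack(msg_id)
late = topic.create_subscription("late")
print(billing.pull())    # []
print(analytics.pull())  # [(1, b'order-42')]
print(late.pull())       # []
```

Because every subscription stores and delivers its own copy, the data read out grows with the number of subscriptions, which is the cost effect described above.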

With Kafka, you set a retention period (7 days by default, via the broker setting `log.retention.hours=168`), and messages stay in Kafka for that period regardless of how many consumers read them. You can add a new consumer (aka subscriber) at any time and have it start consuming from the beginning of the topic. You can also set the retention period to be infinite (e.g. `retention.ms=-1` on the topic), and then you can basically use Kafka as an immutable datastore, as described here: http://stackoverflow.com/a/22597637/304262
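
By contrast, a Kafka topic partition behaves like an append-only log that consumers read by offset. The sketch below is a toy single-partition model, not the Kafka client API; `Log`, `poll`, and `seek` are hypothetical names. It assumes per-record, time-based deletion, whereas real Kafka deletes whole log segments, so actual retention is approximate. The `reset` argument loosely mirrors the consumer setting `auto.offset.reset`, whose default is `latest`, which is why a new consumer that should read from the beginning asks for `earliest`.

```python
import time
from dataclasses import dataclass


@dataclass
class Record:
    offset: int
    timestamp: float
    value: bytes


class Log:
    """Toy single-partition, Kafka-style log (hypothetical, not the Kafka API)."""

    def __init__(self, retention_seconds=7 * 24 * 3600):  # 7 days, like Kafka's default
        self.retention_seconds = retention_seconds  # None = keep records forever
        self.records = []
        self.next_offset = 0
        self.committed = {}  # consumer group -> next offset it will read

    def append(self, value, now=None):
        ts = time.time() if now is None else now
        self.records.append(Record(self.next_offset, ts, value))
        self.next_offset += 1

    def enforce_retention(self, now=None):
        # Deletion depends only on age, never on whether any consumer has read it.
        if self.retention_seconds is None:
            return
        ts = time.time() if now is None else now
        cutoff = ts - self.retention_seconds
        self.records = [r for r in self.records if r.timestamp >= cutoff]

    def earliest_offset(self):
        return self.records[0].offset if self.records else self.next_offset

    def poll(self, group, max_records=10, reset="latest"):
        # A group with no committed offset starts at the end ("latest") or at the
        # oldest retained record ("earliest"), loosely like auto.offset.reset.
        start = self.committed.get(group)
        if start is None:
            start = self.earliest_offset() if reset == "earliest" else self.next_offset
        batch = [r for r in self.records if r.offset >= start][:max_records]
        # Commit the position just after the last record returned.
        self.committed[group] = batch[-1].offset + 1 if batch else start
        return batch

    def seek(self, group, offset):
        # Replaying is just moving a group's position back to an older offset.
        self.committed[group] = offset


log = Log()
for i in range(3):
    log.append(f"event-{i}".encode(), now=1000.0 + i)
print([r.value for r in log.poll("dashboard", reset="earliest")])  # all three events
print([r.value for r in log.poll("new-team", reset="earliest")])   # all three again
log.seek("dashboard", 1)
print([r.value for r in log.poll("dashboard")])                    # [b'event-1', b'event-2']
```

The key contrast with Pub/Sub's default behaviour is that reading never deletes anything here; only age does, and setting `retention_seconds=None` keeps every record.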

Amazon Kinesis is conceptually close to a managed Kafka (AWS's actual managed Kafka service is Amazon MSK), whereas I think of Google Pub/Sub as a managed version of RabbitMQ. Amazon SNS with SQS is also similar to Google Pub/Sub (SNS provides the fan-out and SQS provides the queueing).

answered by gunit, Sep 23 '22 23:09