Kafka or SNS or something else? [closed]

Sorry if this is a newbie question, but I'm trying to understand what I should use. As far as I understand, Kafka is:

Apache Kafka is a distributed publish-subscribe messaging system.

And SNS is also a pub/sub system.

My goal is to use a message queue system on AWS with an application that will be distributed over a few servers (by the way, the main language is Python). Because it is on Amazon, my first thought was to use SNS and SQS. But then I saw a lot of people using Kafka on AWS. What are the advantages of one over the other?

Vor asked May 08 '13 19:05


2 Answers

The use-cases for Kafka and Amazon SQS/Amazon SNS are quite different.

Kafka, as you wrote, is a distributed publish-subscribe system. It is designed for very high throughput, processing thousands of messages per second. Of course, you need to set it up and cluster it yourself. It supports multiple readers, which may "catch up" with the stream of messages from any point (well, as long as the messages are still on disk). You can use it both as a queue (using consumer groups) and as a topic.
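To make the queue-versus-topic distinction concrete, here is a toy in-memory model in plain Python. It is not the Kafka client API: `ToyLog`, `poll`, and `seek` are hypothetical names, and a real consumer group divides partitions among its members rather than sharing a single position. The sketch only illustrates that reading does not remove messages and that each group tracks its own position in the log.

```python
from collections import defaultdict


class ToyLog:
    """Toy append-only log; an illustration, not the Kafka client API."""

    def __init__(self):
        self.messages = []               # retained messages, not removed on read
        self.offsets = defaultdict(int)  # next offset to read, per consumer group

    def append(self, message):
        self.messages.append(message)

    def poll(self, group):
        """Return the next unread message for `group`, or None if caught up."""
        offset = self.offsets[group]
        if offset >= len(self.messages):
            return None
        self.offsets[group] = offset + 1
        return self.messages[offset]

    def seek(self, group, offset):
        """Move a group's position, e.g. back to 0 to replay the log."""
        self.offsets[group] = offset


log = ToyLog()
for m in ["a", "b", "c"]:
    log.append(m)

# Two workers in the same group share one position: queue-like behaviour.
worker1 = log.poll("billing")
worker2 = log.poll("billing")

# A different group has its own position and sees every message: topic-like behaviour.
analytics = [log.poll("analytics") for _ in range(3)]

# A group can rewind and catch up again while the messages are retained.
log.seek("billing", 0)
replayed = log.poll("billing")
```

Here the two "billing" workers split the messages between them, while the "analytics" group independently receives all three.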

An important characteristic is that you cannot selectively acknowledge messages as "processed"; the only option is acknowledging all messages up to a certain offset.
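A small sketch of what that means in practice, again as a plain-Python model rather than real client code (`committable_offset` is a hypothetical helper). It assumes the usual convention that the committed offset is the next offset to read.

```python
def committable_offset(results, start_offset=0):
    """Return the offset that can safely be committed.

    `results` maps offset -> True (processed) or False (failed).
    With offset-based acknowledgement only a contiguous prefix can be
    committed, so we stop at the first offset that is missing or failed.
    """
    offset = start_offset
    while results.get(offset) is True:
        offset += 1
    return offset  # next offset to read; everything below it counts as done


# Offsets 0, 1 and 3 succeeded, offset 2 failed.
results = {0: True, 1: True, 2: False, 3: True}
next_offset = committable_offset(results)

# After a restart the consumer resumes from next_offset, so offset 3
# is delivered again even though it was already processed.
redelivered = [o for o in sorted(results) if o >= next_offset]
```

Because offset 3 cannot be acknowledged on its own, it is processed twice in this scenario, which is one reason consumers of such a log are often written to tolerate duplicates.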

SQS/SNS on the other hand:

  • no setup/no maintenance
  • either a queue (SQS) or a topic (SNS)
  • various limitations (on size, how long a message lives, etc)
  • limited throughput: you can use batch and concurrent requests, but achieving high throughput would still be expensive
  • I'm not sure if the messages are replicated; however, the at-least-once delivery guarantee in SQS would suggest so
  • SNS has notifications for email, SMS, SQS, HTTP built-in. With Kafka, you would probably have to code it yourself
  • no "message stream" concept
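On the batching point above: SQS batch sends accept up to 10 messages per request, so grouping messages cuts the number of requests roughly tenfold. The following is a minimal sketch of the grouping step only; `to_batches` is a hypothetical helper, and the actual send is shown as a comment because it needs boto3, AWS credentials, and a real queue.

```python
def to_batches(bodies, batch_size=10):
    """Group message bodies into entry lists of at most `batch_size` items.

    Each entry uses the `Id` / `MessageBody` shape expected by SQS batch
    sends; the `Id` only has to be unique within one batch.
    """
    batches = []
    for start in range(0, len(bodies), batch_size):
        chunk = bodies[start:start + batch_size]
        batches.append(
            [{"Id": str(i), "MessageBody": body} for i, body in enumerate(chunk)]
        )
    return batches


bodies = [f"event-{n}" for n in range(25)]
batches = to_batches(bodies)

# Sending would look roughly like this (`queue_url` is a placeholder):
#   sqs = boto3.client("sqs")
#   for entries in batches:
#       sqs.send_message_batch(QueueUrl=queue_url, Entries=entries)
```

With 25 messages this produces 3 requests instead of 25. A real sender would also need to check the response for entries that failed and retry them.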

So overall I would say SQS/SNS are well suited for simpler tasks and workloads with a lower volume of messages.

adamw answered Sep 22 '22 15:09


This is a classic trade-off:

AWS tools (SQS, SNS)

These will be easier for you to set up and integrate with the rest of your architecture, especially if most of it is already running on AWS. They will also probably be cheaper at first, since they have a good pay-as-you-go model, but the cost will not scale as well, so you have to think about that.

Apache Kafka

Here, you're using a highly popular (not trendy) distributed (this is important if you think you will scale a lot) pub/sub model. Nowadays, this model seems to be much preferred, since running analytics on the data going through the pipes is very common, and with an SOA you can usually have a multitude of small services consuming the messages and doing their thing without the data being removed from the queue. You also get a lot of configuration options, so depending on your use case you can fine-tune it to your needs. This means more work up front, but a more optimized service down the road.

Summary

This is a classic trade-off between speed and ease of development on one side and a modular, customized solution on the other, one that has more overhead for the first implementation but scales better.

Personal Advice

If you are prototyping something, favor speed of development, so use the AWS tools. If your requirements are frozen and require significant scale, definitely take the time to use Kafka. I am also a big believer in using-open-source-makes-the-world-better, but that's not the strongest argument here.

nichochar answered Sep 22 '22 15:09