Sorry if this is a newbie question, but I'm trying to understand what I should use. As far as I understand, Kafka is:
Apache Kafka is a distributed publish-subscribe messaging system.
And SNS is also a pub/sub system.
My goal is to use a queue messaging system on AWS with an application that will be distributed over a few servers (by the way, the main language is Python). Because it is on Amazon, my first thought was to use SNS and SQS. But then I saw a lot of people using Kafka on AWS. What are the advantages of one over the other?
Kafka is an Apache project and SQS is an Amazon product. At a high level, both are used to store messages for a defined period of time.
With SQS, you can offload the administrative burden of operating and scaling a highly available messaging cluster, while paying a low price for only what you use. Kafka, on the other hand, is described as a "Distributed, fault tolerant, high throughput pub-sub messaging system".
AWS offers Amazon Kinesis Data Streams, a Kafka alternative that is fully managed. Running your Kafka deployment on Amazon EC2 provides a high performance, scalable solution for ingesting streaming data. AWS offers many different instance types and storage option combinations for Kafka deployments.
The use-cases for Kafka and Amazon SQS/Amazon SNS are quite different.
Kafka, as you wrote, is a distributed publish-subscribe system. It is designed for very high throughput, processing thousands of messages per second. Of course, you need to set it up and cluster it yourself. It supports multiple readers, which may "catch up" with the stream of messages at any point (well, as long as the messages are still on disk). You can use it both as a queue (using consumer groups) and as a topic.
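To make the "queue versus topic" distinction concrete, here is a minimal in-memory sketch in Python. It is a toy model, not Kafka's client API: the `ToyLog` class and the group names are made up for illustration, and it ignores partitions (real Kafka spreads partitions, not individual messages, across the members of a consumer group).

```python
from collections import defaultdict


class ToyLog:
    """Hypothetical in-memory stand-in for a topic (not Kafka's API)."""

    def __init__(self):
        self.messages = []               # reading never removes a message
        self.offsets = defaultdict(int)  # next offset to read, per consumer group

    def publish(self, message):
        self.messages.append(message)

    def poll(self, group):
        """Return the next unread message for `group`, or None if caught up."""
        offset = self.offsets[group]
        if offset >= len(self.messages):
            return None
        self.offsets[group] = offset + 1
        return self.messages[offset]


log = ToyLog()
for i in range(4):
    log.publish(f"event-{i}")

# Queue behaviour: two workers share the group "billing", so each message
# is handed to only one of them.
worker_a = [log.poll("billing"), log.poll("billing")]
worker_b = [log.poll("billing"), log.poll("billing")]

# Topic behaviour: a second group, "analytics", has its own offset and
# reads every message again, although "billing" has already consumed them.
analytics = [log.poll("analytics") for _ in range(4)]

print(worker_a)   # ['event-0', 'event-1']
print(worker_b)   # ['event-2', 'event-3']
print(analytics)  # ['event-0', 'event-1', 'event-2', 'event-3']
```

The point of the sketch is that the log keeps its messages and only the per-group read position moves, which is also why a late reader can still "catch up".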
An important characteristic is that you cannot selectively acknowledge messages as "processed"; the only option is acknowledging all messages up to a certain offset.
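A small sketch of what that means in practice, again as a toy model rather than real client code. The helper `safe_commit_offset` and the outcome table are hypothetical; the convention assumed here is that the committed offset is the position of the next message to read.

```python
messages = ["m0", "m1", "m2", "m3"]

# Hypothetical processing outcomes, keyed by offset: m1 failed, the rest succeeded.
succeeded = {0: True, 1: False, 2: True, 3: True}


def safe_commit_offset(start, outcomes):
    """Advance past the contiguous run of successes beginning at `start`.

    Only one offset is stored per reader, so the commit cannot skip over
    a failed message to acknowledge later ones.
    """
    offset = start
    while outcomes.get(offset, False):
        offset += 1
    return offset


committed = safe_commit_offset(0, succeeded)

# After a restart, reading resumes at the committed offset, so m2 and m3
# are delivered again even though they were already processed successfully.
redelivered = messages[committed:]

print(committed)    # 1
print(redelivered)  # ['m1', 'm2', 'm3']
```

One consequence is that handlers usually need to tolerate seeing the same message more than once.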
SQS/SNS on the other hand:
So overall I would say SQS/SNS are well suited for simpler tasks and workloads with a lower volume of messages.
SNS and SQS will be easier for you to set up and integrate with the rest of your architecture, especially if most of it is already running on AWS. They will also probably be cheaper at first, since they have a good pay-as-you-go model, but the cost will not scale as well, so you have to think about that.
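The cost argument can be put as a rough break-even calculation. The sketch below is only an illustration: both prices are invented placeholders, not real AWS or Kafka hosting figures, and it treats the self-run cluster as a flat monthly cost, which ignores capacity limits and operations time.

```python
# Hypothetical numbers for illustration only; look up real pricing before deciding.
PRICE_PER_MILLION = 0.50  # assumed pay-as-you-go price per million messages
CLUSTER_MONTHLY = 600.0   # assumed flat monthly cost of a self-run cluster


def pay_per_use_cost(messages_per_month, price_per_million):
    """Monthly cost when every message is billed."""
    return messages_per_month / 1_000_000 * price_per_million


def break_even_messages(fixed_monthly_cost, price_per_million):
    """Monthly volume at which pay-per-use costs as much as the fixed cluster."""
    return fixed_monthly_cost / price_per_million * 1_000_000


for volume in (10_000_000, 1_000_000_000, 10_000_000_000):
    managed = pay_per_use_cost(volume, PRICE_PER_MILLION)
    cheaper = "pay-as-you-go" if managed < CLUSTER_MONTHLY else "fixed cluster"
    print(f"{volume:>14,} msgs/month: pay-as-you-go ${managed:,.2f} -> {cheaper}")

print(f"break-even: {break_even_messages(CLUSTER_MONTHLY, PRICE_PER_MILLION):,.0f} msgs/month")
```

With these made-up inputs the crossover sits at 1.2 billion messages per month; the useful part is the shape of the comparison, not the number.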
With Kafka, you're using a highly popular (not just trendy) distributed (this is important if you think you will scale a lot) pub/sub model. Nowadays this model seems to be much preferred, since running analytics on the data going through the pipes is very common, and with a service-oriented architecture you can have a multitude of small services consuming the messages and doing their thing without the data being removed from the queue. You also get a lot of configuration options, so depending on your use case you can fine-tune it to your needs. This means more work up front, but a more optimized service down the road.
This is a classic trade-off between speed and ease of development on one side and a modular, tailored solution on the other, which has more overhead for the first implementation but scales better.
If you are prototyping something, favor speed of development, so use the AWS tools. If your requirements are frozen and demand significant scale, definitely take the time to use Kafka. I am also a big believer that using open source makes the world better, but that's not the strongest argument here.