Yahoo developed Pulsar, a pub-sub messaging system, and made it open source. It's now an Apache incubating project. Since Kafka is used for the same purpose, I want to know the major pros and cons of Kafka compared to Pulsar.
What is the difference between Apache Pulsar and Kafka?
In Kafka, each broker stores the complete log for the partitions it hosts, and the brokers holding replicas of the same partition have to keep their copies of that log in sync. Pulsar, on the other hand, stores the data outside the brokers (in Apache BookKeeper), which completely separates the brokers from the storage layer.
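To make the difference described above concrete, here is a toy Python model. Everything in it is hypothetical and is not code from either project: in the Kafka-style model, moving a partition to another broker means copying its log, while in the Pulsar-style model the broker only hands over ownership, because the messages live in a separate storage layer (BookKeeper in real Pulsar, a plain dict here). Real Kafka reassignment also involves replicas catching up, which is not modelled.

```python
from dataclasses import dataclass, field

# Stands in for BookKeeper in this toy model: topic -> list of messages.
shared_storage: dict = {}


@dataclass
class CoupledBroker:
    """Kafka-style broker: it keeps the full log of every partition it hosts."""
    name: str
    logs: dict = field(default_factory=dict)  # partition -> list of messages


def move_partition_coupled(src: CoupledBroker, dst: CoupledBroker, partition: str) -> int:
    """Moving a partition means copying its whole log; returns messages copied."""
    log = src.logs.pop(partition)
    dst.logs[partition] = list(log)
    return len(log)


@dataclass
class StatelessBroker:
    """Pulsar-style broker: it only owns topics; the data stays in shared_storage."""
    name: str
    owned: set = field(default_factory=set)


def move_topic_decoupled(src: StatelessBroker, dst: StatelessBroker, topic: str) -> int:
    """Moving a topic only hands over ownership; returns messages copied (always 0)."""
    src.owned.remove(topic)
    dst.owned.add(topic)
    return 0


if __name__ == "__main__":
    a, b = CoupledBroker("a", {"p0": list(range(1000))}), CoupledBroker("b")
    print("coupled move copied", move_partition_coupled(a, b, "p0"), "messages")

    shared_storage["t0"] = list(range(1000))
    x, y = StatelessBroker("x", {"t0"}), StatelessBroker("y")
    print("decoupled move copied", move_topic_decoupled(x, y, "t0"), "messages")
```

The point of the design is that a Pulsar broker can be added, removed or replaced without bulk data movement, since the data never lived on it in the first place.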
What are disadvantages of Kafka?
Disadvantages of Apache Kafka:
- No complete set of monitoring tools: Apache Kafka does not ship with a complete set of monitoring and management tools, so new startups and enterprises are wary of working with it.
- Message tweaking issues: the Kafka broker uses system calls to deliver messages to the consumer efficiently, but performance drops significantly if messages need to be modified along the way.
What are advantages of Kafka over other messaging systems?
Kafka is highly reliable: it replicates data, supports multiple subscribers, and automatically rebalances consumers when one of them fails. That makes it more reliable than many similar messaging services.
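A rough sketch of what "automatically rebalances consumers when one fails" means for a Kafka consumer group: the partitions are divided among the live consumers, and when a consumer dies its partitions are reassigned to the survivors. The assignment below is modelled loosely on Kafka's range-style strategy (contiguous ranges, with the first consumers getting one extra partition). The function names are hypothetical, and the real rebalance protocol also involves a group coordinator, heartbeats and session timeouts that are not modelled here.

```python
def range_assign(partitions, consumers):
    """Split partitions into contiguous ranges, one range per consumer.

    Both lists are sorted first; the first len(partitions) % len(consumers)
    consumers each get one extra partition.
    """
    members = sorted(consumers)
    if not members:
        return {}
    parts = sorted(partitions)
    per_consumer, extra = divmod(len(parts), len(members))
    assignment = {}
    start = 0
    for i, member in enumerate(members):
        count = per_consumer + (1 if i < extra else 0)
        assignment[member] = parts[start:start + count]
        start += count
    return assignment


def rebalance_after_failure(partitions, consumers, failed):
    """Drop the failed consumer and recompute the assignment for the survivors."""
    survivors = [c for c in consumers if c != failed]
    return range_assign(partitions, survivors)


if __name__ == "__main__":
    group = ["c1", "c2", "c3"]
    print(range_assign(range(6), group))
    print(rebalance_after_failure(range(6), group, "c2"))
```

With six partitions and three consumers each consumer gets two partitions (`{'c1': [0, 1], 'c2': [2, 3], 'c3': [4, 5]}`); if `c2` fails, the two survivors take three each (`{'c1': [0, 1, 2], 'c3': [3, 4, 5]}`).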
What is better than Kafka?
Apache Pulsar combines the best features of traditional messaging systems like RabbitMQ and pub-sub (publish-subscribe) systems like Apache Kafka. With high performance and a cloud-native package, you get the best of both worlds.
I played a bit with both lately, and here is what I gathered.
Neutral:
- I was going to give Kafka the win on community/documentation etc. But I wasn't able to easily find answers to the questions I had about Kafka; some were old and confusing (targeting the legacy API). Pulsar's documentation is good enough, the developers are very responsive on Slack (hello @Matteo Merli :) ), and the underlying pieces (ZooKeeper, BookKeeper) have decent documentation as well, should you want to dive into the internals.
- Kafka aims for high throughput, Pulsar for low latency. Both provide settings to tune this trade-off.
- Both are production-ready and battle-tested in several companies
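The throughput/latency trade-off largely comes down to batching. Here is a toy simulation (hypothetical, not either client's code): a producer flushes a batch when it is full or when a linger deadline passes after the batch's first message. `linger_ms` is loosely analogous to the Kafka producer's `linger.ms` setting; note that Kafka's `batch.size` is measured in bytes, whereas this sketch counts messages. Pulsar's Java producer has comparable batching options such as `batchingMaxPublishDelay` and `batchingMaxMessages`.

```python
def simulate_batching(arrivals_ms, batch_size, linger_ms):
    """Return (number_of_requests, average_wait_ms) for a simple batching producer.

    A batch is flushed when it reaches batch_size messages, or when linger_ms
    has elapsed since its first message arrived, whichever comes first.
    The wait is the time each message spends buffered before being sent.
    """
    requests = 0
    total_wait = 0.0
    batch = []
    for t in sorted(arrivals_ms):
        # The current batch's linger deadline passed before this arrival: flush it.
        if batch and t > batch[0] + linger_ms:
            flush_time = batch[0] + linger_ms
            total_wait += sum(flush_time - a for a in batch)
            requests += 1
            batch = []
        batch.append(t)
        # The batch is full: flush it immediately.
        if len(batch) >= batch_size:
            total_wait += sum(t - a for a in batch)
            requests += 1
            batch = []
    # Whatever is left goes out when its linger deadline expires.
    if batch:
        flush_time = batch[0] + linger_ms
        total_wait += sum(flush_time - a for a in batch)
        requests += 1
    n = len(arrivals_ms)
    return requests, (total_wait / n if n else 0.0)


if __name__ == "__main__":
    arrivals = range(100)  # one message per millisecond
    print(simulate_batching(arrivals, batch_size=1, linger_ms=0))
    print(simulate_batching(arrivals, batch_size=10, linger_ms=100))
```

With one message per millisecond, batches of 1 give 100 requests with no added wait, while batches of 10 cut that to 10 requests at an average added wait of 4.5 ms: fewer, larger requests buy throughput at the cost of latency.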
Pro pulsar:
- In my experience, the API is easier to use. In Kafka, the broker is dumb and the consumers do the job of structuring communication as they see fit. The intended benefit is flexibility, but it comes at the cost of Kafka users having to understand how to make the pieces fit together. Since Pulsar was able to replicate the Kafka Consumer API (with fairly little code), I count this as a pro for Pulsar.
- You can do things that are not easily done (or maybe impossible) in Kafka: multi-tenancy (security, isolation...), resource management (topic throttling, quotas), and geo-replication.
- It has some features that Kafka currently lacks, like seeking to a particular MessageId
- Pulsar scales to millions of topics, while Kafka is limited by the way it structures data in ZooKeeper.
- Easier deployment. A standalone Pulsar instance starts its own local ZooKeeper, and I personally found the configuration easier to understand.
- It's written in Java, versus Kafka's mix of legacy Scala and Java code. I also found the codebase well organised and much easier to follow, in part because it relies on ZooKeeper and BookKeeper, which are external projects with their own documentation, community, developers etc. (Note that these are also Apache Foundation projects, and they also came from Yahoo, so they work well together.)
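Multi-tenancy and quotas in Pulsar hang off its topic naming: current releases use fully qualified names like `persistent://tenant/namespace/topic`, and policies such as permissions and rate limits can be set per namespace. Below is a toy Python sketch of that idea; the policy table, role names and helper functions are all hypothetical and are not Pulsar's admin API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TopicName:
    domain: str      # "persistent" or "non-persistent"
    tenant: str
    namespace: str
    local_name: str


def parse_topic(name: str) -> TopicName:
    """Split 'persistent://tenant/namespace/topic' into its parts."""
    domain, sep, rest = name.partition("://")
    if not sep or domain not in ("persistent", "non-persistent"):
        raise ValueError(f"unsupported topic name: {name!r}")
    parts = rest.split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected tenant/namespace/topic in {name!r}")
    return TopicName(domain, *parts)


# Hypothetical namespace-level policies: every topic in a namespace inherits
# them, so one entry covers any number of topics.
NAMESPACE_POLICIES = {
    ("acme", "billing"): {"allowed_roles": {"billing-svc"}, "max_msgs_per_sec": 500},
    ("acme", "analytics"): {"allowed_roles": {"etl", "dashboards"}, "max_msgs_per_sec": 5000},
}


def _policy_for(topic: str) -> Optional[dict]:
    t = parse_topic(topic)
    return NAMESPACE_POLICIES.get((t.tenant, t.namespace))


def can_produce(topic: str, role: str) -> bool:
    """ACL check done at the namespace level, not per topic."""
    policy = _policy_for(topic)
    return policy is not None and role in policy["allowed_roles"]


def rate_limit_for(topic: str) -> Optional[int]:
    """Quota inherited from the topic's namespace (None if no policy exists)."""
    policy = _policy_for(topic)
    return policy["max_msgs_per_sec"] if policy else None
```

For example, `can_produce("persistent://acme/billing/invoices", "billing-svc")` is allowed while the same call with role `"etl"` is refused, and any new topic created under `acme/analytics` picks up the 5000 msg/s limit without extra configuration.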
Pro Kafka:
- Kafka has projects built on top of it, like Kafka Streams (I've never used it, so I can't say whether Pulsar has an equivalent).
Also read:
- https://news.ycombinator.com/item?id=12453080
- https://news.ycombinator.com/item?id=15601222
- https://streaml.io/blog/why-apache-pulsar/
- https://kafka.apache.org/uses
Apache Kafka is more mature (it's been around longer) and has higher-level APIs (e.g. KStreams). Its maturity, however, restricts its fluidity and flexibility, e.g. ~500 open PRs on GitHub.
Apache Pulsar has closely studied the design decisions of Apache Kafka and incorporated an improved design and a set of exciting capabilities. For example, the idea of namespacing topics and allowing ACLs or quotas to be applied at the namespace level seems like a profoundly good idea for better multi-tenancy support. Other exciting features of Pulsar are geo-replication and the unification of queuing and streaming.
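The unification of queuing and streaming comes from Pulsar's subscription types: with an Exclusive subscription a single consumer reads the whole topic in order (streaming), while with a Shared subscription several consumers split the messages between them (a work queue). The toy dispatcher below only illustrates those two modes. It is hypothetical, ignores flow control, acknowledgements and redelivery, and leaves out Pulsar's Failover and Key_Shared types.

```python
from itertools import cycle


def dispatch(messages, consumers, mode):
    """Toy dispatcher for one subscription on one topic.

    'exclusive': a single consumer receives every message in order (streaming).
    'shared': messages are spread round-robin across consumers (work queue).
    """
    if not consumers:
        raise ValueError("a subscription needs at least one consumer")
    received = {c: [] for c in consumers}
    if mode == "exclusive":
        if len(consumers) > 1:
            raise ValueError("an exclusive subscription allows only one consumer")
        received[consumers[0]].extend(messages)
    elif mode == "shared":
        # zip stops when messages run out; cycle keeps rotating through consumers.
        for msg, consumer in zip(messages, cycle(consumers)):
            received[consumer].append(msg)
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return received


if __name__ == "__main__":
    msgs = [1, 2, 3, 4, 5]
    print(dispatch(msgs, ["a"], "exclusive"))
    print(dispatch(msgs, ["a", "b"], "shared"))
```

The same topic can serve both patterns at once, since each subscription on it picks its own mode.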