Can we have strong routing capability with Apache Kafka similar to RabbitMQ?


We are evaluating Kafka as a replacement for RabbitMQ in our software.

We know Kafka's advantages over RabbitMQ: offline consumption, huge persistence, superb performance, low latency, and high throughput.

But we need the granular routing that RabbitMQ's topic exchange provides for heterogeneous consumption.

To some extent we can achieve this by having more partitions per broker in Kafka, but that has its own limitations, such as the overhead of topic metadata in ZooKeeper znodes and increased latency.

Our use case is to filter data within a partition. Assume one partition receives data from 100 sensors of a similar type. Can a consumer select only a few of those sensors' data and ignore the rest?

We can do the filtering/routing on the application (consumer) side, but that doesn't seem reusable and adds overhead to every consumer.
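For reference, the consumer-side filtering described above can at least be packaged as a shared helper. The following is a minimal plain-Python sketch, not Kafka client code: an in-memory list stands in for the records polled from one partition, and `SensorRecord`, `filtered`, and `sensor_filter` are hypothetical names. It also makes the cost visible: every record in the partition is still fetched before the unwanted ones are discarded.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass(frozen=True)
class SensorRecord:
    # Hypothetical record shape; the question only says "sensor data".
    sensor_id: str
    value: float


def filtered(records: Iterable[SensorRecord],
             predicate: Callable[[SensorRecord], bool]) -> Iterator[SensorRecord]:
    """Yield only the records the predicate accepts.

    The whole partition is still read: filtering happens after the records
    have been fetched, so network and deserialization costs remain.
    """
    for record in records:
        if predicate(record):
            yield record


def sensor_filter(wanted_ids: Iterable[str]) -> Callable[[SensorRecord], bool]:
    """Build a reusable predicate that keeps only the listed sensor IDs."""
    wanted = frozenset(wanted_ids)
    return lambda record: record.sensor_id in wanted


# Simulated contents of one partition carrying 100 sensors of the same type.
partition = [SensorRecord(f"sensor-{i % 100}", float(i)) for i in range(300)]

selected = list(filtered(partition, sensor_filter(["sensor-3", "sensor-42"])))
print(len(selected))  # 6
```

Packaging the predicate this way addresses reusability, but not the wasted reads.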

Is there any way Kafka can provide rich routing capability while keeping an optimal number of partitions?

Thanks, Ashish


asked Mar 26 '15 06:03

Ashish

1 Answer

Kafka's messaging model is much simpler than RabbitMQ's, and users are wise to use the few abstractions it does provide as they were intended. Really, topics are the only level of routing that should ever be done in Kafka. Partitions serve only to scale, to provide ordering (but only within a partition, which is a notable issue for scalability if you have an order-dependent application), and to facilitate concurrent consumers within a topic.
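To make the split between routing and partitioning concrete, here is a minimal in-memory sketch (not Kafka client code). It assumes a hypothetical topic-naming scheme `sensors.<type>` and a fixed partition count: the producer routes by choosing the topic, while the record key only picks a partition inside that topic, which keeps each key's records in order. `crc32` stands in for Kafka's actual key hash.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical partition count per topic

# topic name -> partition index -> ordered list of (key, value) records
log = defaultdict(lambda: defaultdict(list))


def partition_for(key, num_partitions=NUM_PARTITIONS):
    # Stand-in hash: Kafka's default partitioner hashes key bytes with murmur2;
    # crc32 is used here only to keep the sketch dependency-free.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def produce(sensor_type, sensor_id, value):
    # Routing decision: the topic is derived from the sensor type
    # (hypothetical naming scheme "sensors.<type>").
    topic = f"sensors.{sensor_type}"
    # Scaling/ordering decision: the key picks a partition inside that topic.
    log[topic][partition_for(sensor_id)].append((sensor_id, value))


def consume_topic(topic):
    # A consumer subscribes only to the topic it cares about; records in
    # other topics never reach it, so no client-side filtering is needed.
    records = []
    for partition in sorted(log[topic]):
        records.extend(log[topic][partition])
    return records


produce("temperature", "t-1", 20.5)
produce("humidity", "h-1", 40.0)
produce("temperature", "t-1", 21.0)
produce("temperature", "t-2", 19.0)

temps = consume_topic("sensors.temperature")
print(sorted(key for key, _ in temps))           # ['t-1', 't-1', 't-2']
print([v for k, v in temps if k == "t-1"])       # [20.5, 21.0]
```

Ordering is only guaranteed per partition; records for different keys may interleave differently depending on which partitions they land in.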

The problem with routing at the partition level is that it doesn't scale, because partitions are the element of Kafka that provides scalability (at least at the messaging layer). Kafka simply isn't designed for granular routing; it's designed for persistent, reliable, scalable pub/sub messaging. Nor are individual partitions designed to scale across the cluster. By their very nature, partitions are local to one or a few Kafka nodes (depending on the topic's replication factor), while Kafka spreads a topic's multiple partitions across the cluster. This means there is some potential for hot spotting if messages favor a particular partition instead of being evenly distributed across the topic's partitions (which is why the Kafka producer normally handles partitioning for you).
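A quick way to see the hot-spotting risk is to count how many records each partition receives under a skewed key distribution. This is a minimal sketch under stated assumptions: a hypothetical workload in which one sensor produces 90% of the traffic, `crc32` standing in for Kafka's murmur2 key hash, and plain round-robin as a simplified model of even, keyless distribution.

```python
import zlib
from collections import Counter

NUM_PARTITIONS = 4  # hypothetical partition count


def keyed_partition(key):
    # Stand-in for key hashing (Kafka's default partitioner uses murmur2).
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


# Hypothetical skewed workload: one "hot" sensor sends 900 of 1000 records.
keys = ["sensor-hot"] * 900 + [f"sensor-{i}" for i in range(100)]

# Every record with the same key lands on the same partition.
keyed_load = Counter(keyed_partition(k) for k in keys)
# Simplified even distribution: assign partitions round-robin.
round_robin_load = Counter(i % NUM_PARTITIONS for i in range(len(keys)))

print(max(keyed_load.values()))  # at least 900: the hot key's partition
print(dict(round_robin_load))    # {0: 250, 1: 250, 2: 250, 3: 250}
```

The broker hosting the hot partition does most of the work, which is the imbalance that routing by partition tends to create.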

In terms of filtering on the client side, I think you're right: that feels like a lot of wasted resources to me, but maybe I just dislike wasted resources too much.

In short, I think you risk digging yourself into a hole if you try to think of Kafka's messaging abstractions in such complex terms. Kafka is designed and optimized to distribute load via partitions, so co-opting them for a different, even if vaguely similar, use case is certainly not ideal.

I have a feeling you can manage your use case with Kafka's existing features. I find that the biggest challenge with complex routing schemes built on Kafka's topics is avoiding duplicate data across multiple topics, but once you understand how multiple applications can consume from different positions within the same topic, that issue seems to disappear. In this sense, it's important to think of Kafka more as a log than as a queue.
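The log-versus-queue point can be sketched in a few lines. In Kafka, each consumer group tracks its own committed offsets, so reading a record does not remove it for anyone else. The classes below are a hypothetical in-memory model of that idea, not the Kafka consumer API: two applications read the same topic from different positions, and no data has to be copied into a second topic.

```python
from dataclasses import dataclass, field


@dataclass
class TopicLog:
    """Append-only log: reading never removes records, unlike a queue."""
    records: list = field(default_factory=list)

    def append(self, record) -> int:
        self.records.append(record)
        return len(self.records) - 1  # offset of the new record

    def read_from(self, offset: int, max_records: int = 10) -> list:
        return self.records[offset:offset + max_records]


@dataclass
class ConsumerGroup:
    """Each group keeps its own position (offset) in the same log."""
    name: str
    offset: int = 0

    def poll(self, log: TopicLog, max_records: int = 10) -> list:
        batch = log.read_from(self.offset, max_records)
        self.offset += len(batch)  # "commit" by advancing only this group's offset
        return batch


log = TopicLog()
for i in range(5):
    log.append(f"reading-{i}")

dashboard = ConsumerGroup("dashboard")
archiver = ConsumerGroup("archiver")

print(dashboard.poll(log, max_records=2))  # ['reading-0', 'reading-1']
print(archiver.poll(log))                  # all five readings, from offset 0
print(dashboard.poll(log))                 # ['reading-2', 'reading-3', 'reading-4']
```

Because positions live with the consumers rather than in the data, adding another application means adding another group, not another copy of the topic.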

On a side note, I think your concern about the znodes required to manage partitions is unfounded. If you have enough topics and partitions to exhaust the memory of your ZooKeeper nodes (which takes a ton), you've likely already run into much bigger resource issues.

answered Oct 20 '22 06:10

kuujo
