
Can we have strong routing capability with Apache Kafka, similar to RabbitMQ?


We are evaluating Kafka as a replacement for RabbitMQ in our software.

We know Kafka's advantages over RabbitMQ: offline consumption, large-scale persistence, strong performance, low latency, and high throughput.

But we need the kind of granular routing that RabbitMQ's topic exchange provides for heterogeneous consumers.

To some extent we can achieve this by having a larger number of partitions per broker in Kafka. But that has its own limitations, such as the overhead of topic metadata in znodes and increased latency.

Our use case is to filter data within a partition. Assume data from 100 sensors of a similar type arrives in one partition. Can a consumer select the data of only a few of those sensors and ignore the rest?

We can do the filtering/routing on the application (consumer) side, but that does not seem reusable and adds overhead at each consumer.

Is there any way Kafka can provide rich routing capability with an optimum number of partitions?

Thanks, Ashish

Ashish asked Mar 26 '15 06:03





1 Answer

Kafka's messaging model is a lot simpler than RabbitMQ's, and users are wise to use the few abstractions it does provide as they were intended. Really, topics are the only level of routing that should ever be done in Kafka. Partitions serve only to scale, provide ordering (but only within a partition, which is a notable issue for scalability if you have an order-dependent application), and facilitate concurrent consumers within a topic.
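To make "route at the topic level" concrete, here is a minimal sketch in plain Python (no Kafka client involved). It assumes one topic per sensor type; the `sensors.` prefix and the helper names `topic_for` and `matching_topics` are made up for illustration. The regular expression stands in for the pattern-based subscription that Kafka consumer clients generally offer.

```python
import re

def topic_for(sensor_type, prefix="sensors"):
    """Map a sensor type to a dedicated topic name (hypothetical naming scheme)."""
    # Kafka topic names allow only letters, digits, '.', '_' and '-'.
    safe = re.sub(r"[^A-Za-z0-9._-]", "_", sensor_type)
    return f"{prefix}.{safe}"

def matching_topics(pattern, topics):
    """Return the topics whose full name matches a regular expression."""
    compiled = re.compile(pattern)
    return [t for t in topics if compiled.fullmatch(t)]

# Producers publish each sensor type to its own topic.
topics = [topic_for(s) for s in ("temperature", "humidity", "pressure")]

# A consumer interested in two of the three types selects topics by pattern.
selected = matching_topics(r"sensors\.(temperature|pressure)", topics)
```

The selection happens by choosing topics, so partitions stay free to do what they are meant for: spreading load.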

The problem with doing routing at the level of partitions is that it's not scalable, because partitions are the element of Kafka that provides scalability (at the messaging layer at least). Kafka is not designed for granular routing. It's designed for persistent, reliable, scalable, pub/sub messaging. Nor is an individual partition designed to scale across the cluster. By its very nature, a partition is local to one or a few Kafka nodes (depending on the topic's replication factor), while Kafka spreads the multiple partitions of a topic across the cluster. This means there is some potential for hot spotting if messages favor one particular partition instead of being evenly distributed across the partitions in a topic (which is why the Kafka producer normally handles partitioning for you).
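The hot-spotting risk can be illustrated with a rough sketch that hashes message keys to partitions. This is not Kafka's actual partitioner: `zlib.crc32` is a stand-in hash chosen only because it ships with Python, and `partition_for` and `partition_load` are hypothetical helper names.

```python
import zlib
from collections import Counter

def partition_for(key, num_partitions):
    """Pick a partition from a key hash (crc32 is a stand-in, not Kafka's hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions

def partition_load(keys, num_partitions):
    """Count how many messages land on each partition."""
    return Counter(partition_for(k, num_partitions) for k in keys)

# Keying by individual sensor id: 100 distinct keys are hashed independently.
by_sensor = partition_load([f"sensor-{i}" for i in range(100)], 4)

# Keying by a coarse routing value: every message shares one key,
# so all 100 messages end up on the same partition.
by_type = partition_load(["temperature"] * 100, 4)
```

When the key doubles as a routing decision, the load on each partition follows the popularity of the route rather than the capacity of the cluster.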

In terms of filtering on the client side, I think you're right: that feels like a lot of wasted resources to me, but maybe I just dislike wasted resources too much.
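If you do filter on the consumer side, the reusability concern can at least be reduced by keeping the predicate separate from the consuming loop. The following is only a sketch: messages are modelled as plain dicts with a `sensor_id` field, which is an assumption about your payload, and `sensor_filter` and `filtered` are hypothetical helpers rather than part of any Kafka client.

```python
def sensor_filter(wanted_ids):
    """Build a reusable predicate that keeps only the given sensor ids."""
    wanted = set(wanted_ids)

    def predicate(message):
        return message.get("sensor_id") in wanted

    return predicate

def filtered(messages, predicate):
    """Lazily yield only the messages accepted by the predicate."""
    for message in messages:
        if predicate(message):
            yield message

# Stand-in for the records a consumer would read from one partition.
stream = [{"sensor_id": i, "value": i * 1.5} for i in range(100)]

picked = list(filtered(stream, sensor_filter({3, 42, 99})))
```

Note that this only shares the filtering code. Every message is still fetched and inspected by each consumer, which is exactly the wasted work described above.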

In short, I think you may risk digging yourself into a hole if you try to think of Kafka's messaging abstractions in such complex terms. Kafka is very much designed for and optimized to distribute load via partitions, so co-opting them for a different - even if vaguely similar - use case is certainly not ideal.

I have a feeling you can manage your use case within the context of Kafka's features. I find that the biggest challenge with complex routing schemes within Kafka's topic framework is preventing duplicate data within multiple topics, but once you understand how multiple applications can consume from different positions within the same topic that issue seems to disappear. In this sense, it's important to think of Kafka more as a log than as a queue.
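The "log, not queue" idea can be sketched with a toy in-memory model. This is not Kafka's API: `TopicLog` and its methods are invented for illustration, and a real topic adds partitions, retention, and durable offset storage on top of this.

```python
class TopicLog:
    """Toy append-only log with an independent read position per consumer group."""

    def __init__(self):
        self._records = []
        self._offsets = {}  # group name -> next offset to read

    def append(self, record):
        """Append a record and return its offset."""
        self._records.append(record)
        return len(self._records) - 1

    def poll(self, group, max_records=10):
        """Return the next batch for a group and advance only that group's offset."""
        start = self._offsets.get(group, 0)
        batch = self._records[start:start + max_records]
        self._offsets[group] = start + len(batch)
        return batch

    def seek(self, group, offset):
        """Move a group's read position, e.g. to re-read old data."""
        self._offsets[group] = offset

log = TopicLog()
for i in range(5):
    log.append(f"reading-{i}")

alerts = log.poll("alerts", max_records=2)    # first two records
archive = log.poll("archive", max_records=5)  # all five, unaffected by "alerts"
alerts_next = log.poll("alerts", max_records=5)  # resumes where "alerts" left off
```

Reading never removes anything, so the same data does not have to be copied into a second topic just because a second application needs it.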

On a side note, I think your concern with znodes required to manage partitions is unfounded. If you have enough topics and partitions to consume the memory of your ZooKeeper nodes (a ton) then you've likely already run into much bigger resource issues.

kuujo answered Oct 20 '22 06:10
