 

Azure Event Hubs limits and its comparison to pure Kafka cluster

Azure recently released a feature called Azure Event Hubs for Kafka that lets you use Event Hubs as if it were a Kafka cluster, using the same Kafka libraries. That would allow us to migrate from our current IaaS Kafka solution to a PaaS solution, with all the advantages of a fully managed service and only minimal changes to our code base (at least that's the promise).
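To make the "same Kafka libraries" point concrete, here is a minimal sketch of the client settings usually needed to point a Kafka client at the Event Hubs Kafka endpoint. The property names follow the librdkafka convention (as used by confluent-kafka); the function name, the namespace `my-namespace` and the connection string are hypothetical placeholders, not values from our setup.

```python
def event_hubs_kafka_config(namespace: str, connection_string: str) -> dict:
    """Build librdkafka-style settings for the Event Hubs Kafka endpoint.

    `namespace` and `connection_string` are placeholders supplied by the
    caller; nothing here contacts Azure.
    """
    return {
        # Each namespace exposes its own Kafka endpoint on port 9093.
        "bootstrap.servers": f"{namespace}.servicebus.windows.net:9093",
        # Event Hubs requires TLS plus SASL PLAIN authentication.
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "PLAIN",
        # The username is the literal string "$ConnectionString";
        # the password is the namespace connection string itself.
        "sasl.username": "$ConnectionString",
        "sasl.password": connection_string,
    }


config = event_hubs_kafka_config(
    "my-namespace",  # hypothetical namespace name
    "Endpoint=sb://my-namespace.servicebus.windows.net/;...",  # placeholder
)
```

The resulting dictionary would be passed to a producer or consumer constructor, so the change really is limited to configuration as long as a single namespace is enough.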

However, while analyzing the migration we are finding it hard to fit our infrastructure within the Azure Event Hubs limits. We have hundreds of topics in Kafka and we know we will scale to thousands in the future, but that cannot easily be fit into Event Hubs.

In Azure, the equivalent of a Kafka topic is an event hub, and event hubs are grouped into namespaces, which correspond to Kafka clusters. In fact, each namespace has a different DNS name, making it a completely separate system. The limits are as follows: up to 10 event hubs per namespace and up to 100 namespaces per subscription. Translated into Kafka terms, that is up to 1000 topics. Let's suppose that is enough for our purposes; I would still need different parts of my application to connect to a different Kafka cluster (namespace) for every 10 topics I have, which adds unneeded complexity to the whole story.

It seems that in the end I am trading the difficulty of managing the infrastructure of my own cluster for the difficulty of re-architecting my application so that it fits inside that strange limit of 10 topics per cluster. With Kafka I can have 100 topics in one cluster. With Event Hubs I need 10 clusters of 10 topics each, which adds the complexity of knowing which cluster each consumer and producer needs to connect to. That completely changes the architecture of the application (making it much more complex).
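The extra routing layer described above can be sketched roughly as follows. This is only an illustration of the bookkeeping involved, assuming the limit of 10 event hubs per namespace quoted in this question; the namespace prefix `myapp-ns` and all function names are hypothetical.

```python
import math

EVENT_HUBS_PER_NAMESPACE = 10  # limit quoted in the question


def namespaces_needed(topic_count: int,
                      per_namespace: int = EVENT_HUBS_PER_NAMESPACE) -> int:
    """Number of namespaces required to hold `topic_count` topics."""
    return math.ceil(topic_count / per_namespace)


def assign_topics(topics, namespace_prefix: str = "myapp-ns",
                  per_namespace: int = EVENT_HUBS_PER_NAMESPACE) -> dict:
    """Map each topic to a namespace, filling namespaces in sorted topic order."""
    routing = {}
    for index, topic in enumerate(sorted(topics)):
        routing[topic] = f"{namespace_prefix}-{index // per_namespace}"
    return routing


def bootstrap_server_for(topic: str, routing: dict) -> str:
    """Kafka endpoint a producer or consumer of `topic` has to connect to."""
    return f"{routing[topic]}.servicebus.windows.net:9093"


topics = [f"topic-{i:03d}" for i in range(100)]
routing = assign_topics(topics)
print(namespaces_needed(len(topics)))             # 10 namespaces for 100 topics
print(bootstrap_server_for("topic-042", routing))
```

Note that assigning by sorted position reshuffles topics whenever a new one is added, so a real deployment would presumably need a persisted topic-to-namespace table, which is exactly the kind of added complexity I would like to avoid.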

I've searched the Internet for an answer to this with no luck. Everyone seems to see a lot of advantages in using Event Hubs, so I am starting to think I may be missing something. What would be an efficient way of fitting lots of topics inside that 10-topic limit without changing my architecture much?

joanlofe asked Jun 07 '19 13:06





1 Answer

Azure Event Hubs offers Kafka/EH for data streaming under two different umbrellas: single tenancy and multi-tenancy. While multi-tenancy gives you the flexibility to reserve and use a small amount of capacity, it is enforced with quotas and limits. These are stringent and cannot be relaxed. The reason: by analogy, you can imagine the multi-tenant offering as a huge Kafka cluster whose CPU and memory are shared among different tenants within strict boundaries. To honor multi-tenancy on this infrastructure we define boundaries, and these boundaries are enforced by quotas and limits. Event Hubs is the only PaaS service that charges you for reserving your bandwidth and for the ingress of events. There is no egress charge. We also let you ingress x MBps and egress 2x MBps, and the quotas let us maintain this boundary.

Our single-tenant clusters can be thought of as mimicking an actual Kafka cluster, with no quotas attached. The limits that we enforce here are the actual physical limits. The limits of 1000 topics per namespace and 50 namespaces per Capacity Unit are soft limits that can be relaxed, as they only enforce best practices. The cost justification when you compare Standard and Dedicated is not any different, and in fact when you do more than 50 MBps you gain an advantage, as the whole capacity is dedicated to one tenant with Dedicated. Also, a single Capacity Unit (the unit in which Dedicated clusters are sold) lets you achieve anywhere between 100 MBps and 250 MBps, depending on your send/receive pattern, payload size, frequency and more.

For comparison purposes, although we do not do 0TUs on Standard and there is no direct relation/mapping between Dedicated CUs and Standard TUs, below is a pricing example:

50 TUs = $0.03/hr x 50 = $1.50 per hour
50,000 events per second = 180,000,000 events per hour
180,000,000 / 1,000,000 = 180 units of 1,000,000 messages
180 x $0.028 = $5.04
So, a grand total of $6.54 per hour

Note that the above does not include Capture pricing. For a grand total of $6.85 per hour, you get Dedicated with Capture included.
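The arithmetic in the pricing example can be reproduced with a short sketch. The rates ($0.03 per TU per hour and $0.028 per million ingress events) are simply the figures quoted in this answer and may not match current pricing; the function name and parameters are hypothetical.

```python
def standard_hourly_cost(throughput_units: int,
                         events_per_second: int,
                         tu_price_per_hour: float = 0.03,
                         price_per_million_events: float = 0.028):
    """Return (TU cost, ingress cost, total) per hour on the Standard tier.

    Default prices are the ones quoted in the answer, not looked-up values.
    Capture pricing is not included.
    """
    tu_cost = throughput_units * tu_price_per_hour
    events_per_hour = events_per_second * 3600
    ingress_cost = events_per_hour / 1_000_000 * price_per_million_events
    return tu_cost, ingress_cost, tu_cost + ingress_cost


tu_cost, ingress_cost, total = standard_hourly_cost(50, 50_000)
print(f"TUs: ${tu_cost:.2f}  ingress: ${ingress_cost:.2f}  total: ${total:.2f}")
```

Changing `events_per_second` shows how quickly the ingress charge dominates the reserved-throughput charge at higher event rates.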

shubha answered Sep 24 '22 05:09
