Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Should the event hub have same number of partitions as throughput units?

For Azure event hub 1 though put unit equals 1MB/sec ingress. So it can take 1000 messages of 1 KB. If I select 5 or more throughput units would I be able to ingest 5000 messages/ second of 1KB size with 4 partitions? What would be egress in that case? I am not sure about limitation on Event Hub partition, i read that it is also 1MB/sec. But then does that mean to use event hub effectively i need to have same number of partitions?

like image 583
azuredeveloper Avatar asked Feb 20 '16 06:02

azuredeveloper


People also ask

What is throughput units in event hub?

The throughput capacity of Event Hubs is controlled by throughput units. Throughput units are pre-purchased units of capacity. A single throughput unit lets you: Ingress: Up to 1 MB per second or 1000 events per second (whichever comes first). Egress: Up to 2 MB per second or 4096 events per second.

What is partition count in event hub?

The number of partitions specified at the event hub creation time is static and it cannot be modified later (so care should be taken). Using the standard Azure portal you can create between 2 to 32 partitions (default 4), however, if required, you can create up to 1024 partitions by contacting Azure support.

How is data distribution performed among event hub partitions if no partition key is specified by the sender?

It is processed through a static hashing function, which creates the partition assignment. If you don't specify a partition key when publishing an event, a round-robin assignment is used.


2 Answers

Great question.

A few basics.

1 Throughput Unit (TU) means an ingress limit of 1 MB/sec or 1000 msgs/sec - whichever happens first. You pay for TUs and you can change TUs as per your load requirements. This is your knob to control the bill. And TUs are set on a given Event Hubs Namespace!

When you buy 1 TU for an EventHubs Namespace and create a number of EventHubs in it, the the limit of 1 MB/sec or 1000 msgs/sec applies cumulatively across them. The limit also applies to each partition individually. Although, sometimes you might get lucky in some regions where load is low.

Consider these principles while deciding on no. of partitions in eventhub for your service:

  1. The intent of Partitions is to offer high-availability. If you are sending to Eventhubs and you want the sends to succeed NO MATTER WHAT you should create multiple partitions and send using EventHubClient.Send (which doesn't confine the send to a particular partition).
  2. The no. of partitions will determine how fat the event pipe is & how fast/parallelly you can receive & process the events. If you have 10 partitions on your EventHub - it's capacity is effectively capped at 10 TUs. You can create 10 epoch receivers in parallel & consume & process events. If you envision that the EventHub that you are currently creating now can quickly grow 10-fold create as many partitions and keep the TU's matching the current load. Analogy here is having multiple lanes on a freeway!

Another thing to note is, a TU is configured at namespace level. And, one Event Hubs namespace can have multiple EventHubs in it and each EventHub can have a different no. of partitions.

Answers:

If you select 5 or more TUs on the Namespace and have only 1 EventHub with 4 partitions you will get a max. of 4 MB/sec or 4K msgs/sec.

Egress max will be 2X of ingress (8 MBPS or 8K msgs/sec). In other words, you could create 2 patterns of receives (e.g. slow and fast) by creating 2 consumer groups. If you need more than 2X parallel receives then you will need to by more TUs.

Yes, ideally you will need more partitions than TUs. First model your partition count as mentioned above. Start with 1 TU while you are developing your solution. Once done, when you are doing load testing or going live, increase TUs in tune with your load. Remember, you could have multiple EventHubs in a Namespace. So, having 20 TUs at Namespace level and 10 EventHubs with 4 partitions each can deliver 20 MB/sec across the Namespace.

More on EventHubs

like image 149
Sreeram Garlapati Avatar answered Oct 13 '22 01:10

Sreeram Garlapati


One partition goes to one TPU. Think of TPUs as a processing engine. You can't take advantage of more TPUs than you have partitions. If you have 4 partitions, you can't use more than 4 TPUs.

It's typical to have more partitions than TPUs, for the following reasons

  • You can scale the number of TPUs up if you have a lot of traffic, but you can't change the number of partitions
  • You can't have more concurrent readers than you have partitions. If you want to have 5 concurrent readers, you need 5 partitions.

As for throughput, the limits are 1 MB ingerss/2 MB egress per TPU. This covers the typical scenario where each event is sent both to cold storage (eg a database) and Stream analytics or an event processor for analysis, monitoring etc.

like image 25
Panagiotis Kanavos Avatar answered Oct 13 '22 01:10

Panagiotis Kanavos