How does Azure's EventData.PartitionKey decide which partition to write to?

Tags: c#, azure

I am trying to implement an Event Hub in Azure. I have managed to create a Producer which publishes messages to the Event Hub, as well as a Consumer which reads them off. My Event Hub is divided up into 16 partitions. On the consumer side, I loop through each of these as follows:

var eventHub = NamespaceManager.CreateFromConnectionString(builder.ToString()).GetEventHub("de-analytics-events");

foreach (var partitionId in eventHub.PartitionIds)
{
     subscriberGroup.RegisterProcessor<EventProcessor>(new Lease
     {
         PartitionId = partitionId
     }, new EventProcessorCheckpointManager());

     Console.WriteLine("Processing: " + partitionId);
}

Looking at these values in a debugger shows that the eventHub.PartitionIds range from "0" to "15" in the case of 16 partitions.

However, on the producer side, all I was allowed to specify was my EventData.PartitionKey, which is a string, but which does not directly correspond to the strings on the consumer side. E.g. if I specified a PartitionKey = "7", it did not necessarily write to partition "7".

Reading up shows that some sort of hashing is involved, but I don't particularly want to guess randomly at 16 strings that hash to the numbers 0-15. So I'm wondering how I can define which partition is published to?

For added reference, this is the tutorial I followed to get my simplest case working.

mike asked Sep 16 '14 19:09


2 Answers

Specifying a PartitionKey ensures that all events with the same key are sent to the same partition, and that the order of those events is maintained within that partition.

Do you have such a requirement for your data on the processing side?

If you don't, the recommendation is to not set the PartitionKey. That way the Event Hubs broker will distribute the events uniformly among the partitions.

If you do need ordering guarantees for your data within a PartitionKey, and you have a small number of publishers, then there is a manual way of handling the partitions and distributing load: the Partitioned Sender.
Refer to this link on how to use the Partitioned Sender. http://msdn.microsoft.com/en-us/library/microsoft.servicebus.messaging.eventhubclient.createpartitionedsender.aspx
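
One way the manual approach could look, sketched under assumptions: each publisher is pinned to one partition ID and always sends through a sender bound to that partition. To keep the sketch runnable without an Event Hubs namespace, IPartitionSender, RecordingSender and PinnedRouter are hypothetical stand-ins invented for illustration. In the Service Bus SDK, the sender for a partition would presumably come from the CreatePartitionedSender method on EventHubClient named in the link above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a sender that is bound to a single partition.
public interface IPartitionSender
{
    void Send(string body);
}

// Fake sender that only records what it was asked to send.
public sealed class RecordingSender : IPartitionSender
{
    public readonly List<string> Sent = new List<string>();
    public void Send(string body) { Sent.Add(body); }
}

// Pins each publisher to one partition, spreading publishers round-robin.
// Assumption: publisherIndex is a non-negative number you assign yourself.
public sealed class PinnedRouter
{
    private readonly string[] partitionIds;
    private readonly Func<string, IPartitionSender> createSender;
    private readonly Dictionary<string, IPartitionSender> senders =
        new Dictionary<string, IPartitionSender>();

    public PinnedRouter(string[] partitionIds, Func<string, IPartitionSender> createSender)
    {
        this.partitionIds = partitionIds;
        this.createSender = createSender;
    }

    public string PartitionFor(int publisherIndex)
    {
        return partitionIds[publisherIndex % partitionIds.Length];
    }

    public void Send(int publisherIndex, string body)
    {
        string partitionId = PartitionFor(publisherIndex);
        IPartitionSender sender;
        if (!senders.TryGetValue(partitionId, out sender))
        {
            // Create the sender lazily, once per partition.
            sender = createSender(partitionId);
            senders[partitionId] = sender;
        }
        sender.Send(body);
    }
}

public static class RouterDemo
{
    public static void Main()
    {
        var created = new Dictionary<string, RecordingSender>();
        var router = new PinnedRouter(new[] { "0", "1", "2", "3" }, id =>
        {
            var s = new RecordingSender();
            created[id] = s;
            return s;
        });

        router.Send(0, "a");
        router.Send(4, "b"); // same partition as publisher 0
        router.Send(1, "c");

        foreach (var pair in created)
        {
            Console.WriteLine(pair.Key + ": " + string.Join(",", pair.Value.Sent));
        }
    }
}
```

Because the publisher decides the partition explicitly, the consumer reading a given partition ID knows which publishers feed it, at the cost of having to balance the load by hand.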

Padma Aradhyula answered Sep 27 '22 18:09


You're correct: a hash is used to translate the partition key to a given partition. My question, then, is this: as long as the hash algorithm distributes events evenly and consistently, why should you really care which partition the message is assigned to?

Yes, you could argue that you want to know so you know who the receiver will be. But the reality is that tight coupling like this makes the solution inherently fragile. You're better off letting the service do what it needs to do to keep traffic healthy, and recognizing that once you get a message using a given partition key, you're very likely to always get messages using that key.

The bigger challenge is to choose a partition key strategy that gives a fairly even distribution of events across the partitions (i.e., don't give 10,000 devices all the same partition key).
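
To make the hashing point concrete, here is a minimal sketch. It is not the algorithm Event Hubs actually uses, which this thread does not describe. It only shows that any stable hash of the key, reduced modulo the partition count, pins a key to one partition, and that the spread across partitions depends on how varied the keys are. StableHash, PartitionFor, Distribution and the "device-N" keys are hypothetical names for illustration; FNV-1a is just a convenient deterministic hash.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class PartitionSketch
{
    // FNV-1a over the UTF-8 bytes of the key: deterministic across runs.
    // Assumption: this stands in for whatever hash the service really applies.
    public static uint StableHash(string key)
    {
        unchecked
        {
            uint hash = 2166136261;
            foreach (byte b in Encoding.UTF8.GetBytes(key))
            {
                hash ^= b;
                hash *= 16777619;
            }
            return hash;
        }
    }

    // The same key always yields the same partition index in [0, partitionCount).
    public static int PartitionFor(string key, int partitionCount)
    {
        return (int)(StableHash(key) % (uint)partitionCount);
    }

    // Counts how many of the given keys land in each partition.
    public static int[] Distribution(IEnumerable<string> keys, int partitionCount)
    {
        var counts = new int[partitionCount];
        foreach (string key in keys)
        {
            counts[PartitionFor(key, partitionCount)]++;
        }
        return counts;
    }

    public static void Main()
    {
        const int partitions = 16;
        var perDevice = Enumerable.Range(0, 10000).Select(i => "device-" + i);
        var shared = Enumerable.Repeat("all-devices", 10000);

        Console.WriteLine("per-device keys: " + string.Join(",", Distribution(perDevice, partitions)));
        Console.WriteLine("one shared key:  " + string.Join(",", Distribution(shared, partitions)));
    }
}
```

With one shared key, every event lands in a single partition and the other fifteen stay empty, which is the situation the advice above warns against.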

BrentDaCodeMonkey answered Sep 27 '22 19:09