How does Azure's EventData.PartitionKey decide which partition to write to?

Tags: c#, azure

I am trying to implement an Event Hub in Azure. I have managed to create a Producer which publishes messages to the Event Hub, as well as a Consumer which reads them off. My Event Hub is divided up into 16 partitions. On the consumer side, I loop through each of these as follows:

var eventHub = NamespaceManager.CreateFromConnectionString(builder.ToString()).GetEventHub("de-analytics-events");

foreach (var partitionId in eventHub.PartitionIds)
{
    subscriberGroup.RegisterProcessor<EventProcessor>(new Lease
    {
        PartitionId = partitionId
    }, new EventProcessorCheckpointManager());

    Console.WriteLine("Processing: " + partitionId);
}

Looking at these values in a debugger shows that the eventHub.PartitionIds range from "0" to "15" in the case of 16 partitions.

However, on the producer side, all I was allowed to specify was EventData.PartitionKey, which is a string but does not directly correspond to the partition ID strings on the consumer side. For example, when I specified PartitionKey = "7", the event was not necessarily written to partition "7".

Reading up shows that some sort of hashing is involved, but I don't particularly want to guess randomly at 16 strings that hash to the numbers 0-15. So I'm wondering how I can define which partition is published to?

For added reference, this is the tutorial I followed to get my simplest case working.


asked Sep 16 '14 19:09

mike

2 Answers

Specifying a PartitionKey ensures that all events with the same key are sent to the same partition, and that the order of those events is maintained within the partition.

Do you have such a requirement for your data on the processing side?

If you don't, the recommendation is to not set the PartitionKey. That way the Event Hub broker distributes the events uniformly among the partitions.

If you do need an ordering guarantee for your data within a PartitionKey, and you have a small number of publishers, there is a manual way of handling the partitions and distributing load: the Partitioned Sender.
Refer to this link on how to use the Partitioned Sender: http://msdn.microsoft.com/en-us/library/microsoft.servicebus.messaging.eventhubclient.createpartitionedsender.aspx
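
For illustration, here is a minimal sketch of what sending through a Partitioned Sender might look like with the Microsoft.ServiceBus.Messaging client. The connection string is a placeholder, the partition ID "7" is just the example from the question, and the payload is made up; treat this as an outline under those assumptions rather than a tested implementation.

```csharp
using System.Text;
using Microsoft.ServiceBus.Messaging;

class PartitionedSenderSketch
{
    static void Main()
    {
        // Placeholder: substitute your own connection string.
        string connectionString = "<your Event Hub connection string>";

        // Hub name taken from the question.
        EventHubClient client =
            EventHubClient.CreateFromConnectionString(connectionString, "de-analytics-events");

        // The sender is bound to one partition ID ("0".."15" for 16 partitions).
        EventHubSender sender = client.CreatePartitionedSender("7");

        // Leave PartitionKey unset here: the target partition is already
        // fixed by the sender, so no key hashing is involved.
        var eventData = new EventData(Encoding.UTF8.GetBytes("hello partition 7"));
        sender.Send(eventData);

        sender.Close();
        client.Close();
    }
}
```

Because the sender is tied to a single partition, the publisher takes on the job of spreading load, which is why this approach suits only a small number of publishers.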

answered Sep 27 '22 18:09

Padma Aradhyula

You're correct, a hash is used to translate the partition key to a given partition. The question I have, then, is: as long as the hash algorithm distributes events evenly and consistently, why should you really care which partition a message is assigned to?

Yes, you could argue that you want to know so that you know who the receiver will be. But the reality is that tight coupling like this makes the solution inherently fragile. You're better off letting the service do what it needs to do to keep traffic healthy, and recognizing that once you receive a message with a given partition key on a partition, you're very likely to keep receiving messages with that key there.

The bigger challenge is choosing a partition key strategy that gives a fairly even distribution of events across the partitions (in other words, don't give 10,000 devices all the same partition key).
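
To make the distribution point concrete, here is a small self-contained sketch. It uses 32-bit FNV-1a as a stand-in hash; this is an assumption for illustration only and is not the algorithm Event Hubs actually uses. The class and method names (PartitionSketch, PartitionFor, CountPerPartition) and the "device-N" keys are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class PartitionSketch
{
    // Stand-in hash (32-bit FNV-1a). NOT the hash Event Hubs uses;
    // chosen only because it is deterministic across runs.
    public static uint Fnv1a(string key)
    {
        uint hash = 2166136261;
        foreach (byte b in Encoding.UTF8.GetBytes(key))
        {
            hash ^= b;
            hash *= 16777619; // wraps on overflow (unchecked by default)
        }
        return hash;
    }

    // Maps a key to a partition index in [0, partitionCount).
    public static int PartitionFor(string key, int partitionCount)
    {
        return (int)(Fnv1a(key) % (uint)partitionCount);
    }

    // Counts how many of the given keys land on each partition.
    public static int[] CountPerPartition(IEnumerable<string> keys, int partitionCount)
    {
        var counts = new int[partitionCount];
        foreach (var key in keys)
        {
            counts[PartitionFor(key, partitionCount)]++;
        }
        return counts;
    }

    public static void Main()
    {
        const int partitions = 16;

        // 10,000 devices sharing one key: every event lands on one partition.
        var sameKey = Enumerable.Repeat("shared-key", 10000);
        Console.WriteLine(string.Join(",", CountPerPartition(sameKey, partitions)));

        // One key per device: events spread across the partitions.
        var perDevice = Enumerable.Range(0, 10000).Select(i => "device-" + i);
        Console.WriteLine(string.Join(",", CountPerPartition(perDevice, partitions)));
    }
}
```

With a shared key, a single partition receives all 10,000 events and the other 15 stay idle. With per-device keys the counts spread out, although which key lands on which partition is an artifact of the hash and not something to depend on.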

answered Sep 27 '22 19:09

BrentDaCodeMonkey
