Can I set EventData.PartitionKey while sending to EventHubs using a PartitionSender?

I currently have an EventHub instance set up in Azure. It has 5 partitions. What I want to know is whether the PartitionKey always has to be a number between 0 and n-1, with n being the number of partitions.

I have the following code:

    private static async Task SendMessagesToEventHub(int numMessagesToSend)
    {
        var sender = eventHubClient.CreatePartitionSender("test1");

        for (var i = 0; i < numMessagesToSend; i++)
        {
            try
            {
                var message = $"Message {i}";
                Console.WriteLine($"Sending message: {message}");
                await sender.SendAsync(new EventData(Encoding.UTF8.GetBytes(message)));
            }
            catch (Exception exception)
            {
                Console.WriteLine($"{DateTime.Now} > Exception: {exception.Message}");
            }

            await Task.Delay(10);
        }

        Console.WriteLine($"{numMessagesToSend} messages sent.");
    }

This then throws an exception:

The specified partition is invalid for an EventHub partition sender or receiver. It should be between 0 and 4.

In the documentation of EventHub, this is what they say regarding the PartitionKey:

The EventData class has a PartitionKey property that enables the sender to specify a value that is hashed to produce a partition assignment. Using a partition key ensures that all the events with the same key are sent to the same partition in the Event Hub. Common partition keys include user session IDs and unique sender IDs.

To me this means that you are not limited to an int but can use any string. What am I missing?

David Pilkington asked Nov 05 '25 23:11


1 Answer

Answer:

You cannot mix PartitionKey and PartitionSender; they are two mutually exclusive concepts.

Don't use a PartitionSender (the ehClient.CreatePartitionSender() API). It was designed to send to one specific partition, in which case the Event Hubs service can no longer hash the PartitionKey to pick a partition.

Instead, use this C# snippet:

    EventData myEvent = new EventData(Encoding.UTF8.GetBytes(message));
    myEvent.PartitionKey = "test1";
    await eventHubClient.SendAsync(myEvent);

We learned that this API was a bit confusing for our customers, so when we built our Java SDK we simplified it to look like this:

    EventData myEvent = new EventData(message.getBytes(Charset.defaultCharset()));
    eventHubClient.sendSync(myEvent, "test1");
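Applied to the method from the question, the fix might look like the sketch below. It assumes the client comes from the Microsoft.Azure.EventHubs NuGet package (the question does not say which library is in use), where the partition key is passed as a second argument to SendAsync instead of being set on the EventData. The connection string is a placeholder, and the async Main assumes C# 7.1 or later.

```csharp
using System;
using System.Text;
using System.Threading.Tasks;
using Microsoft.Azure.EventHubs; // assumption: the .NET Standard Event Hubs client package

public static class Program
{
    // Placeholder: replace with a real connection string that includes EntityPath.
    private const string ConnectionString = "<event-hubs-connection-string>";

    private static EventHubClient eventHubClient;

    public static async Task Main()
    {
        eventHubClient = EventHubClient.CreateFromConnectionString(ConnectionString);
        await SendMessagesToEventHub(10);
        await eventHubClient.CloseAsync();
    }

    private static async Task SendMessagesToEventHub(int numMessagesToSend)
    {
        for (var i = 0; i < numMessagesToSend; i++)
        {
            try
            {
                var message = $"Message {i}";
                Console.WriteLine($"Sending message: {message}");

                // No PartitionSender here: the client sends the event together with
                // an arbitrary string key, and the service hashes that key to a partition.
                var eventData = new EventData(Encoding.UTF8.GetBytes(message));
                await eventHubClient.SendAsync(eventData, "test1");
            }
            catch (Exception exception)
            {
                Console.WriteLine($"{DateTime.Now} > Exception: {exception.Message}");
            }

            await Task.Delay(10);
        }

        Console.WriteLine($"{numMessagesToSend} messages sent.");
    }
}
```

Because every event carries the same key "test1", all of them should end up on the same partition; which partition that is stays up to the service.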

The 3 types of Send Patterns Exposed by Event Hubs:

When we developed the Event Hubs service, we wanted to give our users multiple levels of control over how their event stream is partitioned. We came up with the 3 modes below (shown with our C# client APIs):

  1. EventHubClient.Send(eventData_Without_PartitionKey): use this when you don't need any control over how data is partitioned. The Event Hubs service will try to distribute data uniformly across all partitions (best effort, no guarantees). In exchange for giving up control over partitioning, you gain high availability. If you have an Event Hub with 32 partitions and send this way, your event will be delivered to one of the 32 partitions that is immediately available and has the least data on it.

  2. EventHubClient.Send(eventData_With_PartitionKey): use this when your data has a property by which you want to partition it. You control partitioning by supplying a hint (the PartitionKey); the service runs a hash algorithm over it and delivers the event to the resulting partition. All events with the same PartitionKey are guaranteed to land on the same Event Hubs partition.

  3. EventHubSender.Send(eventData_Without_PartitionKey) (EventHubPartitionSender would have been a more apt name): use this when you want complete control over partitioning, that is, over which EventData lands on which Event Hubs partition. This is typically used when customers have their own proprietary hash algorithm that they believe performs better for their scenarios with respect to fairness of load distribution across all Event Hubs partitions.

What you need is (2).
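To make the difference between a partition key and a partition ID concrete, here is a small self-contained sketch of the idea behind mode (2): any string can serve as a key because it is hashed down to a partition index between 0 and n-1. The FNV-1a hash and the ToPartition helper are illustrative assumptions only; the hash Event Hubs actually uses is internal to the service, so the partition numbers printed here will not match what the service picks.

```csharp
using System;
using System.Text;

public static class PartitionKeyDemo
{
    // Hypothetical helper: maps an arbitrary string key to an index in [0, partitionCount).
    // FNV-1a is used purely for illustration; it is NOT the Event Hubs hash.
    public static int ToPartition(string partitionKey, int partitionCount)
    {
        if (partitionKey == null) throw new ArgumentNullException(nameof(partitionKey));
        if (partitionCount <= 0) throw new ArgumentOutOfRangeException(nameof(partitionCount));

        unchecked // the multiplication is meant to wrap around on overflow
        {
            uint hash = 2166136261u;
            foreach (byte b in Encoding.UTF8.GetBytes(partitionKey))
            {
                hash ^= b;
                hash *= 16777619u;
            }
            return (int)(hash % (uint)partitionCount);
        }
    }

    public static void Main()
    {
        const int partitionCount = 5; // the hub in the question has 5 partitions

        // The same key always yields the same index; different keys may or may not collide.
        foreach (var key in new[] { "test1", "session-42", "test1" })
        {
            Console.WriteLine($"{key} -> partition {ToPartition(key, partitionCount)}");
        }
    }
}
```

A PartitionSender skips this hashing step entirely and expects the partition ID itself, which is why "test1" was rejected with "It should be between 0 and 4".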

Here's some general reading on Event Hubs concepts...

Sreeram Garlapati answered Nov 07 '25 13:11