 

How to use client-side event batching functionality while Sending to Microsoft Azure EventHubs

I'm working on a high-throughput application that sends to EventHub. According to the documentation, achieving very high throughput from a single sender requires client-side batching (without exceeding the 256 KB limit per event).

Best Practices for performance improvements using Service Bus brokered messaging suggests client-side batching as a way to improve performance. It says that client-side batching is available for queue and topic clients: the client delays sending messages for a certain period of time and then transmits them in a single batch.

Is client-side batching available in the EventHub client?

Attila Cseh asked Jan 07 '23 05:01


1 Answer

Short answer: EventHubs is designed to support very high throughput scenarios, and client-side batching is one of the key features that enables this. The API is `EventHubClient.SendBatch(IEnumerable<EventData>)`.

Long Story:

The link that you found, Best Practices for performance improvements using Service Bus brokered messaging, applies to ServiceBus Queues and Topics, which use a Microsoft proprietary protocol called SBMP that is not an open standard. We implemented BatchFlushInterval in that protocol. This was a while back (around 2010, I guess), when the AMQP protocol wasn't standardized yet. By the time we started building the Azure EventHubs service, AMQP had become the standard protocol for implementing performant messaging solutions, so we used AMQP as our first-class protocol for Event Hubs. BatchFlushInterval doesn't have any effect in EventHubs (AMQP).

EventHubClient translates every raw event that you send to EventHub into an AmqpMessage (refer to the Messaging section of the AMQP Protocol Specification).

To do that, as per the protocol, it adds a few extra bytes to each message. The estimated size of each serialized EventData (as an AmqpMessage) is available through the property EventData.SerializedSizeInBytes.

With that background, coming to your scenario: the best way to achieve very high throughput is to use the EventHubClient.SendBatch(IEnumerable<EventData>) API. The contract of this API is that, before invoking SendBatch, the caller needs to make sure the serialized size of the batch of messages doesn't exceed 256k. Internally, this API converts the IEnumerable<EventData> into one single AmqpMessage and sends it to the EventHubs service. The limit on a single AmqpMessage imposed by the EventHubs service as of 4-25-2016 is 256k. One more detail: when the list of EventData is translated into a single AmqpMessage, EventHubClient needs to promote some information that is common to all messages in the batch (such as the partitionKey) into the batch message header. This information is guaranteed to take at most 6k.

So, all in all, the caller needs to keep track of the aggregate size of all EventData in the IEnumerable<EventData> and make sure that it stays below 250k.
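The bookkeeping described above can be sketched as follows. This is a language-neutral illustration in Python, not the .NET client: `chunk_by_size` and `MAX_BATCH_BYTES` are hypothetical names, each size stands in for what `EventData.SerializedSizeInBytes` would report, and reading "250k" as 250 * 1024 bytes is an assumption.

```python
from typing import Iterable, Iterator, List, Tuple

# Assumption: "250k" is read as 250 * 1024 bytes
# (the 256k message limit minus up to 6k of batch header).
MAX_BATCH_BYTES = 250 * 1024


def chunk_by_size(
    events: Iterable[Tuple[str, int]],
    max_bytes: int = MAX_BATCH_BYTES,
) -> Iterator[List[str]]:
    """Group (event, serialized_size) pairs into batches of at most max_bytes."""
    batch: List[str] = []
    batch_bytes = 0
    for event, size in events:
        if size > max_bytes:
            # A single event this large can never be sent, batched or not.
            raise ValueError(f"event of {size} bytes can never fit in a batch")
        if batch and batch_bytes + size > max_bytes:
            # Adding this event would overflow the budget: close the batch.
            yield batch
            batch, batch_bytes = [], 0
        batch.append(event)
        batch_bytes += size
    if batch:
        # Emit the final, partially filled batch.
        yield batch
```

In the .NET client, each yielded group would correspond to one `IEnumerable<EventData>` handed to a single `SendBatch` call.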


EDIT ON 09/14/2017

We added the EventHubClient.CreateBatch API to support this scenario.

There is no more guesswork involved in constructing a batch of EventData. Get an empty EventDataBatch from the EventHubClient.CreateBatch API and then use the TryAdd(EventData) API to add events to the batch.

Finally, use EventDataBatch.ToEnumerable() to get the underlying events and pass them to the EventHubClient.Send() API.
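The fill-until-full pattern that TryAdd enables looks roughly like the sketch below. It is a Python model of the control flow only, not the SDK: `SizeLimitedBatch`, `try_add`, and `send_all` are hypothetical stand-ins for EventDataBatch, TryAdd, and the send call, and the model counts raw payload bytes while ignoring the AMQP overhead the real client accounts for.

```python
from typing import Callable, List


class SizeLimitedBatch:
    """Hypothetical stand-in for EventDataBatch: accepts events until a byte budget is used up."""

    def __init__(self, max_bytes: int) -> None:
        self.max_bytes = max_bytes
        self.size_bytes = 0
        self.events: List[bytes] = []

    def try_add(self, event: bytes) -> bool:
        """Add the event and return True, or return False if it would not fit."""
        if self.size_bytes + len(event) > self.max_bytes:
            return False
        self.events.append(event)
        self.size_bytes += len(event)
        return True


def send_all(
    events: List[bytes],
    max_bytes: int,
    send: Callable[[List[bytes]], None],
) -> int:
    """Fill batches with try_add, pass each full batch to send(), return the batch count."""
    sent = 0
    batch = SizeLimitedBatch(max_bytes)
    for event in events:
        if batch.try_add(event):
            continue
        if not batch.events:
            # Rejected by an empty batch: the event alone exceeds the limit.
            raise ValueError("event is larger than an empty batch allows")
        # The current batch is full: send it and start a new one with this event.
        send(batch.events)
        sent += 1
        batch = SizeLimitedBatch(max_bytes)
        if not batch.try_add(event):
            raise ValueError("event is larger than an empty batch allows")
    if batch.events:
        # Flush the last, partially filled batch.
        send(batch.events)
        sent += 1
    return sent
```

The point of the design is that the caller never computes sizes: a `False` return from the add call is the only signal needed to flush and start a new batch.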


Sreeram Garlapati answered Jan 17 '23 15:01