
Route events to eventhub EventProcessor

I have events of different types. For example, some data is telemetry data, some is error information etc.

I thought it would be a good idea to create several IEventProcessor implementations, one for each event type. Each implementation would handle its events differently, for example by writing to a file or to a database.

What's the best way to route events to a specific EventProcessor?

  • Should I let an EventProcessor monitor a specific partitionkey and if so, how?
  • Should I use the constructor of the EventProcessorHost that lets me specify a consumergroupname? If so, how can I send to a specific consumergroup using the EventHubClient? I do not see an option to specify a consumergroup there.
  • Should I do none of the above and just check an incoming eventdata for a specific property and just ignore the ones that I am not interested in?

I must say that I find the relation between partitionkey and consumergroup (if there is any) badly documented.

I've used option 2, but so far each EventProcessor gets messages from all the consumergroupnames, not just the one specified in the EventProcessorHost constructor.

asked Nov 30 '15 17:11 by Peter Bons


1 Answer

Great Question!

Before answering, I want to reiterate a couple of principles we followed while building Event Hubs.

  • We wanted Event Hubs to be a highly durable, high-throughput event ingestion pipeline. Azure already had pub-sub services such as Queues/Topics (similar to AWS SQS or Google Pub/Sub), so the main reason for building a new service was to provide a higher-throughput variant (with low latency, of course). We delivered on this goal, with the trade-off that the service performs no per-message computation, such as executing a filter. When you need per-message semantics (de-duplication per message, acknowledging receipt per message or, in your case, filtering on a property per message) and your throughput requirements are low, a Queue/Topic might be your best bet.

  • We also envisioned that senders (publishers) operate at a much higher scale and vary significantly by scenario, so we introduced three sending patterns: Send, Send with PartitionKey, and Send directly to a Partition. That is why you see the notion of a PartitionKey when sending; it translates to a particular partition (think of the PartitionKey as a hint that tells the Event Hubs service to place all events with the same PartitionKey on the same partition). When consuming events, however, Event Hubs does not directly expose any notion of PartitionKey. There is no relation between ConsumerGroups and PartitionKey.

  • Receivers are usually just the compute roles and are limited in number, so we exposed one generic receive (consume) pattern: Receive from a Partition. There may be different types of consumers, depending on factors such as the speed of consumption (real-time vs. historical) or the type of data, and hence we exposed multiple consumer groups. Although you could create 20 consumer groups (CGs), one interesting limitation is that each throughput unit purchased yields 1 MB/s in and 2 MB/s out, so if the send side is fully utilized, you are limited to 2 CGs reading the full stream. So, if you are processing the exact same stream, and have different ways to handle each event but each of them takes an equal amount of time to process, then using the same ConsumerGroup makes more sense.
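
The principles above can be illustrated with a small in-memory model. This is only a sketch, not the Event Hubs SDK: `ToyEventHub`, `partition_for`, `full_stream_consumer_groups` and the partition count of 4 are hypothetical names and values chosen for illustration, and the hash used here is not the service's actual placement algorithm. It shows that a partition key only decides placement on the send side, that every consumer group independently sees every event, and how the 1 MB/s in / 2 MB/s out figure leads to the limit of 2 consumer groups.

```python
import hashlib
from collections import defaultdict

PARTITION_COUNT = 4  # hypothetical; chosen only for this illustration


def partition_for(partition_key: str, partition_count: int = PARTITION_COUNT) -> int:
    """Map a key to a partition with a stable hash.

    Illustrative only: the real service uses its own hashing scheme.
    """
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


class ToyEventHub:
    """In-memory stand-in: partitions are append-only lists and a
    consumer group is nothing more than its own table of read offsets."""

    def __init__(self, partition_count: int = PARTITION_COUNT):
        self.partitions = [[] for _ in range(partition_count)]
        # offsets[group][partition] -> index of the next event to read
        self.offsets = defaultdict(lambda: [0] * partition_count)

    def send(self, event: dict, partition_key: str) -> int:
        """Append the event to the partition chosen by its key."""
        index = partition_for(partition_key, len(self.partitions))
        self.partitions[index].append(event)
        return index

    def receive(self, consumer_group: str, partition: int) -> list:
        """Return unread events of one partition for one consumer group.

        Note that no partition key is involved on the receive side.
        """
        start = self.offsets[consumer_group][partition]
        events = self.partitions[partition][start:]
        self.offsets[consumer_group][partition] = len(self.partitions[partition])
        return events


def full_stream_consumer_groups(ingress_mb_per_s: float, egress_mb_per_s: float) -> int:
    """How many consumer groups can each read the whole stream when
    ingress is fully used (ingress must be greater than zero)."""
    return int(egress_mb_per_s // ingress_mb_per_s)
```

In this model, two consumer groups that each read all partitions receive identical copies of the stream, which matches the behaviour observed in the question: choosing a consumer group does not filter events, it only gives a reader its own position in the stream.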

To answer your question: IT REALLY DEPENDS.

Here are a few solutions:

  • Since your scenario has a mix of event types, you need to decide whether a single consumer/processor ever needs to read and process all types of events. One example we usually see: one ConsumerGroup keeps a count of all errors, while another consumer group performs a specific action per error type. If you don't need that, then sending each event type to a different Event Hub, and using one consumer group with the specific IEventProcessor on each, is an option.

  • If you need to send all events to the same Event Hub, and you know that some event types are (or need to be) processed very quickly, consider using different consumer groups, each tied to a specific IEventProcessor implementation that ignores the other event types. For example, if the ErrorInfo events and Special events need attention in real time, and the telemetry data can tolerate a delay of 15 minutes due to slow processing or peak load, I would create one ConsumerGroup named Real-time and tie it to an IEventProcessor that handles two types, Error and Special. Then create a second ConsumerGroup and tie it to an IEventProcessor that handles Telemetry events.
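
The second option can be sketched as follows. This is a language-neutral illustration in Python rather than a real IEventProcessor: the class `TypeFilteringProcessor`, the helper `build_processors`, the `"type"` field on each event and the group names `"realtime"` and `"telemetry"` are all assumptions made for the example. In an actual processor the same check would presumably sit inside the method that receives each batch, reading the event type from whatever property the sender sets.

```python
from typing import Callable, Dict, Iterable, List


class TypeFilteringProcessor:
    """Handles only the event types it is registered for and skips the rest."""

    def __init__(self, consumer_group: str, handlers: Dict[str, Callable[[dict], None]]):
        self.consumer_group = consumer_group  # hypothetical group name
        self.handlers = handlers              # event type -> handler function
        self.skipped = 0                      # events seen but not handled

    def process_events(self, events: Iterable[dict]) -> None:
        for event in events:
            handler = self.handlers.get(event.get("type"))
            if handler is None:
                # Not one of our types: ignore it and move on.
                self.skipped += 1
                continue
            handler(event)


def build_processors(sink: Dict[str, List[dict]]) -> List[TypeFilteringProcessor]:
    """Create one processor per consumer group, as in the example above:
    a fast one for Error and Special events, a slower one for Telemetry."""

    def store(name: str) -> Callable[[dict], None]:
        # Stand-in for "write to file or database": append to a named list.
        return lambda event: sink.setdefault(name, []).append(event)

    alerts = store("alerts")
    return [
        TypeFilteringProcessor("realtime", {"error": alerts, "special": alerts}),
        TypeFilteringProcessor("telemetry", {"telemetry": store("metrics")}),
    ]
```

Each processor is fed the complete stream, because every consumer group receives all events; the filtering happens entirely on the consumer side. The cost of this design is that every group still reads the events it ignores, which counts against the egress limit described earlier.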

answered Sep 27 '22 18:09 by Sreeram Garlapati