I'm using a service which outputs to an Event Hub.
We want to store that output so it can be read once per day by a batch job running on Apache Spark. Basically, we figured we would just dump all messages to blobs.
What's the easiest way to capture messages from an Event Hub to Blob Storage?
Our first thought was a Stream Analytics job, but it needs to parse the raw message as CSV, JSON, or Avro, and our current format is none of those.
Update: We solved this problem by changing our message format. I'd still like to know if there's any low-impact way to store messages to blobs. Did Event Hubs have a solution for this before Stream Analytics arrived?
You could write your own worker process to read the messages off the Event Hub and store them in Blob Storage. You do not need to do this in real time, because messages remain on the Event Hub for the configured retention period. The client that reads from the Event Hub is responsible for tracking which messages have been processed, by keeping track of the partition ID and offset. There is a C# library that makes this extremely easy and scales really well: https://azure.microsoft.com/en-us/documentation/articles/event-hubs-csharp-ephcs-getstarted/
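To illustrate the bookkeeping such a worker has to do, here is a minimal Python sketch of the checkpointing idea only. It does not use the Event Hubs SDK: `drain_partition` is a hypothetical helper, `events` stands in for whatever a partition receiver would yield, offsets are assumed to be increasing integers, and a local directory stands in for the blob container. The raw payload is never parsed, so the message format does not matter.

```python
import json
from pathlib import Path


def drain_partition(events, partition_id, out_dir, checkpoint_file):
    """Append unprocessed events to a per-partition file and save the last offset.

    events: iterable of (offset, payload_bytes) pairs in increasing offset
            order (hypothetical stand-in for an Event Hub partition receiver).
    Returns the number of events written in this run.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    checkpoint_file = Path(checkpoint_file)

    # Load the last processed offset per partition, if a checkpoint exists.
    checkpoints = {}
    if checkpoint_file.exists():
        checkpoints = json.loads(checkpoint_file.read_text())
    key = str(partition_id)
    last_offset = checkpoints.get(key, -1)

    written = 0
    # Local file used here as a stand-in for an append blob.
    with open(out_dir / f"partition-{key}.bin", "ab") as sink:
        for offset, payload in events:
            if offset <= last_offset:
                continue  # already stored in an earlier run
            # Length-prefix each raw payload so records can be split later.
            sink.write(len(payload).to_bytes(4, "big") + payload)
            last_offset = offset
            written += 1

    # Save the checkpoint only after the data file is closed, so a crash
    # in between causes a duplicate on the next run rather than a lost message.
    checkpoints[key] = last_offset
    checkpoint_file.write_text(json.dumps(checkpoints))
    return written
```

Running this once per partition on a schedule (for example, shortly before the daily Spark job) is enough, as long as the interval is shorter than the retention period. In a real worker the checkpoint would itself live in Blob Storage, so that any instance can pick up where another left off.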