understanding check pointing in eventhub

Tags:

I want to ensure that, if my eventhub client crashes (currently a console application), it only picks up events it has not yet taken from the eventhub. One way to achieve this, is to exploit offsets. However, this (to my understanding) requires the client to store the latest offset (besides events do not necessarily seem to hit the foreach loop of the ProcessEventsAsync method ordered by SequenceNumber).

An alternative, is to use checkpoints. I think they are persisted via the server (eventhub) using the provided storage account credentials. Is this correct?

This is some preliminary code I am currently using:

public class SimpleEventProcessor : IEventProcessor
{
    private Stopwatch _checkpointStopWatch;

    async Task IEventProcessor.CloseAsync(PartitionContext context, CloseReason reason)
    {
        Console.WriteLine("Processor Shutting Down. Partition '{0}', Reason: '{1}'.", context.Lease.PartitionId, reason);
        if (reason == CloseReason.Shutdown)
        {
            await context.CheckpointAsync();
        }
    }

    Task IEventProcessor.OpenAsync(PartitionContext context)
    {
        Console.WriteLine("SimpleEventProcessor initialized.  Partition: '{0}', Offset: '{1}'", context.Lease.PartitionId, context.Lease.Offset);
        _checkpointStopWatch = new Stopwatch();
        _checkpointStopWatch.Start();
        return Task.FromResult<object>(null);
    }

    async Task IEventProcessor.ProcessEventsAsync(PartitionContext context, IEnumerable<EventData> messages)
    {
        foreach (var eventData in messages)
        {
            // do something                    
        }

        //Call checkpoint every 5 minutes, so that worker can resume processing from 5 minutes back if it restarts.
        if (_checkpointStopWatch.Elapsed > TimeSpan.FromMinutes(5))
        {
            await context.CheckpointAsync();
            _checkpointStopWatch.Restart();
        }
    }
}

I believe it sends creates a checkpoint to the server every 5 minutes. How does the server know, which client has submitted the checkpoint (via the context)? Also, how can I prevent events from processed again if the client restarts? Furthermore, there could still be an up to 5 minutes window in which events are processed again. Perhaps I should rather use a queue/topic given my requirement?

PS:

This seems to be sufficient:

async Task IEventProcessor.ProcessEventsAsync(PartitionContext context, IEnumerable<EventData> messages)
{
    foreach (var eventData in messages)
    {
        // do something
    }
    await context.CheckpointAsync();
}

295

asked Feb 17 '16 17:02

cs0815

1 Answers

Lemme put forward a few basic terminology before answering:

EventHubs is high-thruput durable event ingestion pipeline. Simply put - its a reliable stream of events on Cloud.

Offset on EventData (one Event in the stream) is literally a Cursor on the Stream. Having this Cursor - will enable operations like - restart reading from this cursor (aka Offset) - inclusive or exclusive.

EventProcessor library is a framework that EventHubs team built, on-Top-of ServiceBus SDK to make "eventhub receiver gu" - look easier. ZooKeeper for Kafka <-> EPH for Event Hub. It will make sure when the process running EventProcessor on a specific partition dies/crashes - it will be resumed from last Checkpointed offset - in other available EventProcessorHost instance.

CheckPoint : as of today - EventHubs only supports client-side check-pointing. When you call Checkpoint from your Client-code:

await context.CheckpointAsync();

- it will translate to a Storage call (directly from Client) - which will store the current offset in the storage account you provided. EventHubs Service will not talk to Storage for Check-pointing.

THE ANSWER

EventProcessor framework is meant to achieve exactly what you are looking for.

Checkpoints are not persisted via Server (aka EVENTHUBS Service). Its purely client-side. You are talking to Azure storage. That's the reason EventProcessor library brings in a new additional dependency - AzureStorageClient. You can connect to the storage account & the container to which the checkpoints are written to - we maintain the ownership information - EPH instances (name) to Partitions of EventHubs they own and at what checkpoint they currently read/processed until.

As per the timer based checkpoint'ing pattern - you originally had - if the Process goes down - you will re-do the events in last 5 minute window. This is a healthy pattern as:

fundamental assumption is that Faults are rare events - so you will deal with duplicate events rarely
you will end-up make less calls to Storage service (which you could easily overwhelm by check-pointing frequently). I would go one step further and actually, would fire checkpoint call asynchronously. OnProcessEvents need not fail if checkpoint fails!

if you want absolutely no-events to repeat - you will need to build this de-duplication logic in the down-stream pipeline.

every time the EventProcessorImpl starts - query your downstream for the last sequence no. it got and keep discarding events until the current sequence no.

here's more general reading on Event Hubs...

140

answered Oct 13 '22 01:10

Sreeram Garlapati

Related questions
                            
                                Should C# add partial constructors?
                            
                                How do I "hide" controls that my control uses from the toolbox?
                            
                                Can web methods be overloaded?
                            
                                C# - Alternative to System.Timers.Timer, to call a function at a specific time
                            
                                How to override an inherited class property in C#?
                            
                                Single quote handling in a SQL string
                            
                                Why do my forms look like 'Windows Classic'?
                            
                                Load control style from separate file in wpf
                            
                                C# Open XML 2.0 NumberFormatId range
                            
                                'Class' does not contain a definition for 'Method'
                            
                                AggregateException - What does AggregateException.Flatten().InnerException represent?
                            
                                How to set optional parameter without compile-time constant
                            
                                Deleting an item with entity framework
                            
                                Using QueueClient.OnMessage in an azure worker role
                            
                                The type initializer for 'Emgu.CV.CvInvoke' threw an exception
                            
                                MahApps MessageBoxes using MVVM
                            
                                How to secure a controller on WebAPI for use by only the local machine
                            
                                Configuration section 'log4net' is missing a section declaration
                            
                                Can't load System.Web.Cors assembly after call to Microsoft.Owin.Cors
                            
                                Sending a value from server to client with sockets

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

understanding check pointing in eventhub

Tags:

c#

azureservicebus

azure-eventhub

cs0815

People also ask

1 Answers

Sreeram Garlapati

Recent Activity

Donate For Us