
Cancelling specific items in a dataflow pipeline

I am building a TPL Dataflow pipeline whose job is to process large files. Each file is parsed, analyzed, and rendered, but each file may take a different path through the pipeline depending on its type.

The user interface for this pipeline consists of a list of files to be processed, along with a progress bar and a "Cancel" button next to each file (and, of course, a button to add a new file to the queue). When the user clicks the "Cancel" button next to a specific file, I'd like to remove just that one file from the pipeline.

I must be missing something, though, because I can't figure out how to do that. I know I can cancel an entire block, but I don't want that; I only want to cancel a single item in the pipeline. What am I missing?

Bugmaster asked Jul 10 '15 18:07




1 Answer

TPL Dataflow doesn't support cancelling specific items out of the box.

You can implement it yourself by wrapping each item together with its own CancellationToken and posting the wrapper to the pipeline instead of the bare file. Then add a check to each block that skips the file if its token has been cancelled, so a cancelled item passes quickly through the pipeline:

var block = new ActionBlock<FileWrapper>(wrapper =>
{
    // Skip the work if this particular file was cancelled.
    if (wrapper.CancellationToken.IsCancellationRequested)
    {
        return;
    }

    ProcessFile(wrapper.File);
});

This gives you one token per item, which lets you cancel individual items without touching the rest of the pipeline.
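To make the idea concrete, here is a minimal sketch of what the wrapper and a two-stage pipeline could look like. The FileWrapper class, its Cancel() method, the Demo class, and the file names are all hypothetical and only for illustration; the real pipeline would do actual parsing and rendering where the comments indicate. It assumes the System.Threading.Tasks.Dataflow library is referenced.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical wrapper: pairs a file with its own cancellation source.
public sealed class FileWrapper
{
    private readonly CancellationTokenSource _cts = new CancellationTokenSource();

    public FileWrapper(string file) { File = file; }

    public string File { get; }
    public CancellationToken CancellationToken => _cts.Token;

    // Intended to be called by the "Cancel" button handler for this file only.
    public void Cancel() => _cts.Cancel();
}

public static class Demo
{
    // Returns the names of the files that were actually rendered.
    public static async Task<string[]> RunAsync()
    {
        var rendered = new ConcurrentQueue<string>();

        // Parse stage: a cancelled item is forwarded untouched so it drains quickly.
        var parse = new TransformBlock<FileWrapper, FileWrapper>(w =>
        {
            if (w.CancellationToken.IsCancellationRequested) return w;
            // ... real parsing would go here ...
            return w;
        });

        // Render stage: a cancelled item is simply dropped.
        var render = new ActionBlock<FileWrapper>(w =>
        {
            if (w.CancellationToken.IsCancellationRequested) return;
            rendered.Enqueue(w.File); // stand-in for real rendering
        });

        parse.LinkTo(render, new DataflowLinkOptions { PropagateCompletion = true });

        var a = new FileWrapper("a.dat");
        var b = new FileWrapper("b.dat");
        var c = new FileWrapper("c.dat");

        b.Cancel(); // simulates the user clicking "Cancel" next to b.dat

        parse.Post(a);
        parse.Post(b);
        parse.Post(c);
        parse.Complete();
        await render.Completion;

        return rendered.ToArray();
    }

    public static void Main()
    {
        // Prints "a.dat, c.dat"
        Console.WriteLine(string.Join(", ", RunAsync().GetAwaiter().GetResult()));
    }
}
```

In a real UI you would keep a reference to each FileWrapper alongside its row in the list, so the button handler can call Cancel() on the right one. For long-running steps you could also pass wrapper.CancellationToken into the processing code itself, so that work already in progress can stop early rather than only being skipped at the start of each block.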

i3arnon answered Sep 28 '22 09:09