I am building a TPL Dataflow pipeline that processes large files. Each file is parsed, analyzed, and rendered, but each file may take a different path through the pipeline depending on its type.
The user interface for this pipeline consists of a list of files to be processed, along with a progress bar and a "Cancel" button next to each file (and, of course, a button to add a new file to the queue). When the user clicks the "Cancel" button next to a specific file, I'd like to remove just that one file from the pipeline.
I must be missing something, though, because I can't figure out how to do that. I know I can cancel an entire block, but I don't want to do that; I just want to cancel a single item in the pipeline. So, what am I missing?
TPL Dataflow doesn't support cancelling specific items out of the box.
You can implement that yourself by wrapping each item together with its own CancellationToken and posting the wrapper to the pipeline instead of just the file. Then add a check in each block that skips the file if its token has been cancelled, so the cancelled item quickly passes through:
    var block = new ActionBlock<FileWrapper>(wrapper =>
    {
        // Skip items whose token was cancelled before they reached this block.
        if (wrapper.CancellationToken.IsCancellationRequested)
        {
            return;
        }

        ProcessFile(wrapper.File);
    });
This means that you have one token per item, which allows you to target individual items.
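To make the wrapper idea concrete, here is a minimal sketch of how the pieces might fit together with the "Cancel" button from the question. The names FileWrapper (as fleshed out here), FileQueue, Add, Cancel and CompleteAsync are hypothetical and only illustrate the approach; the sketch assumes the System.Threading.Tasks.Dataflow package is referenced and that each file is identified by a unique path.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical wrapper: pairs a file path with its own cancellation source.
public sealed class FileWrapper
{
    private readonly CancellationTokenSource _cts = new CancellationTokenSource();

    public FileWrapper(string file)
    {
        File = file;
    }

    public string File { get; }

    public CancellationToken CancellationToken => _cts.Token;

    public void Cancel() => _cts.Cancel();
}

// Hypothetical facade the UI could call: one entry per queued file.
public sealed class FileQueue
{
    private readonly ConcurrentDictionary<string, FileWrapper> _pending =
        new ConcurrentDictionary<string, FileWrapper>();

    private readonly ActionBlock<FileWrapper> _block;

    public FileQueue(Action<string> processFile)
    {
        _block = new ActionBlock<FileWrapper>(wrapper =>
        {
            try
            {
                // Cancelled items fall straight through without any work.
                if (wrapper.CancellationToken.IsCancellationRequested)
                {
                    return;
                }

                processFile(wrapper.File);
            }
            finally
            {
                // The item has left the pipeline either way.
                _pending.TryRemove(wrapper.File, out _);
            }
        });
    }

    // Called by the "add file" button.
    public void Add(string file)
    {
        var wrapper = new FileWrapper(file);
        if (_pending.TryAdd(file, wrapper))
        {
            _block.Post(wrapper);
        }
    }

    // Called by the "Cancel" button next to a file.
    // Returns false if the file is unknown or already finished.
    public bool Cancel(string file)
    {
        if (_pending.TryGetValue(file, out var wrapper))
        {
            wrapper.Cancel();
            return true;
        }

        return false;
    }

    // Signals that no more files will be added and waits for the block to drain.
    public Task CompleteAsync()
    {
        _block.Complete();
        return _block.Completion;
    }
}
```

Note that this check only skips files that have not started processing yet. If a file that is already being processed should also stop early, the token would have to be passed into the processing code and checked there as well.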