Tasks vs. TPL Dataflow vs. Async/Await, which to use when?

Tags: async-await, task-parallel-library, tpl-dataflow

I have read quite a number of technical documents, some by members of the Microsoft team and some by other authors, detailing the new TPL Dataflow library, the async/await concurrency features, and the TPL. However, I have not come across anything that clearly delineates which to use when. I am aware that each has its own place and applicability, but I am specifically wondering about the following situation:

I have a dataflow model that runs completely in-process. At the top sits a data generation component (A), which generates data and passes it on, either through linked dataflow blocks or by raising events, to a processing component (B). Some parts within (B) have to run synchronously, while (A) benefits massively from parallelism because most of its work is I/O- or CPU-bound (reading binary data from disk, then deserializing and sorting it). In the end, (B) passes the transformed results on to (C) for further use.

Specifically, I wonder when to use tasks, async/await, and TPL Dataflow blocks with regard to the following:

Kicking off the data generation component (A): I clearly do not want to lock up the GUI/dashboard, so this process has to run on a different thread or task somehow.
How to call methods within (A), (B), and (C) that are not directly involved in data generation and processing but perform configuration work that may take several hundred milliseconds, or even seconds, to return. My hunch is that this is where async/await shines?
What I struggle with most is how best to design the message passing from one component to the next. TPL Dataflow looks very interesting, but it is sometimes too slow for my purpose (see the note on performance at the end). If I do not use TPL Dataflow, how do I achieve responsiveness and concurrency through in-process, inter-task data passing? For example, if I raise an event within a task, the subscribed event handler runs in that same task rather than being handed off to another task, correct? In summary, how can component (A) go about its business after passing data to component (B), while (B) retrieves the data and focuses on processing it? Which concurrency model is best used here? I implemented dataflow blocks here, but is that truly the best approach?
I guess the points above boil down to my struggle with how to design and implement API-type components using standard practice. Should methods be designed as async, data inputs as dataflow blocks, and data outputs as either dataflow blocks or events? What is the best approach in general? I ask because most of the components above are supposed to work independently, so that they can be swapped out or altered internally without having to rewrite accessors and outputs.

Note on performance: I mentioned that TPL Dataflow blocks are sometimes slow. I am dealing with a high-throughput, low-latency application that targets disk I/O limits, and TPL Dataflow blocks often performed much slower than, for example, a synchronous processing unit. The issue is that I do not know how to embed the process in its own task or concurrency model to achieve something similar to what TPL Dataflow blocks already take care of, but without the overhead that comes with TPL Dataflow.


asked Nov 27 '12 06:11

Matt

1 Answer

It sounds like you have a "push" system. Plain async code only handles "pull" scenarios.

Your choice is between TPL Dataflow and Rx (Reactive Extensions). I think TPL Dataflow is easier to learn, but since you've already tried it and it won't work for your situation, I would try Rx.
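For reference, here is a minimal sketch of the push shape in TPL Dataflow, under stated assumptions: the class name `PushPipelineSketch`, the squaring "work", and the capacity of 100 are hypothetical placeholders, and the `System.Threading.Tasks.Dataflow` package is assumed to be referenced. The loop standing in for (A) hands each item to (B) with `SendAsync` and carries on; it only waits when (B)'s bounded buffer is full.

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical three-stage mesh standing in for components (A), (B) and (C).
public static class PushPipelineSketch
{
    public static async Task<int> RunAsync(int itemCount)
    {
        int total = 0; // only touched by the sequential ActionBlock below

        // (B): CPU-bound transform that may run on several thread-pool threads at once.
        var process = new TransformBlock<int, int>(
            x => x * x, // placeholder for real work
            new ExecutionDataflowBlockOptions
            {
                MaxDegreeOfParallelism = Environment.ProcessorCount,
                BoundedCapacity = 100 // back-pressure: the producer waits when (B) falls behind
            });

        // (C): consumes results one at a time (MaxDegreeOfParallelism defaults to 1).
        var consume = new ActionBlock<int>(x => { total += x; });

        process.LinkTo(consume, new DataflowLinkOptions { PropagateCompletion = true });

        // (A): pushes items and moves on; SendAsync only waits while (B)'s buffer is full.
        for (int i = 1; i <= itemCount; i++)
            await process.SendAsync(i);

        process.Complete();       // signal that no more input is coming
        await consume.Completion; // wait until everything has drained through (C)
        return total;
    }
}
```

Calling `await PushPipelineSketch.RunAsync(10)` should return 385, the sum of the squares of 1 through 10. `BoundedCapacity` is what provides back-pressure; with the default unbounded capacity, `Post` always accepts and a fast producer can buffer an arbitrary number of items in memory.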

Rx comes at the problem from a very different perspective: it is centered around "streams of events" rather than TPL Dataflow's "mesh of actors". Recent versions of Rx are very async-friendly, so you can use async delegates at several points in your Rx pipeline.
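A rough Rx counterpart, assuming the `System.Reactive` NuGet package; `RxPipelineSketch` and `ProcessAsync` are hypothetical names, and the delay stands in for real asynchronous work. It shows one way to put an async delegate inside an Rx query while keeping the processing sequential, as some parts of (B) require:

```csharp
using System;
using System.Collections.Generic;
using System.Reactive.Linq;
using System.Threading.Tasks;

public static class RxPipelineSketch
{
    // Hypothetical async step standing in for part of component (B).
    private static async Task<int> ProcessAsync(int x)
    {
        await Task.Delay(10); // placeholder for real asynchronous work
        return x * x;
    }

    public static async Task<IList<int>> RunAsync(IObservable<int> source)
    {
        return await source
            // Wrap each async call in an observable; FromAsync defers the call
            // until Concat subscribes to that inner observable.
            .Select(x => Observable.FromAsync(() => ProcessAsync(x)))
            // Concat subscribes to one inner observable at a time, so items are
            // processed sequentially and in their original order.
            .Concat()
            // Awaiting an IObservable yields its last value: here, the whole list.
            .ToList();
    }
}
```

For example, `await RxPipelineSketch.RunAsync(Observable.Range(1, 5))` should produce 1, 4, 9, 16, 25. If ordering does not matter and concurrent processing is wanted, `source.SelectMany(x => ProcessAsync(x))` is the usual alternative; it lets the calls overlap, so results can arrive out of order.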

Regarding your API design, both TPL Dataflow and Rx provide interfaces you should implement: IReceivableSourceBlock/ITargetBlock for TPL Dataflow, and IObservable/IObserver for Rx. You can just wire up the implementations to the endpoints of your internal mesh (TPL Dataflow) or query (Rx). That way, your components are just a "block" or "observable/observer/subject" that can be composed in other "meshes" or "queries".
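As a sketch of that idea on the TPL Dataflow side: a hypothetical `ProcessingComponent` standing in for (B) exposes only an `ITargetBlock<int>` input and an `IReceivableSourceBlock<string>` output, while its internal mesh (a parallel stage followed by a sequential one) stays private. The element types and the work done in each stage are placeholders, and the `System.Threading.Tasks.Dataflow` package is assumed to be referenced.

```csharp
using System;
using System.Threading.Tasks.Dataflow;

// Hypothetical component (B). Callers only see the two interface-typed endpoints,
// so the internal mesh can change without affecting them.
public sealed class ProcessingComponent
{
    public ITargetBlock<int> Input { get; }
    public IReceivableSourceBlock<string> Output { get; }

    public ProcessingComponent()
    {
        // Parallel stage: may run on several threads; TransformBlock still emits
        // results in input order by default.
        var square = new TransformBlock<int, int>(
            x => x * x,
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = Environment.ProcessorCount });

        // Sequential stage (MaxDegreeOfParallelism defaults to 1), for the parts
        // of (B) that must handle one item at a time.
        var format = new TransformBlock<int, string>(x => $"result {x}");

        // Completing Input will eventually complete Output as well.
        square.LinkTo(format, new DataflowLinkOptions { PropagateCompletion = true });

        Input = square;
        Output = format;
    }
}
```

Other components can then be wired to the endpoints, e.g. `generatorOutput.LinkTo(component.Input, new DataflowLinkOptions { PropagateCompletion = true })` for (A) and `component.Output.LinkTo(consumer)` for (C), where `generatorOutput` and `consumer` are hypothetical blocks. If a single object is preferred, `DataflowBlock.Encapsulate(square, format)` wraps the two internal blocks into one `IPropagatorBlock<int, string>`.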

Finally, for your async construction system, you just need to use the factory pattern. Your implementation can call Task.Run to do configuration on a thread pool thread.
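A minimal sketch of the async factory pattern, assuming a hypothetical component `DataGenerator` whose configuration work (`LoadConfiguration`, simulated here with `Thread.Sleep`) is blocking: the constructor is private, and the public `CreateAsync` method runs the slow work through `Task.Run` before handing back a fully configured instance.

```csharp
using System.Threading;
using System.Threading.Tasks;

// Hypothetical component; the name and the "configuration" work are placeholders.
public sealed class DataGenerator
{
    public string Settings { get; }

    // Private constructor: callers can only obtain an instance through CreateAsync,
    // so they never see a half-configured object.
    private DataGenerator(string settings) => Settings = settings;

    public static async Task<DataGenerator> CreateAsync(string path)
    {
        // Run the blocking configuration work on a thread-pool thread so that
        // the calling (e.g. UI) thread stays responsive while it runs.
        string settings = await Task.Run(() => LoadConfiguration(path));
        return new DataGenerator(settings);
    }

    private static string LoadConfiguration(string path)
    {
        Thread.Sleep(200); // placeholder for work taking hundreds of milliseconds
        return "configured from " + path;
    }
}
```

From a UI event handler this would be used as `var generator = await DataGenerator.CreateAsync(path);`, which keeps the UI thread free while the configuration runs. If the configuration work is itself naturally asynchronous (for example, async file I/O), `CreateAsync` can await it directly instead of wrapping it in `Task.Run`.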


answered Oct 21 '22 18:10

Stephen Cleary
