I have read through quite a number technical documents either by some of the Microsoft team, or other authors detailing functionality of the new TPL Dataflow library, async/await concurrency frameworks and TPL. However, I have not really come across anything that clearly delineates which to use when. I am aware that each has its own place and applicability but specifically I wonder in regards to the following situation:
I have a data flow model that runs completely in-process. At the top sits a data generation component (A) which generates data and passes it on either via data flow block linkages or through raising events to a processing component (B). Some parts within (B) have to run synchronously while (A) massively benefits from parallelism as most of the processes are I/O or CPU bound (reading binary data from disk, then deserializing and sorting them). In the end the processing component (B) passes on transformed results to (C) for further usage.
I wonder specifically when to use tasks, async/await, and TPL data flow blocks in regards to the following:
Kicking off the data generation component (A). I clearly do not want to lock the gui/dashboard thus this process would have to somewhat run on a different thread/task.
How to call methods within (A), (B), and (C) that are not directly involved in the data generation and processing process but perform configuration work that may possibly take several hundred milliseconds/seconds to return. My hunch is that this is where async/await shines?
The most I struggle with is how to best design the message passing from one component to the next. TPL Dataflow looks very interesting but it is sometimes too slow for my purpose. (Note at the end in regards to performance issues). If not using TPL Dataflow how do I achieve responsiveness and concurrency by in-process inter-task/concurrent data passing? Example, clearly if I raise an event within a task the subscribed event handler runs in the same task instead of being passed to another task, correct? In summary, how can component (A) go about its business after passing on data to component (B) while component (B) retrieves the data and focuses on processing it? Which concurrency model is best used here? I implemented data flow blocks here, but is that truly the best approach?
I guess above points in summary point to my struggle with how to design and implement API type components using standard practice? Should methods be designed async, data inputs as data flow blocks, and data output as either data flow block or event? What is the best approach in general? I am asking because most of the components mentioned above are supposed to work independently, so they can essentially be swapped out or independently altered internally without having to re-write accessors and output.
Note on performance: I mentioned TPL Dataflow blocks are sometimes slow. I deal with a high throughput, low latency type of application and target disk I/O limits and thus tpl dataflow blocks often performed much slower than, for example, a synchronous processing unit. Issue is that I do not know how to embed the process in its own task or concurrent model to achieve something similar than what tpl dataflow blocks already take care of, but without the overhead that comes with tpl df.
Await & Async was built on the Task Parallel Library (TPL) which was introduced in the . NET Framework 4. Their purpose is to enable asynchronous programming.
Async methods are intended to be non-blocking operations. An await expression in an async method doesn't block the current thread while the awaited task is running. Instead, the expression signs up the rest of the method as a continuation and returns control to the caller of the async method.
The TPL Dataflow Library consists of dataflow blocks, which are data structures that buffer and process data. The TPL defines three kinds of dataflow blocks: source blocks, target blocks, and propagator blocks. A source block acts as a source of data and can be read from.
Third Party Liability (TPL) refers to the legal obligation of third parties (for example, certain individuals, entities, insurers, or programs) to pay part or all of the expenditures for medical assistance furnished under a Medicaid state plan.
It sounds like you have a "push" system. Plain async
code only handles "pull" scenarios.
Your choice is between TPL Dataflow and Rx. I think TPL Dataflow is easier to learn, but since you've already tried it and it won't work for your situation, I would try Rx.
Rx comes at the problem from a very different perspective: it is centered around "streams of events" rather than TPL Dataflow's "mesh of actors". Recent versions of Rx are very async
-friendly, so you can use async
delegates at several points in your Rx pipeline.
Regarding your API design, both TPL Dataflow and Rx provide interfaces you should implement: IReceivableSourceBlock
/ITargetBlock
for TPL Dataflow, and IObservable
/IObserver
for Rx. You can just wire up the implementations to the endpoints of your internal mesh (TPL Dataflow) or query (Rx). That way, your components are just a "block" or "observable/observer/subject" that can be composed in other "meshes" or "queries".
Finally, for your async
construction system, you just need to use the factory pattern. Your implementation can call Task.Run
to do configuration on a thread pool thread.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With