I'm trying to create a well-designed TPL Dataflow pipeline that makes optimal use of system resources. My project is an HTML parser that adds parsed values to a SQL Server database. I already have all the methods for my future pipeline; my question now is how best to place them in Dataflow blocks, and how many blocks I should use. Some of the methods are CPU-bound, and some are I/O-bound (downloading from the Internet, SQL Server queries). For now I think that placing each I/O operation in a separate block is the right way, as in this scheme:
What are the basic rules for designing pipelines in this case?
TPL Dataflow is a data-processing library from Microsoft. It consists of different "blocks" that you compose together to make a pipeline; each block corresponds to a stage in your pipeline.
The dataflow components build on the types and scheduling infrastructure of the TPL and integrate with the C#, Visual Basic, and F# language support for asynchronous programming.
The TransformBlock<TInput,TOutput> object runs a Func<TInput,TOutput> delegate to perform work when the block receives data. The ActionBlock<TInput> object uses a lambda expression, for example to print to the console the number of zero bytes that were read.
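To illustrate, here is a minimal sketch of the two block types described above: a TransformBlock that counts zero bytes in a buffer, linked to an ActionBlock that prints the count. The buffer contents here are made up for the example.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

class Program
{
    static async Task Main()
    {
        // TransformBlock: counts the zero bytes in each buffer it receives.
        var countZeroes = new TransformBlock<byte[], int>(
            buffer => buffer.Count(b => b == 0));

        // ActionBlock: prints each count to the console.
        var print = new ActionBlock<int>(
            count => Console.WriteLine($"{count} zero bytes"));

        // Link the blocks and propagate completion down the pipeline.
        countZeroes.LinkTo(print,
            new DataflowLinkOptions { PropagateCompletion = true });

        countZeroes.Post(new byte[] { 0, 1, 0, 2 }); // sample buffer with two zero bytes
        countZeroes.Complete();
        await print.Completion; // wait until the last block has finished
    }
}
```

Linking with `PropagateCompletion = true` means calling `Complete()` on the first block is enough to shut down the whole chain once all buffered items are processed.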
One way to choose how to divide the blocks is to decide which parts you want to scale independently of the others. A good starting point is to divide the CPU-bound portions from the I/O-bound portions. I'd consider combining the last two blocks, since they are both I/O-bound (presumably to the same database).
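Applied to the question's scenario, that advice could look like the sketch below. The stage methods (`DownloadPageAsync`, `ParseHtml`, `SaveToDbAsync`) are hypothetical stand-ins for the asker's own methods, and the download stage is stubbed out so the sketch runs without network access; the parallelism numbers are illustrative, not prescriptive.

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

class Pipeline
{
    class ParsedValues { }

    // Stub stages standing in for the real methods; a real downloader would
    // use HttpClient, and a real saver would issue SQL Server commands.
    static Task<string> DownloadPageAsync(string url) =>
        Task.FromResult("<html>...</html>");                 // I/O-bound
    static ParsedValues ParseHtml(string html) =>
        new ParsedValues();                                  // CPU-bound
    static Task SaveToDbAsync(ParsedValues values) =>
        Task.CompletedTask;                                  // I/O-bound

    static async Task Main()
    {
        // I/O-bound stage: high parallelism is cheap because async
        // operations don't hold threads while waiting.
        var download = new TransformBlock<string, string>(DownloadPageAsync,
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 10 });

        // CPU-bound stage: cap parallelism at the core count.
        var parse = new TransformBlock<string, ParsedValues>(ParseHtml,
            new ExecutionDataflowBlockOptions
                { MaxDegreeOfParallelism = Environment.ProcessorCount });

        // Final I/O-bound stage: database writes (the last two blocks of
        // the original scheme merged into one, per the advice above).
        var save = new ActionBlock<ParsedValues>(SaveToDbAsync,
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4 });

        var opts = new DataflowLinkOptions { PropagateCompletion = true };
        download.LinkTo(parse, opts);
        parse.LinkTo(save, opts);

        download.Post("https://example.com");
        download.Complete();
        await save.Completion;
    }
}
```

Because each stage is its own block, you can tune `MaxDegreeOfParallelism` (and `BoundedCapacity`, to limit memory if the downloader outpaces the database) per stage independently, which is exactly the benefit of splitting CPU-bound from I/O-bound work.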