TPL Dataflow pipeline design basics

I'm trying to design a TPL Dataflow pipeline that makes optimal use of system resources. My project is an HTML parser that adds the parsed values to a SQL Server database. I already have all the methods for my future pipeline; my question now is how best to place them in Dataflow blocks, and how many blocks I should use. Some of the methods are CPU-bound, and some are I/O-bound (downloading from the Internet, SQL Server DB queries). For now I think that placing each I/O operation in a separate block is the right way, as in this scheme: [image: TPL Dataflow pipeline]
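Roughly, the layout I have in mind looks like this (the method bodies below are just placeholders standing in for my real code; System.Threading.Tasks.Dataflow comes from the TPL Dataflow NuGet package):

using System;
using System.Net.Http;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

class PipelineSketch
{
    static void Main()
    {
        var http = new HttpClient();

        // I/O-bound: download the HTML page
        var download = new TransformBlock<string, string>(url => http.GetStringAsync(url));

        // CPU-bound: parse the HTML into a value to store
        var parse = new TransformBlock<string, string>(html => ExtractValue(html));

        // I/O-bound: check whether the value is already in the DB
        var query = new TransformBlock<string, string>(value => QueryDbAsync(value));

        // I/O-bound: insert the value into SQL Server
        var insert = new ActionBlock<string>(value => InsertIntoDbAsync(value));

        var link = new DataflowLinkOptions { PropagateCompletion = true };
        download.LinkTo(parse, link);
        parse.LinkTo(query, link);
        query.LinkTo(insert, link);

        download.Post("http://example.com/page1");
        download.Complete();
        insert.Completion.Wait();
    }

    // Placeholders for the real parsing and DB methods.
    static string ExtractValue(string html) => html.Length.ToString();
    static Task<string> QueryDbAsync(string value) => Task.FromResult(value);
    static Task InsertIntoDbAsync(string value) => Task.CompletedTask;
}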

What are the basic rules for designing pipelines in a case like this?

asked Mar 10 '14 by AsValeO


People also ask

What is TPL Dataflow?

TPL Dataflow is a data processing library from Microsoft that came out years ago. It consists of different "blocks" that you compose together to make a pipeline. Blocks correspond to stages in your pipeline.

What is data flow in C#?

The dataflow components build on the types and scheduling infrastructure of the TPL and integrate with the C#, Visual Basic, and F# language support for asynchronous programming.

What is TransformBlock in C#?

The TransformBlock<TInput,TOutput> object specifies a Func<T,TResult> object to perform work when the block receives data. The ActionBlock<TInput> object uses a lambda expression to print to the console the number of zero bytes that are read.
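For illustration, a minimal sketch in the spirit of that docs example might look like this (the file path is only an example):

using System;
using System.IO;
using System.Linq;
using System.Threading.Tasks.Dataflow;

class ZeroBytesExample
{
    static void Main()
    {
        // TransformBlock: the Func<TInput, TOutput> reads a file and counts its zero bytes.
        var countZeroBytes = new TransformBlock<string, int>(path =>
            File.ReadAllBytes(path).Count(b => b == 0));

        // ActionBlock: prints the count to the console.
        var printCount = new ActionBlock<int>(count =>
            Console.WriteLine($"{count} zero bytes read."));

        countZeroBytes.LinkTo(printCount, new DataflowLinkOptions { PropagateCompletion = true });

        countZeroBytes.Post(@"C:\Windows\notepad.exe"); // any existing file
        countZeroBytes.Complete();
        printCount.Completion.Wait();
    }
}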


1 Answer

One way to choose how to divide the blocks is to decide which parts you want to scale independently of the others. A good starting point is to divide the CPU-bound portions from the I/O-bound portions. I'd consider combining the last two blocks, since they are both I/O-bound (presumably to the same database).
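For example, a rough sketch of that split might look like this (the degree-of-parallelism values are only illustrations; tune them against your own hardware and database, and the DB methods are placeholders):

using System;
using System.Net.Http;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

class SuggestedPipeline
{
    static void Main()
    {
        var http = new HttpClient();

        // I/O-bound: download pages; scale by allowing several concurrent requests.
        var download = new TransformBlock<string, string>(
            url => http.GetStringAsync(url),
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 8 });

        // CPU-bound: parse HTML; scale to the number of cores.
        var parse = new TransformBlock<string, string>(
            html => ParseHtml(html),
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = Environment.ProcessorCount });

        // I/O-bound: one block for both the DB query and the insert, since both hit the same database.
        var save = new ActionBlock<string>(
            async value =>
            {
                if (!await ExistsInDbAsync(value))
                    await InsertIntoDbAsync(value);
            },
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 2 });

        var link = new DataflowLinkOptions { PropagateCompletion = true };
        download.LinkTo(parse, link);
        parse.LinkTo(save, link);

        download.Post("http://example.com/page1");
        download.Complete();
        save.Completion.Wait();
    }

    // Placeholders for the real parsing and DB methods.
    static string ParseHtml(string html) => html;
    static Task<bool> ExistsInDbAsync(string value) => Task.FromResult(false);
    static Task InsertIntoDbAsync(string value) => Task.CompletedTask;
}

Each block can then be tuned independently: raise MaxDegreeOfParallelism on the download block if the network is the bottleneck, or on the parse block if parsing is, without touching the rest of the pipeline.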

answered Oct 20 '22 by Stephen Cleary