Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Parallel.Foreach spawning way too many threads

The problem

Although the code about which I will talk here I wrote in F#, it is based on the .NET 4 framework, not specifically depending on any particularity of F# (at least it seems so!).

I have some pieces of data on my disk that I should update from the network, saving the latest version to the disk:

type MyData =
    { field1 : int;
      field2 : float }

type MyDataGroup =
    { Data : MyData[];
      Id : int }

// load : int -> MyDataGroup
let load dataId =
    let data = ... // reads from disk
    { Data = data;
      Id = dataId }

// update : MyDataGroup -> MyDataGroup
let update dg =
    let newData = ... // reads from the network and process
                      // newData : MyData[]

    { dg with Data = dg.Data
                     |> Seq.ofArray
                     |> Seq.append newData
                     |> processDataSomehow
                     |> Seq.toArray }

// save : MyDataGroup -> unit
let save dg = ... // writes to the disk

let loadAndSaveAndUpdate = load >> update >> save

The problem is that to loadAndSaveAndUpdate all my data, I would have to execute the function many times:

{1 .. 5000} |> loadAndSaveAndUpdate

Each step would do

  • some disk IO,
  • some data crunching,
  • some network IO (with possibility of lots of latency),
  • more data crunching,
  • and some disk IO.

Wouldn't it be nice to have this done in parallel, to some degree? Unfortunately, none of my reading and parsing functions are "async-workflows-ready".

The first (not very good) solutions I came up with

Tasks

The first thing I've done was to set up a Task[] and start them all:

let createTask id = new Task(fun _ -> loadAndUpdateAndSave id)
let tasks = {1 .. 5000}
            |> Seq.map createTask
            |> Seq.toArray

tasks |> Array.iter (fun x -> x.Start())
Task.WaitAll(tasks)

Then I hit CTRL+ESC just to see how many threads it was using. 15, 17, ..., 35, ..., 170, ... until killed the application! Something was going wrong.

Parallel

I did almost the same thing but using Parallel.ForEach(...) and the results were the same: lots and lots and lots of threads.

A solution that works... kind of

Then I decided to start only n threads, Task.WaitAll(of them), then other n, until there were no more tasks available.

This works, but the problem is that when it has finished processing, say, n-1 tasks, it will wait, wait, wait for the damn last Task that insist on blocking due to lots of network latency. This is not good!

So, how would you attack this problem? I'd appreciate to view different solutions, involving either Async Workflows (and in this case how to adapt my non-async functions), Parallel Extensions, weird parallel patterns, etc.

Thanks.

like image 729
Bruno Reis Avatar asked Jan 04 '10 22:01

Bruno Reis


People also ask

Is parallel ForEach multiple threads?

Parallel. ForEach is like the foreach loop in C#, except the foreach loop runs on a single thread and processing take place sequentially, while the Parallel. ForEach loop runs on multiple threads and the processing takes place in a parallel manner.

Is a DbContext per thread in parallel ForEach safe?

I have researched this, and I agree that DbContext is not thread-safe. The pattern I propose does use multiple threads, but a single DbContext is only every accessed by a single thread in a single-threaded fashion.

Is parallel ForEach blocking?

No, it doesn't block and returns control immediately. The items to run in parallel are done on background threads.

Should you use parallel ForEach?

The short answer is no, you should not just use Parallel. ForEach or related constructs on each loop that you can. Parallel has some overhead, which is not justified in loops with few, fast iterations. Also, break is significantly more complex inside these loops.


1 Answers

ParallelOptions.MaxDegreeOfParallelism limits the number of concurrent operations run by Parallel method calls

like image 108
Marc Bate Avatar answered Sep 19 '22 15:09

Marc Bate