Parallel Foreach Memory Issue

I have a collection of 3000 files in a FileInfoCollection. I want to process all of the files by applying some logic which is independent per file (so it can be executed in parallel).

 FileInfo[] fileInfoCollection = directory.GetFiles();
 Parallel.ForEach(fileInfoCollection, ProcessWorkerItem);

But after processing about 700 files I get an out of memory error. I used a thread pool before, but it gave the same error. If I execute it without threading (parallel processing), it works fine.

In "ProcessWorkerItem" I am running an algorithm based on the string data of the file. Additionally I use log4net for logging and there are lot of communications with the SQL server in this method.

Some more info: the files are 1-2 KB XML files. I read each file, and the processing depends on its content: it identifies certain keywords in the string and generates another XML format. The keywords are stored in the SQL Server database (nearly 2000 words).
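
(For context, a rough, purely illustrative sketch of the shape of that per-file work - the keyword source, output format and logging here are assumptions, not the actual code:)

 using System;
 using System.Collections.Generic;
 using System.IO;
 using System.Linq;
 using System.Xml.Linq;

 static class Worker
 {
     // Stands in for the ~2000 keywords held in SQL Server.
     static readonly HashSet<string> Keywords = new HashSet<string> { "alpha", "beta" };

     public static void ProcessWorkerItem(FileInfo file)
     {
         // Each file is only 1-2 KB, so reading one file fully is cheap;
         // the memory pressure comes from how many are processed at once.
         string content = File.ReadAllText(file.FullName);

         List<string> hits = Keywords.Where(k => content.Contains(k)).ToList();

         // Produce the "other XML format" mentioned above (shape assumed).
         var output = new XDocument(
             new XElement("result",
                 new XElement("file", file.Name),
                 new XElement("keywords", hits.Select(h => new XElement("keyword", h)))));

         output.Save(Path.ChangeExtension(file.FullName, ".out.xml"));
     }
 }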

asked May 11 '11 by Jayantha Lal Sirisena



1 Answer

Well, what does ProcessWorkerItem do? You may be able to change that to use less memory (e.g. stream the data instead of loading it all in at once), or you may want to explicitly limit the degree of parallelism using the overload which takes a ParallelOptions and setting ParallelOptions.MaxDegreeOfParallelism. Basically you want to avoid trying to process all 3000 files at once :) IIRC, Parallel Extensions will "notice" if your tasks appear to be IO-bound, and allow more than the normal number to execute at once - which isn't really what you want here, as you're memory-bound as well.
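
For instance, a minimal sketch of that second suggestion (assuming ProcessWorkerItem is the existing per-file method from the question; the directory path is just a placeholder):

 using System;
 using System.IO;
 using System.Threading.Tasks;

 FileInfo[] fileInfoCollection = new DirectoryInfo(@"C:\data").GetFiles(); // placeholder path

 var options = new ParallelOptions
 {
     // Cap concurrency so not all 3000 items are in flight at once;
     // tune this value for your machine.
     MaxDegreeOfParallelism = Environment.ProcessorCount
 };

 Parallel.ForEach(fileInfoCollection, options, ProcessWorkerItem);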

answered Sep 20 '22 by Jon Skeet