
Nested Parallel performance question

I have a question.

Are there any benefits of using Parallel.Invoke inside of another Parallel.ForEach?

Here is my code:

    Parallel.ForEach(yearMonths,
                     () => new List<DJVSStatsCo>(),
                     (yearMonth, loopState, localDjvsStatsCo) =>
                         {
                             var coVintageCounter = 0;
                             var coExitsCounter = 0;
                             var coExtant = 0;

                             Parallel.Invoke(() =>
                                             coVintageCounter = globalData.ValuationEventsPit.
                                                                    Where(x => x.FirstRoundYearMonth <= yearMonth).
                                                                    Select(x => x.CompanyId).Distinct().Count(),
                                             () =>
                                             coExitsCounter = globalData.ValuationEventsPit.
                                                                  Where(x => x.ExitDate != null && x.ExitDateYearMonth == yearMonth).
                                                                  Select(x => x.CompanyId).Distinct().Count(),
                                             () =>
                                             coExtant = globalData.ValuationEventsPit.
                                                            Where(x => x.FirstRoundYearMonth <= yearMonth && (x.ExitDate == null || x.ExitDateYearMonth > yearMonth)).
                                                            Select(x => x.CompanyId).Distinct().Count()
                                 );

                             localDjvsStatsCo.Add(new DJVSStatsCo(yearMonth, coVintageCounter, coExtant, coExitsCounter));

                             return localDjvsStatsCo;
                         },
                     x =>
                         {
                             lock (locker)
                             {
                                 djvsStatsCos.AddRange(x);
                             }
                         });

I have about 50K records and my machine has a dual-core processor. When I measure the calculation time, I get almost the same result either way. So my question is: are there any benefits of using Parallel inside of Parallel? What is the best practice for this?
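This is a minimal sketch for making that timing comparison repeatable. It assumes each variant can be wrapped in a parameterless method; the names `ComputeWithInvoke` and `ComputeWithoutInvoke` in the usage note are hypothetical. The helper runs the action once untimed to warm up the JIT and thread pool, then averages several timed runs with `System.Diagnostics.Stopwatch`.

```csharp
using System;
using System.Diagnostics;

public static class Timing
{
    // Runs `action` once untimed (warm-up: JIT, thread pool),
    // then returns the average duration of `runs` timed executions.
    public static TimeSpan Average(Action action, int runs = 5)
    {
        if (runs <= 0) throw new ArgumentOutOfRangeException(nameof(runs));

        action(); // warm-up run, not included in the measurement

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < runs; i++) action();
        sw.Stop();

        return TimeSpan.FromTicks(sw.Elapsed.Ticks / runs);
    }
}
```

Usage might look like `Timing.Average(() => ComputeWithInvoke())` versus `Timing.Average(() => ComputeWithoutInvoke())`. Averaging several runs after a warm-up makes small differences between the two variants easier to trust than a single cold measurement.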

Thanks a lot.

Sincerely, Vlad.

Asked Jan 20 '23 at 06:01 by Vlad Bezden


2 Answers

In this case, there's probably no benefit. There could be a benefit in the case where you have relatively few "outer" jobs, but potentially many "inner" jobs.

On the other hand, it also depends on what those three jobs are doing. If they're essentially asynchronous tasks (e.g. database queries) that can run concurrently, then sure... but if they're local CPU-bound tasks, you're probably just giving the scheduler extra work for no real benefit.
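To illustrate the asynchronous case, here is a sketch in which three I/O-bound counts overlap by using `Task.WhenAll` instead of `Parallel.Invoke`. `QueryCountAsync` is a hypothetical placeholder that simulates query latency with `Task.Delay`. The returned numbers are dummy values, not real data.

```csharp
using System.Threading.Tasks;

public static class AsyncCounts
{
    // Hypothetical stand-in for an I/O-bound call such as a database COUNT query.
    // Task.Delay simulates the wait; no thread is blocked while it elapses.
    static async Task<int> QueryCountAsync(int dummyResult, int latencyMs)
    {
        await Task.Delay(latencyMs);
        return dummyResult;
    }

    // Start all three queries first, then await them together so their waits overlap.
    public static async Task<(int Vintage, int Exits, int Extant)> GetCountsAsync()
    {
        Task<int> vintage = QueryCountAsync(10, 100);
        Task<int> exits = QueryCountAsync(2, 100);
        Task<int> extant = QueryCountAsync(7, 100);

        await Task.WhenAll(vintage, exits, extant);
        return (await vintage, await exits, await extant);
    }
}
```

Because the three waits overlap, the total time is roughly one latency rather than three. That gain only appears when the work is waiting on something external. For CPU-bound LINQ over in-memory data, the cores are already the bottleneck.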

Given the look of your code, it strikes me that you could quite possibly benefit from performing a single query (or maybe three) and grouping by yearMonth though...
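Here is a sketch of that idea, built on several assumptions:

- The record types below are hypothetical stand-ins for the question's `ValuationEventsPit` items and `DJVSStatsCo`.
- Year-months are integers such as `201003`.
- `ExitDate`/`ExitDateYearMonth` are folded into one nullable `ExitYearMonth`.
- Each company has the same exit month on all of its events.

The events are collapsed to one row per company in a single pass, and exits are grouped by month once. Each month then counts companies rather than running `Distinct()` over every event three times.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the question's types (names and field types are assumptions).
public record ValuationEvent(int CompanyId, int FirstRoundYearMonth, int? ExitYearMonth);
public record MonthStats(int YearMonth, int Vintage, int Extant, int Exits);

public static class CompanyStats
{
    public static List<MonthStats> Compute(IEnumerable<ValuationEvent> events, IEnumerable<int> yearMonths)
    {
        // One pass: one row per company with its earliest first-round month and its exit month.
        // Assumes a company's exit month (if any) is identical on all of its events.
        var companies = events
            .GroupBy(e => e.CompanyId)
            .Select(g => (First: g.Min(e => e.FirstRoundYearMonth),
                          Exit: g.Max(e => e.ExitYearMonth))) // Max ignores nulls
            .ToList();

        // Exits per month computed once with a GroupBy instead of one scan per month.
        var exitsByMonth = companies
            .Where(c => c.Exit.HasValue)
            .GroupBy(c => c.Exit.Value)
            .ToDictionary(g => g.Key, g => g.Count());

        var result = new List<MonthStats>();
        foreach (var ym in yearMonths.OrderBy(m => m))
        {
            // Companies are already distinct, so plain counts replace Select(...).Distinct().Count().
            int vintage = companies.Count(c => c.First <= ym);
            int extant = companies.Count(c => c.First <= ym && (c.Exit is null || c.Exit > ym));
            int exits = exitsByMonth.TryGetValue(ym, out var n) ? n : 0;
            result.Add(new MonthStats(ym, vintage, extant, exits));
        }
        return result;
    }
}
```

This sketch still loops over months, but each month only touches the (usually much smaller) per-company list. If that assumption about consistent exit months does not hold for your data, the per-company collapse would need to keep each event's interval instead.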

Answered Jan 28 '23 at 14:01 by Jon Skeet


Since the parallelism of the outer loop already keeps your CPUs busy (50K elements), there is little benefit in introducing further parallelism within the loop. In the interest of readability, I would remove the Parallel.Invoke call to simplify your code.
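For comparison, here is a sketch of the loop with `Parallel.Invoke` removed. Only the outer `Parallel.ForEach` runs in parallel, and the three counts run one after another inside it. The record types are hypothetical stand-ins for the question's classes. `ExitDate`/`ExitDateYearMonth` are folded into one nullable `ExitYearMonth`, which is also an assumption.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-ins for the question's types (names and field types are assumptions).
public record ValuationEvent(int CompanyId, int FirstRoundYearMonth, int? ExitYearMonth);
public record MonthStats(int YearMonth, int Vintage, int Extant, int Exits);

public static class SimplifiedStats
{
    public static List<MonthStats> Compute(IReadOnlyList<ValuationEvent> events, IEnumerable<int> yearMonths)
    {
        var results = new List<MonthStats>();
        var locker = new object();

        Parallel.ForEach(
            yearMonths,
            () => new List<MonthStats>(),                 // per-thread buffer (localInit)
            (yearMonth, loopState, local) =>
            {
                // The three counts run sequentially; the outer loop already occupies the cores.
                int vintage = events.Where(x => x.FirstRoundYearMonth <= yearMonth)
                                    .Select(x => x.CompanyId).Distinct().Count();
                int exits = events.Where(x => x.ExitYearMonth == yearMonth) // null never equals
                                  .Select(x => x.CompanyId).Distinct().Count();
                int extant = events.Where(x => x.FirstRoundYearMonth <= yearMonth &&
                                               (x.ExitYearMonth == null || x.ExitYearMonth > yearMonth))
                                   .Select(x => x.CompanyId).Distinct().Count();

                local.Add(new MonthStats(yearMonth, vintage, extant, exits));
                return local;
            },
            local => { lock (locker) results.AddRange(local); }); // merge once per thread (localFinally)

        return results;
    }
}
```

As in the original code, the order of `results` is not deterministic. Sort by `YearMonth` afterwards if the order matters.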

Answered Jan 28 '23 at 14:01 by BrokenGlass