When does .race or .hyper outperform non-data-parallelized versions?

Question

I have this code:

# Grab Nutrients.csv from https://data.nal.usda.gov/dataset/usda-branded-food-products-database/resource/c929dc84-1516-4ac7-bbb8-c0c191ca8cec
my @nutrients = "/path/to/Nutrients.csv".IO.lines;
for @nutrients.race {
    my @data = $_.split('","');
    .say if @data[2] eq "Protein" and @data[4] > 70 and @data[5] ~~ /^g/;
};

Nutrients.csv is a 174 MB file with many rows. Each row gets non-trivial processing, and there is no data dependency between rows. However, the .race version takes about 54 seconds while the non-race version takes 43 seconds, about 20% less. Any idea why that happens? Is the work done per row still too little for data parallelism to pay off? I have only seen it help with very heavy operations, like checking whether a number is prime. In that case, is there a ballpark for how much work each piece of data needs to make data parallelism worthwhile?

Elizabeth Mattijsen · Accepted Answer

Assuming that "outperform" is defined as "using less wallclock time":

Short answer: when it does.

Longer answer: when the overhead of batching values, distributing them over multiple threads and collecting the results, plus the CPU time needed for the actual work divided by the number of threads, adds up to a shorter runtime than doing the work on a single thread.
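As a rough illustration of that break-even condition, here is a minimal sketch. The sub name `race-pays-off`, its parameters and the numbers are all made up for illustration, and it assumes the work divides perfectly evenly over the threads, which real workloads rarely do.

```raku
# Hypothetical cost model, not part of Raku: all values are in seconds.
# Parallel wallclock is approximated as overhead + work / threads.
sub race-pays-off(Real :$overhead!, Real :$work!, Int :$threads!) {
    $overhead + $work / $threads < $work
}

# Made-up numbers: small overhead compared to the work.
say race-pays-off(overhead => 0.5, work => 10, threads => 4);  # True

# Made-up numbers: overhead larger than the work itself.
say race-pays-off(overhead => 20, work => 10, threads => 4);   # False
```

The overhead is not a number you can look up; the only way to find out which side of the inequality you are on is to time both versions.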

Still longer answer: the dispatcher thread needs some CPU time to batch up values, hand the work over to a worker thread, and then process its result. As long as that takes more CPU time than the work itself, you will only ever use one worker thread, because by the time the dispatcher thread is ready to dispatch the next batch, the only worker thread is already free to receive more work. That means you have made things worse: the actual work is still being done by one thread, but you have added a lot of overhead and latency.

So make sure that the amount of work a worker thread needs to do is big enough that the dispatcher thread has to start up another thread for the next piece of work. This can be done by increasing the batch size. But a bigger batch also means that the dispatcher thread needs more CPU time to create the batch, which in turn can leave the worker thread ready and waiting for the next batch, in which case you are back to just having added overhead.
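One way to look for a workable batch size is simply to time a few candidates against a serial baseline. The sketch below does that; the `busy` sub and the input list are hypothetical stand-ins for the real per-row work, and the batch sizes tried are arbitrary. Timings will differ by machine and Rakudo version, so none are shown.

```raku
# Hypothetical per-item workload standing in for the row processing.
sub busy(Int $n --> Int) {
    my $sum = 0;
    $sum += $_ * $_ for 1..$n;
    $sum
}

my @items = 200 xx 20_000;   # made-up input: 20,000 items of equal cost

# Serial baseline.
my $start  = now;
my @serial = @items.map(&busy);
say "serial: { (now - $start).round(0.01) }s";

# Same work via .race with a few batch sizes; degree is left at its default.
for 1, 64, 1024 -> $batch {
    $start = now;
    my @raced = @items.race(:$batch).map(&busy);
    say "race, batch $batch: { (now - $start).round(0.01) }s";
}
```

If none of the batch sizes beats the serial run, the per-item work is probably too small for `.race` or `.hyper` to help.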

There are still plans to make the batch size adapt itself automatically to the amount of work that a worker thread needs to do. But unfortunately, that will also require quite an extensive reworking of the current implementation of hyper and race. So don't expect that any time soon, and definitely not before the Great Dispatcher Overhaul has landed.

user2944647 · Answer

Please have a look at:

Raku .hyper() and .race() example not working

The syntax in your example should be:

my @nutrients = "/path/to/Nutrients.csv".IO.lines;
race for @nutrients.race(batch => 1, degree => 2) {
    my @data = $_.split('","');
    .say if @data[2] eq "Protein" and @data[4] > 70 and @data[5] ~~ /^g/;
}

The "race" statement prefix in front of the "for" makes the difference: without it, the loop body runs serially even though .race was called on the list.

Tags:

concurrency

raku

jjmerelo
