I have this code:
# Grab Nutrients.csv from https://data.nal.usda.gov/dataset/usda-branded-food-products-database/resource/c929dc84-1516-4ac7-bbb8-c0c191ca8cec
my @nutrients = "/path/to/Nutrients.csv".IO.lines;
for @nutrients.race {
my @data = $_.split('","');
.say if @data[2] eq "Protein" and @data[4] > 70 and @data[5] ~~ /^g/;
};
Nutrients.csv is a 174 MB file, with lots of rows. Non-trivial stuff is done on every row, but there's no data dependency. However, this takes circa 54s while the non-race version uses 43 seconds, 20% less. Any idea of why that happens? Is the kind of operation done here still too little for data parallelism to take hold? I have seen it only working with very heavy operations, like checking if something is prime. In that case, any ballpark of how much should be done for every piece of data to make data parallelism worth the while?
Assuming that "outperform" is defined as "using less wallclock":
Short answer: when it does.
Longer answer: when the overhead of batching values, distributing over multiple threads and collecting results + the actual CPU that is needed for the work divided by the number of threads, results in a shorter runtime.
Still longer answer: the dispatcher thread needs some CPU to batch up values and hand the work over to a worker thread and then process its result. As long as that amount of CPU is more than the amount of CPU needed to do the work, you will only use one thread (because by the time the dispatcher thread is ready to dispatch, the only worker thread is ready to receive more work). Which means you've made things worse, because the actual work is now still being done by one thread, but you've added a lot of overhead and latency.
So make sure that the amount of work a worker thread needs to do, is big enough so that the dispatcher thread will need to start up another thread for the next piece of work. This can be done by increasing the batch-size. But a bigger batch, also means that the dispatcher thread will need more CPU to create the batch. Which in turn can make the worker thread be ready to receive the next batch, in which case you're back to just having added overhead.
There are still plans to make the batch size adapt itself automatically to the amount of work that a worker thread needs to do. But unfortunately, that will also require quite an extensive reworking of the current implementation of hyper
and race
. So don't expect that any time soon, and definitely not before the Great Dispatcher Overhaul has landed.
Please have a look at:
Raku .hyper() and .race() example not working
The syntax in your example should be:
my @nutrients = "/path/to/Nutrients.csv".IO.lines;
race for @nutrients.race(batch => 1, degree => 2)
{
my @data = $_.split('","');
.say if @data[2] eq "Protein" and @data[4] > 70 and @data[5] ~~ /^g/;
}
The "race" in front of the "for" makes the difference.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With