
When is "race" worthwhile in Perl 6?

race divides operations on an iterable automatically into threads. For instance,

(Bool.roll xx 2000).race.sum

would automatically divide the sum of the 2000-element array across 4 threads. However, benchmarks show that this is much slower than not using race at all. This holds even if you make the array bigger, and it holds even as the non-autothreaded version gets faster with each release. (Auto-threading also gets faster, but is still twice as slow as not using it.)

So the question is: what is the minimum size of the atomic operation for which race is worthwhile? Is the overhead added to the sequential operation fixed, or can it be decreased somehow?

Update: in fact, the performance of hyper (similar to race, but with guaranteed ordered results) seems to be getting worse with time, at least for small sizes that are nonetheless integer multiples of the default batch size (64). The same happens with race.

asked Jul 22 '18 07:07 by jjmerelo


1 Answer

The short answer: .sum isn't smart enough to calculate sums in batches.

So what you're effectively doing in this benchmark is setting up a HyperSeq / RaceSeq and then not doing any parallel processing with it:

dd (Bool.roll xx 2000).race;
# RaceSeq.new(configuration => HyperConfiguration.new(batch => 64, degree => 4))

So you've been measuring .hyper / .race overhead. At the moment, only .map and .grep have been implemented on HyperSeq / RaceSeq. If you give one of those something to do, like:

# find the 1000th prime number in a single thread
$ time perl6 -e 'say (^Inf).grep( *.is-prime ).skip(999).head'
real    0m1.731s
user    0m1.780s
sys     0m0.043s

# find the 1000th prime number concurrently
$ time perl6 -e 'say (^Inf).hyper.grep( *.is-prime ).skip(999).head'
real    0m0.809s
user    0m2.048s
sys     0m0.060s

As you can see, in this (small) example, the concurrent version is more than 2x as fast as the non-concurrent one in wall-clock time (0.809s versus 1.731s), but it uses more CPU time (2.048s user versus 1.780s).
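Because .map is parallelised on a RaceSeq while .sum is not, one way to apply this to the sum from the question is to chunk the list yourself and do the per-chunk sums inside .map. This is only a sketch: the list size and the chunk size of 10_000 are arbitrary assumptions, not tuned values, and whether it actually beats the plain .sum depends on your machine and Rakudo version.

```raku
# Sketch: move the summing work into .map, which RaceSeq runs in parallel.
my @flags = Bool.roll xx 200_000;

# Split into chunks of 10_000 elements (an assumed, untuned size).
# batch => 1 hands one chunk at a time to each worker.
my $parallel = @flags.batch(10_000)
                     .race(batch => 1, degree => 4)
                     .map(*.sum)     # per-chunk sums, computed by the workers
                     .sum;           # final sum of the few partial results

# Sequential reference to check the result against.
my $sequential = @flags.sum;

say $parallel == $sequential;   # True
```

The final .sum still runs in a single thread, but it only has to add up 20 partial results instead of 200,000 elements.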

Since .hyper and .race started working correctly, performance has slightly improved, as you can see in this graph.

Other functions, such as .sum, could be implemented for .hyper / .race. However, I would hold off on that at the moment, as we will need a small refactor of the way we do .hyper and .race: at the moment, a batch cannot communicate back to the "supervisor" how fast it has finished its job. The supervisor needs that information if we want to allow it to adjust e.g. the batch size when it finds out that the default batch size is too small and we have too much overhead.
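Until the supervisor can adjust this itself, the batch size and degree shown in the dd output above can be set by hand through the batch and degree named arguments. A minimal sketch, where the range and the values 1024 and 4 are assumptions chosen for illustration rather than recommendations:

```raku
# Sketch: set batch size and degree manually instead of using the defaults (64 and 4).
my @candidates = ^20_000;

# Larger batches mean fewer hand-offs between the supervisor and the workers.
my @tuned = @candidates.hyper(batch => 1024, degree => 4).grep(*.is-prime);

# Sequential reference; .hyper keeps the original order, so the lists should match.
my @plain = @candidates.grep(*.is-prime);

say @tuned eqv @plain;
```

Timing the same pipeline with a few different batch values is a simple way to see where the overhead stops dominating for a given workload.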

answered Nov 16 '22 01:11 by Elizabeth Mattijsen