I am pretty new to Raku and I have a question about functional methods, in particular about reduce. I originally had the method:
sub standardab {
    my $mittel = mittel(@_);
    my $foo = 0;
    for @_ {
        $foo += ($_ - $mittel)**2;
    }
    $foo = sqrt($foo / @_.elems);
}
and it worked fine. Then I started to use reduce:
sub standardab {
    my $mittel = mittel(@_);
    my $foo = 0;
    $foo = @_.reduce({ $^a + ($^b - $mittel)**2 });
    $foo = sqrt($foo / @_.elems);
}
My execution time doubled (I am applying this to roughly 1000 elements) and the result differed by 0.004 (I guess a rounding error). If I use .race.reduce(...), my execution time is 4 times higher than with the original sequential code. Can someone tell me the reason for this? I thought about parallelism initialization time, but, as I said, I am applying this to 1000 elements, and if I change other for loops in my code to reduce it gets even slower!
Thanks for your help
In general, reduce and for do different things, and they are doing different things in your code. For example, compared with your for code, your reduce code involves twice as many arguments being passed and is doing one less iteration. I think that's likely at the root of the 0.004 difference.
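To make that concrete, here's a minimal sketch with made-up data (the mean is computed inline, standing in for your mittel sub). It shows that the unseeded reduce treats the first element differently from the for loop, and that seeding the reduction with 0 makes the two agree:

my @data   = 1, 4, 7, 10;
my $mittel = @data.sum / @data.elems;   # 5.5

# Unseeded: the first element is taken as-is and never gets ($_ - $mittel)**2
say @data.reduce({ $^a + ($^b - $mittel)**2 });        # 25.75

# Seeded with 0: every element is treated exactly as in the for loop
say (0, |@data).reduce({ $^a + ($^b - $mittel)**2 });  # 45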
Even if your for and reduce code did the same thing, an optimized version of such reduce code would never be faster than an equally optimized version of equivalent for code.
I thought that race didn't automatically parallelize reduce due to reduce's nature. (Though I see per your and @user0721090601's comments that I'm wrong.) But it will incur overhead -- currently a lot.
You could use race to parallelize your for loop instead, if it's slightly rewritten. That might speed it up.
for and reduce code

Here's the difference I meant:
say do for <a b c d> { $^a } # (a b c d) (4 iterations)
say do reduce <a b c d>: { $^a, $^b } # (((a b) c) d) (3 iterations)
For more details of their operation, see their respective docs (for, reduce).
You haven't shared your data, but I will presume that the for and/or reduce computations involve Nums (floats). Addition of floats isn't associative, so you may well get (typically small) discrepancies if the additions end up happening in a different order. I presume that explains the 0.004 difference.
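Here's a tiny illustration of that order sensitivity (this is standard IEEE floating-point behaviour, nothing Raku-specific; the e0 suffix forces Num literals rather than Rats):

say (0.1e0 + 0.2e0) + 0.3e0;                              # 0.6000000000000001
say 0.1e0 + (0.2e0 + 0.3e0);                              # 0.6
say (0.1e0 + 0.2e0) + 0.3e0 == 0.1e0 + (0.2e0 + 0.3e0);   # False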
reduce being 2X slower than your for

"my execution time doubled (I am applying this to roughly 1000 elements)"
First, your reduce code is different, as explained above. There are general abstract differences (e.g. taking two arguments per call instead of your for block's one) and perhaps your specific data leads to fundamental numeric computation differences (perhaps your for loop computation is primarily integer or float math while your reduce is primarily rational?). That might explain the execution time difference, or some of it.
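If you want to check what kinds of numbers your computation is actually pushing around, .WHAT on intermediate values tells you (decimal literals are Rats in Raku, e-notation literals are Nums, and sqrt always returns a Num):

say 2.5.WHAT;             # (Rat)
say 2.5e0.WHAT;           # (Num)
say ((2.5 - 1)**2).WHAT;  # (Rat)
say sqrt(2.25).WHAT;      # (Num)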
Another part of it may be the difference between, on the one hand, a reduce, which will by default compile into calls of a closure, with call overhead, two arguments per call, and temporary memory storing intermediate results, and, on the other, a for, which will by default compile into direct iteration, with the {...} being just inlined code rather than a call of a closure. (That said, it's possible a reduce will sometimes compile to inlined code; and it may even already be that way for your code.)
More generally, Rakudo optimization effort is still in its relatively early days. Most of it has been generic, speeding up all code. Where effort has been applied to particular constructs, the most widely used constructs have gotten the attention so far, and for is widely used while reduce is less so. So some or all of the difference may just be that reduce is poorly optimized.
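If you want to see the gap on your own machine, a rough comparison (not a rigorous benchmark; timings depend on hardware and Rakudo version) could look like this:

my @nums   = (^1000).map({ 1e3.rand });
my $mittel = @nums.sum / @nums.elems;

my $start   = now;
my $via-for = 0;
for @nums { $via-for += ($_ - $mittel)**2 }
say "for:    { now - $start } seconds";

$start = now;
my $via-reduce = (0, |@nums).reduce({ $^a + ($^b - $mittel)**2 });
say "reduce: { now - $start } seconds";

say $via-for == $via-reduce;   # True: same additions, same order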
reduce with race

"my execution time [for .race.reduce(...)] is 4 times higher than with the original sequential code"
I didn't think reduce would be automatically parallelizable with race. Per its doc, reduce works by "iteratively applying a function which knows how to combine two values", and one argument in each iteration is the result of the previous iteration. So it seemed to me it must be done sequentially.
(I see in the comments that I'm misunderstanding what could be done by a compiler with a reduction. Perhaps this is possible if the operation is associative?)
In summary, your code is incurring racing's overhead without gaining any benefit.
race in general

Let's say you're using some operation that is parallelizable with race.
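map is the standard example of such an operation, since each element is processed independently (note that race, unlike hyper, does not promise to keep results in their original order):

my @roots = (1..100_000).race.map(*.sqrt);
say @roots.elems;   # 100000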
First, as you noted, race incurs overhead. There'll be an initialization and teardown cost, at least some of which is paid repeatedly for each evaluation of an overall statement/expression that's being raced.
Second, at least for now, race means using threads running on CPU cores. For some payloads that can yield a useful benefit despite any initialization and teardown costs. But it will, at best, be a speed-up equal to the number of cores.
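You can ask the runtime how many cores it sees, which is that upper bound:

say $*KERNEL.cpu-cores;   # e.g. 8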
(One day it should be possible for compiler implementors to spot that a raced for loop is simple enough to be run on a GPU rather than a CPU, and go ahead and send it to a GPU to achieve a spectacular speed-up.)
Third, if you literally write .race.foo... you'll get default settings for some tunable aspects of the racing. The defaults are almost certainly not optimal and may be way off.
The currently tunable settings are :batch and :degree. See their doc for more details.
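For example (the numbers below are only placeholders; good values depend on your data and hardware):

my @data = (^1000).map({ 1e3.rand });
# :batch = how many elements a worker takes per work unit; :degree = how many workers
my @squared = @data.race(:batch(32), :degree(4)).map(* ** 2);
say @squared.elems;   # 1000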
More generally, whether parallelization speeds up code depends on the details of a specific use case such as the data and hardware in use.
race with for

If you rewrite your code a bit you can race your for:
$foo = sum do race for @_ { ($_ - $mittel)**2 }
To apply tuning you must repeat the race as a method, for example:
$foo = sum do race for @_.race(:degree(8)) { ($_ - $mittel)**2 }
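Putting that together, a sketch of the whole sub with a raced for loop might look like this (the mean is computed inline here as a stand-in for the original mittel sub, and the signature takes @values explicitly instead of using @_):

sub standardab(@values) {
    my $mittel = @values.sum / @values.elems;               # stand-in for mittel(@values)
    my $foo    = sum do race for @values { ($_ - $mittel)**2 };
    sqrt($foo / @values.elems);
}

say standardab((^1000).map({ 1e3.rand }));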