
Raku parallel/functional methods

Tags:

raku

I am pretty new to Raku and I have a question about functional methods, in particular reduce. I originally had this method:

sub standardab{
  my $mittel = mittel(@_);
  my $foo = 0;
  for @_ {
    $foo += ($_ - $mittel)**2;
  }
  $foo = sqrt($foo/(@_.elems));
}

and it worked fine. Then I started to use reduce:

sub standardab{
    my $mittel = mittel(@_);
    my $foo = 0;
    $foo = @_.reduce({$^a + ($^b-$mittel)**2});
    $foo = sqrt($foo/(@_.elems));
}

my execution time doubled (I am applying this to roughly 1000 elements) and the result differed by 0.004 (a rounding error, I guess). If I use

.race.reduce(...)

my execution time is 4 times higher than with the original sequential code. Can someone tell me the reason for this? I thought about parallelism initialization time, but, as I said, I am applying this to 1000 elements, and if I change other for loops in my code to reduce it gets even slower!

Thanks for your help

asked Apr 23 '20 by Sprinklerkopf


1 Answer

Summary

  • In general, reduce and for do different things, and they are doing different things in your code. For example, compared with your for code, your reduce code involves twice as many arguments being passed and is doing one less iteration. I think that's likely at the root of the 0.004 difference.

  • Even if your for and reduce code did the same thing, an optimized version of such reduce code would never be faster than an equally optimized version of equivalent for code.

  • I thought that race didn't automatically parallelize reduce due to reduce's nature. (Though I see per your and @user0721090601's comment I'm wrong.) But it will incur overhead -- currently a lot.

  • You could use race to parallelize your for loop instead, if it's slightly rewritten. That might speed it up.

On the difference between your for and reduce code

Here's the difference I meant:

say do for    <a b c d>  { $^a }       # (a b c d)      (4 iterations)

say do reduce <a b c d>: { $^a, $^b }  # (((a b) c) d)  (3 iterations)

For more details of their operation, see their respective doc (for, reduce).

You haven't shared your data, but I will presume that the for and/or reduce computations involve Nums (floats). Addition of floats isn't associative, so you may well get (typically small) discrepancies if the additions end up happening in a different order.

I presume that explains the 0.004 difference.
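To get a reduce that computes exactly what the for loop does, square the deviations first (for example with map) and then reduce with plain addition. A hypothetical sketch, assuming sample data `@data` and a precomputed mean `$mittel` (neither is from the question):

```raku
my @data   = 1, 2, 3, 4, 5;
my $mittel = @data.sum / @data.elems;   # arithmetic mean

# The question's reduce passes the *first element itself* as the initial
# $^a, so it never gets squared. Mapping first sends every element
# through the same computation, matching the for loop:
my $foo = @data.map({ ($_ - $mittel)**2 }).reduce(* + *);

say sqrt($foo / @data.elems);           # population standard deviation
```

With the squaring moved into the map, the reduce is a plain sum, so `@data.map({...}).sum` would do the same job.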

On your sequential reduce being 2X slower than your for

my execution time doubled (I am applying this to roughly 1000 elements)

First, your reduce code is different, as explained above. There are general abstract differences (eg taking two arguments per call instead of your for block's one) and perhaps your specific data leads to fundamental numeric computation differences (perhaps your for loop computation is primarily integer or float math while your reduce is primarily rational?). That might explain the execution time difference, or some of it.

Another part of it may be the difference between, on the one hand, a reduce, which will by default compile into calls of a closure, with call overhead, and two arguments per call, and temporary memory storing intermediate results, and, on the other, a for which will by default compile into direct iteration, with the {...} being just inlined code rather than a call of a closure. (That said, it's possible a reduce will sometimes compile to inlined code; and it may even already be that way for your code.)

More generally, Rakudo optimization effort is still in its relatively early days. Most of it has been generic, speeding up all code. Where effort has been applied to particular constructs, the most widely used constructs have gotten the attention so far, and for is widely used and reduce less so. So some or all the difference may just be that reduce is poorly optimized.

On reduce with race

my execution time [for .race.reduce(...)] is 4 times higher than with the original sequential code

I didn't think reduce would be automatically parallelizable with race. Per its doc, reduce works by "iteratively applying a function which knows how to combine two values", and one argument in each iteration is the result of the previous iteration. So it seemed to me it must be done sequentially.

(I see in the comments that I'm misunderstanding what could be done by a compiler with a reduction. Perhaps this is if it's a commutative operation?)

In summary, your code is incurring raceing's overhead without gaining any benefit.

On race in general

Let's say you're using some operation that is parallelizable with race.

First, as you noted, race incurs overhead. There'll be an initialization and teardown cost, at least some of which is paid repeatedly for each evaluation of an overall statement/expression that's being raced.

Second, at least for now, race means use of threads running on CPU cores. For some payloads that can yield a useful benefit despite any initialization and teardown costs. But it will, at best, be a speed up equal to the number of cores.

(One day it should be possible for compiler implementors to spot that a raced for loop is simple enough to be run on a GPU rather than a CPU, and go ahead and send it to a GPU to achieve a spectacular speed up.)

Third, if you literally write .race.foo... you'll get default settings for some tunable aspects of the racing. The defaults are almost certainly not optimal and may be way off.

The currently tunable settings are :batch and :degree. See their doc for more details.
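As a hedged sketch of what passing those settings looks like (the data and the chosen values here are illustrative, not recommendations):

```raku
my @data = (^10_000).map({ 1e3.rand });

# :degree sets how many workers race; :batch sets how many elements
# each worker takes per chunk. For cheap per-item work, larger batches
# amortize the per-batch coordination overhead:
my @squared = @data.race(:batch(1024), :degree(4)).map({ $_ ** 2 });
```

Whether any particular :batch/:degree pair helps is an empirical question; measure with your real data.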

More generally, whether parallelization speeds up code depends on the details of a specific use case such as the data and hardware in use.

On using race with for

If you rewrite your code a bit you can race your for:

$foo = sum do race for @_ { ($_ - $mittel)**2 } 

To apply tuning you must repeat the race as a method, for example:

$foo = sum do race for @_.race(:degree(8)) { ($_ - $mittel)**2 } 
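Putting the pieces together, here is a sketch of the whole subroutine with the raced for loop. It assumes a `mittel` routine that returns the arithmetic mean (the question calls one but doesn't show it), and uses a slurpy parameter in place of `@_`:

```raku
# Assumed helper: the question's mittel presumably computes the mean.
sub mittel(@x) { @x.sum / @x.elems }

sub standardab(*@x) {
    my $mittel = mittel(@x);
    # race the squaring of deviations, then sum the results:
    my $foo = sum do race for @x { ($_ - $mittel)**2 };
    sqrt($foo / @x.elems);
}

say standardab(1, 2, 3, 4, 5);   # population standard deviation
```

Note that this computes the population standard deviation (dividing by N), matching the question's original code.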
answered Dec 23 '22 by raiph