
What happens when a declarator (my/state) is in a for block?

Tags: scope, raku

The following blocks run a loop assigning the topic to a variable $var:

  • In the first, my $var; is declared outside the loop
  • In the second, my $var; is declared inside the loop
  • In the third, state $var; is declared inside the loop

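my $limit = 10_000_000;
# Block 1: my $var declared once, outside the loop
{
    my $var;
    for ^$limit { $var = $_; }
    say now - ENTER now;    # seconds elapsed since this block was entered
}
# Block 2: a new my $var on every iteration
{
    for ^$limit { my $var; $var = $_; }
    say now - ENTER now;
}
# Block 3: state $var, kept across iterations
{
    for ^$limit { state $var; $var = $_; }
    say now - ENTER now;
}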

Sample durations (in seconds) for each block:

0.5938845
1.8251226
2.60700803

The docs at https://docs.perl6.org/syntax/state mention that state variables have the same lexical scoping as my. Functionally, blocks 1 and 3 achieve the same persistent storage across repeated calls to their respective loop blocks.
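
To make that semantic difference concrete, here is a small sketch (not part of the original post; the array names are made up for illustration) showing that a state variable inside a loop block keeps its value across iterations, while a my variable declared in the same place gets a fresh container, and a fresh initialization, each time around:

```raku
# state: the initializer runs once and the value persists across iterations.
# my:    a new container is created and initialized on every iteration.
my @state-seen;
my @my-seen;

for ^3 {
    state $count = 0;   # initialized only on the first iteration
    $count++;
    @state-seen.push($count);
}

for ^3 {
    my $count = 0;      # re-initialized on every iteration
    $count++;
    @my-seen.push($count);
}

say @state-seen;   # [1 2 3]
say @my-seen;      # [1 1 1]
```

This is also why block 2 in the question is not equivalent to blocks 1 and 3: its $var never outlives a single iteration.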

Why do the state (and the inner my) versions take so much more time? What else are they doing?

Edit: Following @HåkonHægland's comment, if I copy and paste the code above so that each block runs three times in total, the timing changes significantly for the my $var outside the loop (the first case):

0.600303
1.7917011
2.6640811

1.67793597
1.79197091
2.6816156

1.795679
1.81233942
2.77486777
drclaw asked Apr 08 '19 05:04


1 Answer

Short version: in a world without any runtime optimization (type specialization, JIT, and so forth), the timings would match your expectations. The timings here are influenced by how well the optimizer deals with each example.

First of all, it's interesting to run the code without any runtime optimization at all. In my (rather slow) VM on the machine I'm currently using, setting MVM_SPESH_DISABLE=1 in the environment, which turns off MoarVM's dynamic optimizer (spesh), gives these timings:

13.92366942
16.235372
14.4329288

These make some kind of intuitive sense:

  • In the first case, we have a simple lexical variable declared in the scope enclosing the loop block.
  • In the second case, we have to allocate, and later garbage-collect, a fresh Scalar container every time around the loop, which accounts for the extra time.
  • In the third case, we're using a state variable. A state variable is stored in the code object of the closure, and then copied into the call frame at entry time. That's cheaper than allocating a new Scalar every time, but still a little more work than not having to do that operation at all.
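
One way to reproduce both sets of measurements is to run each variant as its own program, with and without the optimizer. The sketch below is a hypothetical driver, not something from the answer: the names %variants and time-variant are invented for illustration, and it assumes Rakudo on the MoarVM backend, where MVM_SPESH_DISABLE is honoured. It starts a fresh raku process per measurement via run and $*EXECUTABLE, so each loop gets its own VM, as in the isolated runs described next.

```raku
# Hypothetical driver: the three loop variants from the question,
# each run later as a standalone program.
my %variants =
    outer-my => 'my $var; for ^$limit { $var = $_ }',
    inner-my => 'for ^$limit { my $var; $var = $_ }',
    state    => 'for ^$limit { state $var; $var = $_ }';

# Time one variant in a fresh raku process. With :!spesh the child gets
# MVM_SPESH_DISABLE=1 (assumes the MoarVM backend).
sub time-variant(Str $name, Int :$limit = 10_000_000, Bool :$spesh = True --> Real) {
    # Build the child program; \$ keeps a literal $ in the generated code.
    my $code = "my \$limit = $limit; my \$t0 = now; %variants{$name}; say now - \$t0";
    my %env = %*ENV;
    %env<MVM_SPESH_DISABLE> = '1' unless $spesh;
    my $proc = run $*EXECUTABLE, '-e', $code, :out, :%env;
    # The child prints only the elapsed seconds.
    $proc.out.slurp(:close).trim.Real
}
```

For example, `say time-variant('state');` times the state variant with the optimizer on, and `say time-variant('state', :!spesh);` repeats it with spesh disabled. Absolute numbers will differ from the ones quoted here, which came from one particular (slow) VM.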

Next, let's run 3 programs with the optimizer enabled, each example in its own isolated program.

  • The first comes out at 0.86298831, a factor of 16 faster. Go optimizer! It has inlined the loop body.
  • The second comes out at 1.2288566, a factor of 13 faster. Not too shabby either. It has again inlined the loop body. (This case will also become rather cheaper in the future, once the escape analyzer is smart enough to eliminate the Scalar allocation.)
  • The third comes out at 2.0695035, a factor of 7 faster. That's comparatively unimpressive (even if still quite an improvement), and the major reason is that it has not inlined the loop body. Why? Because it doesn't know how to inline code that uses state variables yet. (How to see this: run with MVM_SPESH_INLINE_LOG=1 in the environment, and among the output is: Can NOT inline (1) with bytecode size 78 into (3): cannot inline code that declares a state variable.)
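
The inline log check can also be scripted. This is a hypothetical helper (the name inline-refusals is invented here), assuming the MoarVM backend and that MVM_SPESH_INLINE_LOG=1 writes lines like the one quoted above. Because it does not assume which stream the log goes to, it merges stderr into stdout before filtering for refusals:

```raku
# Hypothetical helper: run a snippet in a child raku process with the
# spesh inline log enabled, and return the lines reporting refused inlines.
sub inline-refusals(Str $code --> List) {
    my %env = %*ENV;
    %env<MVM_SPESH_INLINE_LOG> = '1';
    # :merge sends the child's stderr into the same handle as stdout.
    my $proc = run $*EXECUTABLE, '-e', $code, :out, :merge, :%env;
    $proc.out.slurp(:close).lines.grep(/'Can NOT inline'/).List
}

# Print any refusals for the state-variable loop.
.say for inline-refusals('for ^1_000_000 { state $var; $var = $_ }');
```

Whether a refusal appears depends on the loop getting hot enough to be specialized and on the Rakudo/MoarVM version in use, so an empty result is not by itself a contradiction of the answer.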

In short, the dominating factor here is the inlining of the loop body, and with state variables that is presently not possible.

It's not immediately clear why the optimizer does worse on the case with the outer declaration of $var when that isn't the first loop in the program; that feels more like a bug than a reasonable case of "this feature isn't optimized well yet". In its slight defense, it still consistently manages to deliver a big improvement, even if not as big as might be desired!

Jonathan Worthington answered Feb 07 '23 07:02