The following blocks run a loop assigning the topic to a variable $var:
1. my $var; is outside the loop
2. my $var; is inside the loop
3. state $var; is inside the loop
my $limit=10_000_000;
{
my $var;
for ^$limit { $var =$_; }
say now - ENTER now;
}
{
for ^$limit { my $var; $var=$_; }
say now - ENTER now;
}
{
for ^$limit { state $var; $var=$_; }
say now - ENTER now;
}
Sample durations (in seconds) for each block are as follows:
0.5938845
1.8251226
2.60700803
The docs at https://docs.perl6.org/syntax/state mention that state variables have the same lexical scoping as my. Functionally, code blocks 1 and 3 would achieve the same persistent storage across multiple calls to the respective loop block.
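To make that functional equivalence concrete, here is a minimal sketch of the three declaration styles. It increments the variable instead of assigning the topic, so that persistence across iterations becomes visible. The array and variable names are made up for illustration and are not part of the timing code.

```raku
my (@outer-seen, @inner-seen, @state-seen);

# Block 1 style: `my` declared outside the loop, one variable for all iterations.
my $outer = 0;
for ^3 { $outer++; @outer-seen.push: $outer }

# Block 2 style: `my` declared inside the loop, a fresh variable each iteration.
for ^3 { my $inner = 0; $inner++; @inner-seen.push: $inner }

# Block 3 style: `state` inside the loop, initialized once and then kept.
for ^3 { state $kept = 0; $kept++; @state-seen.push: $kept }

say @outer-seen;   # [1 2 3]
say @inner-seen;   # [1 1 1]
say @state-seen;   # [1 2 3]
```

Only the inner my version starts over on every iteration; the other two keep their value, which is why I expected blocks 1 and 3 to cost about the same.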
Why does the state (and the inner my) version take so much more time? What else is it doing?
Edit:
Similar to @HåkonHægland's comment, if I cut and paste the above code so as to run each block three times in total, the timing changes significantly for the my $var outside the loop (the first case):
0.600303
1.7917011
2.6640811
1.67793597
1.79197091
2.6816156
1.795679
1.81233942
2.77486777
Short version: in a world without any runtime optimization (type specialization, JIT, and so forth), the timings would match your expectations. The timings here are influenced by how well the optimizer deals with each example.
First of all, it's interesting to run the code without any kind of runtime optimization. In my (rather slow) VM on the box I'm currently on, sticking MVM_SPESH_DISABLE=1 into the environment results in these timings:
13.92366942
16.235372
14.4329288
These make some kind of intuitive sense:
- The first case, with my $var declared outside the loop, needs neither of the extra steps below and so comes out fastest.
- The second case, with my $var inside the loop, does a Scalar allocation every time around the loop, which accounts for the extra time.
- The third case uses a state variable. A state variable is stored in the code object of the closure, and then copied into the call frame at entry time. That's cheaper than allocating a new Scalar every time, but still a little bit more work than not having to do that operation at all.
Next, let's run 3 programs with the optimizer enabled, each example in its own isolated program.
- The first example takes 0.86298831, a factor of 16 faster. Go optimizer! It has inlined the loop body.
- The second takes 1.2288566, a factor of 13 faster. Not too shabby either. It has again inlined the loop body. (This case will also become rather cheaper in the future, once the escape analyzer is smart enough to eliminate the Scalar allocation.)
- The third takes 2.0695035, a factor of 7 faster. That's comparatively unimpressive (even if still quite an improvement), and the major reason is that it has not inlined the loop body. Why? Because it doesn't know how to inline code that uses state variables yet. (How to see this: run with MVM_SPESH_INLINE_LOG=1 in the environment, and among the output is: Can NOT inline (1) with bytecode size 78 into (3): cannot inline code that declares a state variable.)
In short, the dominating factor here is the inlining of the loop body, and with state variables that is presently not possible.
It's not immediately clear why the optimizer does worse at the case with the outer declaration of $var when that isn't the first loop in the program; that feels more like a bug than a reasonable case of "this feature isn't optimized well yet". In its slight defense, it still consistently manages to deliver a big improvement, even when not so big as might be desired!
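If you want to repeat the with/without optimizer comparison yourself, the following is a rough sketch rather than the exact setup I used. It assumes Rakudo on MoarVM (the MVM_SPESH_DISABLE variable has no effect on other backends), and it launches a fresh interpreter per measurement so the runs stay isolated. The one-liner in $code is a hypothetical stand-in for the first block of the question, with a smaller limit so the unoptimized run stays short.

```raku
# Hypothetical stand-in for block 1 of the question (smaller limit).
my $code = '{ my $var; for ^1_000_000 { $var = $_ }; say now - ENTER now }';

my %timings;
for False, True -> $disable-spesh {
    # Copy the current environment and optionally add the MoarVM switch.
    my %env = %*ENV;
    %env<MVM_SPESH_DISABLE> = '1' if $disable-spesh;

    # Start a fresh interpreter so each measurement is isolated.
    my $proc  = run $*EXECUTABLE, '-e', $code, :%env, :out;
    my $label = $disable-spesh ?? 'spesh disabled' !! 'spesh enabled';
    %timings{$label} = $proc.out.slurp(:close).trim;
}

say "$_.key(): $_.value() s" for %timings.sort;
```

Swapping the string in $code for the inner my or state variant gives the other two measurements. The absolute numbers will of course differ from machine to machine.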