Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Prove that using a range operator in a loop does not use additional memory

Tags:

loops

perl

The current documentation for the Range operator .. states that it will not burn up memory for counting loops:

... The range operator is useful for writing foreach (1..10) loops and for doing slice operations on arrays. In the current implementation, no temporary array is created when the range operator is used as the expression in foreach loops, but older versions of Perl might burn a lot of memory when you write something like this:

    1.   for (1 .. 1_000_000) {
    2.       # code
    3.   }

Because of the aforementioned early implementations of for (MIN .. MAX), I still come by experts who are wary of using counting loops because they believe it is equivalent to:

my @temp_array = (MIN .. MAX);       # Needlessly using up memory
for (@temp_array) {

Versus the more logical and memory efficient:

for ($_ = MIN; $_ <= MAX; $_++) {    # Logical counting from MIN to MAX

Questions:

  • Is there a way that one could go about proving that a counting loop does not waste memory?

  • Does anyone know which versions of Perl had the memory issue and when it was fixed?

I'm able to prove to myself that counting loops don't waste memory using the below one-liner which would certainly crash my system if it was actually creating a temporary array. However, it would be nice if there was more conclusive information on the subject so that we could put this old-wives tale to rest.

$ perl -e 'for (1 .. 1_000_000_000_000_000) { print "$_\n"; last if $_ == 5 }'
1
2
3
4
5

Solution

Each of the three below answers go part of the way to explaining this issue:

  1. ikegami's answer thoroughly breaks down different types loops and demonstrates how counting loops differ on the front and back end. &star;
  2. friedo's answer shows how to use top to monitor memory usage.
  3. Borodin's answer addresses the second part of my query about when this no longer became an issue:
    • perlop v5.4_68 (released on 23 June 1998) warns a temporary array is used.
    • perlop v5.4_69 (released on 29 June 1998) states a temporary array is no longer used.
    • perldelta v5.4_71 (released on 9 July 1998) states that counting loops are optimized.

I might do some specific version testing at some point, but given this is apparently a 16 year old issue, I'm confident that the warning in perlop can be put to rest.

like image 265
Miller Avatar asked Aug 09 '14 00:08

Miller


1 Answers

First of all, here's a list of different types of for loops and the optimizations that can be applied. All of these are present in every version of Perl from 5.6 to 5.20 (present) inclusive, and I believe it's comprehensive.

  • for (EXPR; EXPR; EXPR)
    ⇒ "C-style for loop", an augmented while loop.
  • for (EXPRX..EXPRY)
    ⇒ A range and nothing else is optimized into a counting loop.
  • for (@ARRAY)
    ⇒ An array and nothing else is accessed directly rather than being flattened.
  • for (reverse LIST)
    ⇒ None of the above optimizations apply, but the list is traversed in reverse rather than being reversed.
  • for (LIST)
    ⇒ In a generic foreach loop, the LIST expression is evaluated before the loop starts.

When CONSTX..CONSTY is flattened (i.e. anywhere other than in for (CONSTX..CONSTY)), it is flattened at compile-time rather than run-time.


Black box proof

Baseline memory usage:

$ perl -e'system(ps, ho, rss, 0+$$);'
 1540         # 1.5 MiB

The general case flattens.

$ perl -e'$y=2_000_000; for ((),1..$y) { system(ps, ho, rss, 0+$$); last }'
80208         # 78 MiB

Or worse. (It flattens into an array at compile-time in addition to the normal stack usage.)

$ perl -e'for ((),1..2_000_000) { system(ps, ho, rss, 0+$$); last }'
143224        # 140 MiB

for (CONST..CONST) doesn't flatten.

$ perl -e'for (1..2_000_000) { system(ps, ho, rss, 0+$$); last }'
 1540         # 1.5 MiB

In fact, for (EXPR..EXPR) in general doesn't flatten.

$ perl -e'$y=2_000_000; for (1..$y) { system(ps, ho, rss, 0+$$); last }'
 1540         # 1.5 MiB

Even without tools, you could tell the the difference in compilation time.

$ time perl -c -e'1 for 1..2_000_000'
-e syntax OK

real    0m0.010s
user    0m0.004s
sys     0m0.000s

$ time perl -c -e'1 for (),1..2_000_000'
-e syntax OK

real    0m1.197s
user    0m0.952s
sys     0m0.232s

White box proof

The unoptimized case uses a range operator in list context. Full list in memory.

$ perl -MO=Concise,-exec -e'$y=1_000_000; 1 for (),1..$y;'
...
8  <|> range(other->9)[t3] lK/1                      <-- Range operator
9      <#> gvsv[*y] s
a      <1> flop lKM
           goto b
i  <$> const[IV 1] s
j  <1> flip[t4] lK/LINENUM
b  <#> gv[*_] s
c  <{> enteriter(next->d last->g redo->d) lK/8       <-- No S
...

This is what a range flattened at compile-time looks like:

$ perl -MO=Concise,-exec -e'1 for (),1..1_000_000;'
...
4  <$> const[AV ] s                                  <-- Constant array
5  <1> rv2av lKPM/1
6  <#> gv[*_] s
7  <{> enteriter(next->8 last->b redo->8) lK/8       <-- No S
...

You can see that for (CONST..CONST) creates an enteriter with the "S" flag. On enteriter, the "S" flag means it's a counting loop.

$ perl -MO=Concise,-exec -e'1 for 1..1_000_000;'
...
4  <$> const[IV 1] s
5  <$> const[IV 1000000] s
6  <#> gv[*_] s
7  <{> enteriter(next->8 last->b redo->8) lKS/8       <-- S
...

Same for for (EXPR..EXPR) in general.

$ perl -MO=Concise,-exec -e'$y=1_000_000; 1 for 1..$y;'
...
8  <$> const[IV 1] s
9  <#> gvsv[*y] s
a  <#> gv[*_] s
b  <{> enteriter(next->c last->f redo->c) lKS/8       <-- S
...

Even for (@a) isn't flattened!

$ perl -MO=Concise,-exec -e'1 for @a;'
...
4  <#> gv[*a] s
5  <1> rv2av[t2] sKRM/1
6  <#> gv[*_] s
7  <{> enteriter(next->8 last->b redo->8) lKS/8       <-- S
...

Double-check

$ perl -MO=Concise,-exec -e'1 for (),@a;'
...
4  <#> gv[*a] s
5  <1> rv2av[t2] lKM/1
6  <#> gv[*_] s
7  <{> enteriter(next->8 last->b redo->8) lK/8        <-- No S
...

Looking up the code for the "S" flag will confirm all of this.

  • pp_enteriter (PL_op->op_flags & OPf_STACKED checks for "S")
  • pp_iter.
like image 107
ikegami Avatar answered Oct 18 '22 10:10

ikegami