Why is there such a large performance difference between these two scripts that do the same thing?

Tags:

raku

This is Problem 36 from Project Euler: sum all of the numbers below one million that are palindromic in both base 2 and base 10.

I'd originally tried solving it in a more functional style.

This runs in just under 6 seconds.

[1..1_000_000]
    .grep( * !%% 2 )
    .grep( -> $x { $x == $x.flip } )
    .grep( -> $y { $y.base(2) == $y.base(2).flip } )
    .sum.say

Surprisingly, this took 12 seconds, even though I'm generating only odd numbers and therefore skipping the test for even ones.

(1,3 ... 1_000_000)
    .grep( -> $x { $x == $x.flip } )
    .grep( -> $y { $y.base(2) == $y.base(2).flip } )
    .sum.say

This runs in about 3 seconds.

my @pals;
for (1,3 ... 1_000_000) -> $x {
    next unless $x == $x.flip;
    next unless $x.base(2) == $x.base(2).flip;
    @pals.push($x);
}

say [+] @pals;

I also noted that there is a significant difference between using

for (1,3 ... 1_000_000) -> $x { ...

and

for [1,3 ... 1_000_000] -> $x { ...

Does anyone know why the streaming versions are so much slower than the iterative one? And why would those two for loops differ so much in performance?

asked Mar 30 '20 13:03

jmcneirney

2 Answers

The construct [...] is an array composer. It eagerly iterates the iterable found within it, and stores each value into the array. Only then do we proceed to do the iteration. That results in far more memory allocation and is less cache-friendly. By contrast, parentheses do nothing (aside from grouping, but they don't add any semantics beyond that). Thus:

[1..1_000_000]
    .grep( * !%% 2 )
    .grep( -> $x { $x == $x.flip } )
    .grep( -> $y { $y.base(2) == $y.base(2).flip } )
    .sum.say

Will allocate and set up a million element array and iterate it, while:

(1..1_000_000)
    .grep( * !%% 2 )
    .grep( -> $x { $x == $x.flip } )
    .grep( -> $y { $y.base(2) == $y.base(2).flip } )
    .sum.say

Runs rather faster, because it need not do that.

Further, the ... operator is currently far slower than the .. operator. It's not doomed to be that way forever, it's just received a lot less attention so far. Since .grep has also been decently well optimized, it turns out to be quicker to filter out the elements made by the range - for now, anyway.
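
A rough way to see this gap for yourself is to time the two ways of producing the odd numbers. This is only a sketch: the variable names are made up for illustration, and the absolute timings depend on the machine and the Rakudo version, so treat the numbers as indicative rather than definitive.

```raku
# Hypothetical micro-benchmark: sequence operator vs. range plus grep.
# Both expressions produce the odd numbers below one million.
my $t0 = now;
my $via-seq = (1, 3 ... 999_999).sum;
say "sequence operator: { now - $t0 } s";

$t0 = now;
my $via-range = (1 .. 1_000_000).grep(* !%% 2).sum;
say "range plus grep:   { now - $t0 } s";

# Sanity check that both approaches visit the same numbers.
say $via-seq == $via-range;
```

If the claim above still holds on your installation, the second timing should come out noticeably smaller.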

Finally, using == to compare the (string) results of base and flip is not so efficient, since it parses them back into integers, when we could use eq and compare the strings:

(1 .. 1_000_000)
    .grep(* !%% 2)
    .grep( -> $x { $x eq $x.flip } )
    .grep( -> $y { $y.base(2) eq $y.base(2).flip } )
    .sum.say
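
To make the string point concrete, here is a small sketch showing that both flip and base hand back strings, so == has to convert both sides to numbers before comparing, while eq compares them directly. The value 585 is just an illustrative input.

```raku
# Both methods return Str, not Int.
say 585.flip.^name;       # Str
say 585.base(2);          # 1001001001
say 585.base(2).^name;    # Str

# eq compares the two strings directly.
say 585.base(2) eq 585.base(2).flip;

# == first parses each string as a number, then compares the numbers.
say 585.base(2) == 585.base(2).flip;
```

Both comparisons agree for this input; the difference is only in how much work each one does.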

answered Oct 25 '22 11:10

Jonathan Worthington

If you want something that is faster, you can write your own sequence generator.

gather {
  loop (my int $i = 1; $i < 1_000_000; $i += 2) {
    take $i
  }
}
.grep( -> $x { $x eq $x.flip } )
.grep( -> $y { $y.base(2) eq $y.base(2).flip } )
.sum.say

Which takes about 4 seconds.

Or to go even faster, you can create the Iterator object yourself.

class Odd does Iterator {
    has int $!count = -1;  # start at -1 so the first pull-one yields 1

    method pull-one () {
        if ($!count += 2) < 1_000_000 {
            $!count
        } else {
            IterationEnd
        }
    }
}

Seq.new(Odd.new)
.grep( -> $x { $x == $x.flip } )
.grep( -> $y { $y.base(2) == $y.base(2).flip } )
.sum.say

Which only takes about 2 seconds.

Of course, if you want to go as fast as possible, get rid of the sequence iteration entirely, use native ints, and cache the base 10 string with (my $s = ~$x).

my int $acc = 0;
loop ( my int $x = 1; $x < 1_000_000; $x += 2) {
  next unless (my $s = ~$x) eq $s.flip;
  next unless $x.base(2) eq $x.base(2).flip;
  $acc += $x
}
say $acc;

Which gets it down to about 0.45 seconds.

(Caching the .base(2) didn't seem to do anything.)
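
For reference, the cached variant would look something like the sketch below. The name $b is just a placeholder chosen here; the only change from the loop above is that the base 2 string is computed once and reused for the flip.

```raku
my int $acc = 0;
loop (my int $x = 1; $x < 1_000_000; $x += 2) {
    # Cache the base 10 string, as before.
    next unless (my $s = ~$x) eq $s.flip;
    # Cache the base 2 string as well instead of calling .base(2) twice.
    next unless (my $b = $x.base(2)) eq $b.flip;
    $acc += $x;
}
say $acc;
```

Since the base 10 test already rejects the vast majority of candidates, the second test runs rarely, which may be why caching it makes little measurable difference.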

This is probably close to the minimum without resorting to using nqp ops directly.

I tried writing a native int bit flipper, but it made it slower. 0.5 seconds.
(I did not come up with this algorithm, I only adapted it to Raku. I also added the +> $in.msb to fit this problem.)

I would guess that spesh is leaving in operations that don't need to be there.
Or maybe it isn't JITting very well.

It might be more performant for values larger than 1_000_000.
(.base(2).flip is O(log n) whereas this is O(1).)

sub flip-bits ( int $in --> int ) {
  my int $n =
       ((($in +& (my int $ = 0xaaaaaaaa)) +> 1) +| (($in +& (my int $ = 0x55555555)) +< 1));
  $n = ((($n  +& (my int $ = 0xcccccccc)) +> 2) +| (($n  +& (my int $ = 0x33333333)) +< 2));
  $n = ((($n  +& (my int $ = 0xf0f0f0f0)) +> 4) +| (($n  +& (my int $ = 0x0f0f0f0f)) +< 4));
  $n = ((($n  +& (my int $ = 0xff00ff00)) +> 8) +| (($n  +& (my int $ = 0x00ff00ff)) +< 8));
  ((($n +> 16) +| ($n+< 16)) +> (32 - 1 - $in.msb)) +& (my int $ = 0xffffffff);
}

…

  # next unless (my $s = ~$x) eq $s.flip;
  next unless $x == flip-bits($x);

You can even try to use multiple threads.

Note that this workload is far too small for this to be effective.
The overhead of using threads swamps any benefit.

my atomicint $total = 0;

sub process ( int $s, int $e ) {
  # these are so the block lambda works properly
  # (works around what I think is a bug)
  my int $ = $s;
  my int $ = $e;

  start {
    my int $acc = 0;
    loop ( my int $x = $s; $x <= $e; $x += 2) {  # $e is inclusive
      next unless (my $s = ~$x) eq $s.flip;
      next unless $x.base(2) eq $x.base(2).flip;
      $acc += $x;
    }
    $total ⚛+= $acc;
  }
}


my int $cores = (Kernel.cpu-cores * 2.2).Int;

my int $per = 1_000_000 div $cores;
++$per if $per * $cores < 1_000_000;
++$per unless $per %% 2;  # keep the chunk size even so every chunk starts on an odd number

my @promises;

my int $start = 1;
for ^$cores {
  my int $end = $start + $per - 2;
  $end = 1_000_000 if $end > 1_000_000;

  push @promises, process $start, $end;

#say $start, "\t", $end;

  $start = $end + 2;
}

await @promises;
say $total;

Which runs in about 0.63 seconds.
(I messed with the 2.2 value to find a near minimum time on my computer.)

answered Oct 25 '22 11:10

Brad Gilbert
