Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is there such a large performance difference between these two scrips that do the same thing?

Tags:

raku

This is problem36 from the Euler Project. Sum all of the numbers below a million that are palindromic in base 2 and base 10.

I'd originally tried solving it in a more functional style.

This runs in just under 6 seconds.

[1..1_000_000]
    .grep( * !%% 2 )
    .grep( -> $x { $x == $x.flip } )
    .grep( -> $y { $y.base(2) == $y.base(2).flip } )
    .sum.say

Surprisingly this took 12 seconds even though I'm only generating odd numbers and therefore skipping the test for even.

(1,3 ... 1_000_000)
    .grep( -> $x { $x == $x.flip } )
    .grep( -> $y { $y.base(2) == $y.base(2).flip } )
    .sum.say

This runs in about 3 seconds.

my @pals;
for (1,3 ... 1_000_000) -> $x {
    next unless $x == $x.flip;
    next unless $x.base(2) == $x.base(2).flip;
    @pals.push($x);
}

say [+] @pals;

I also noted that there is a significant difference between using

for (1,3 ... 1_000_000) -> $x { ...

and

for [1,3 ... 1_000_000] -> $x { ...

Anyone know why the streaming versions are so much slower than the iterative one? And, why would those two for loops be so different in performance?

like image 541
jmcneirney Avatar asked Mar 30 '20 13:03

jmcneirney


People also ask

What is the difference between machine learning and classical statistical models?

“The major difference between machine learning and statistics is their purpose. Machine learning models are designed to make the most accurate predictions possible. Statistical models are designed for inference about the relationships between variables.”

What is the difference between machine learning and statistical learning?

Statistical Learning is math intensive which is based on the coefficient estimator and requires a good understanding of your data. On the other hand, Machine Learning identifies patterns from your dataset through the iterations which require a way less of human effort.

Which option tells Nmap to perform an faster scan?

You can scan just the most popular 100 ports with the -F (fast scan) option, specify an arbitrary number of the most commonly open ports with --top-ports , or provide a custom list of ports to -p . Skip advanced scan types ( -sC , -sV , -O , --traceroute , and -A ).

What argument would be used in an Nmap command to allow for faster execution?

The only nmap arguments used in this example are -A, to enable OS and version detection, script scanning, and traceroute; -T4 for faster execution; and then the two target hostnames.

Why do people split things up into different scripts?

Splitting things up into many different scripts is mainly for organization purposes. Most people will split them up into different categories, like movement and camera. Splitting them up also makes it easier to find things and is just overall more convenient and easier to manage with big teams.

Is it better to have multiple scripts in one file?

If you have everything in one file then only one person can be editing it. Performance wise, it makes little to no difference. If you had a class in one script, and one in another, then just copy pasted them together, nothing really changes. The variables used (stuff that is stored in memory) stay the same, so you won't be at much loss.

Is too much stuff on one script bad for performance?

So, what I learned, which it is pretty obvious, is that everything has to come in a specific order to work properly. If you have so much stuff on a single script that you continuously get errors or performance problems, then this may be your problem. How does this tie in?

Is it worth it to use a single script?

Unless you have tons and tons of objects calling their own scripts (like many thousands) then it won't make really any meaningful impact. But if you do have a high amount of objects it seems it could potentially pay off to use a single script to manage the many. AB498, niklausmarcu and JimmyTheBeamer like this.


Video Answer


2 Answers

The construct [...] is an array composer. It eagerly iterates the iterable found within it, and stores each value into the array. Only then do we proceed to do the iteration. That results in far more memory allocation and is less cache-friendly. By contrast, parentheses do nothing (aside from grouping, but they don't add any semantics beyond that). Thus:

[1..1_000_000]
    .grep( * !%% 2 )
    .grep( -> $x { $x == $x.flip } )
    .grep( -> $y { $y.base(2) == $y.base(2).flip } )
    .sum.say

Will allocate and set up a million element array and iterate it, while:

(1..1_000_000)
    .grep( * !%% 2 )
    .grep( -> $x { $x == $x.flip } )
    .grep( -> $y { $y.base(2) == $y.base(2).flip } )
    .sum.say

Runs rather faster, because it need not do that.

Further, the ... operator is currently far slower than the .. operator. It's not doomed to be that way forever, it's just received a lot less attention so far. Since .grep has also been decently well optimized, it turns out to be quicker to filter out the elements made by the range - for now, anyway.

Finally, using == to compare the (string) results of base and flip is not so efficient, since it parses them back into integers, when we could use eq and compare the strings:

(1 .. 1_000_000)
    .grep(* !%% 2)
    .grep( -> $x { $x eq $x.flip } )
    .grep( -> $y { $y.base(2) eq $y.base(2).flip } )
    .sum.say
like image 190
Jonathan Worthington Avatar answered Oct 25 '22 11:10

Jonathan Worthington


If you want something that is faster, you can write your own sequence generator.

gather {
  loop (my int $i = 1; $i < 1_000_000; $i += 2) {
    take $i
  }
}
.grep( -> $x { $x eq $x.flip } )
.grep( -> $y { $y.base(2) eq $y.base(2).flip } )
.sum.say

Which takes about 4 seconds.


Or to go even faster, you can create the Iterator object yourself.

class Odd does Iterator {
    has uint $!count = 1;

    method pull-one () {
        if ($!count += 2) < 1_000_000 {
            $!count
        } else {
            IterationEnd
        }
    }
}

Seq.new(Odd.new)
.grep( -> $x { $x == $x.flip } )
.grep( -> $y { $y.base(2) == $y.base(2).flip } )
.sum.say

Which only takes about 2 seconds.


Of course if you want to go as fast as possible, get rid of the sequence iteration entirely.

Also use native ints.

Also cache the base 10 string. (my $s = ~$x)

my int $acc = 0;
loop ( my int $x = 1; $x < 1_000_000; $x += 2) {
  next unless (my $s = ~$x) eq $s.flip;
  next unless $x.base(2) eq $x.base(2).flip;
  $acc += $x
}
say $acc;

Which gets it down to about 0.45 seconds.

(Caching the .base(2) didn't seem to do anything.)

This is probably close to the minimum without resorting to using nqp ops directly.


I tried writing a native int bit flipper, but it made it slower. 0.5 seconds.
(I did not come up with this algorithm, I only adapted it to Raku. I also added the +> $in.msb to fit this problem.)

I would guess that spesh is leaving in operations that don't need to be there.
Or maybe it isn't JITting very well.

It might be more performant for values larger than 1_000_000.
(.base(2).flip is O(log n) whereas this is O(1).)

sub flip-bits ( int $in --> int ) {
  my int $n =
       ((($in +& (my int $ = 0xaaaaaaaa)) +> 1) +| (($in +& (my int $ = 0x55555555)) +< 1));
  $n = ((($n  +& (my int $ = 0xcccccccc)) +> 2) +| (($n  +& (my int $ = 0x33333333)) +< 2));
  $n = ((($n  +& (my int $ = 0xf0f0f0f0)) +> 4) +| (($n  +& (my int $ = 0x0f0f0f0f)) +< 4));
  $n = ((($n  +& (my int $ = 0xff00ff00)) +> 8) +| (($n  +& (my int $ = 0x00ff00ff)) +< 8));
  ((($n +> 16) +| ($n+< 16)) +> (32 - 1 - $in.msb)) +& (my int $ = 0xffffffff);
}

…

  # next unless (my $s = ~$x) eq $s.flip;
  next unless $x == flip-bits($x);

You can even try to use multiple threads.

Note that this workload is entirely too little for this to be effective.
The overhead of using threads swamps out any benefit.

my atomicint $total = 0;

sub process ( int $s, int $e ) {
  # these are so the block lambda works properly
  # (works around what I think is a bug)
  my int $ = $s;
  my int $ = $e;

  start {
    my int $acc = 0;
    loop ( my int $x = $s; $x < $e; $x += 2) {
      next unless (my $s = ~$x) eq $s.flip;
      next unless $x.base(2) eq $x.base(2).flip;
      $acc += $x;
    }
    $total ⚛+= $acc;
  }
}


my int $cores = (Kernel.cpu-cores * 2.2).Int;

my int $per = 1_000_000 div $cores;
++$per if $per * $cores < 1_000_000;

my @promises;

my int $start = 1;
for ^$cores {
  my int $end = $start + $per - 2;
  $end = 1_000_000 if $end > 1_000_000;

  push @promises, process $start, $end;

#say $start, "\t", $end;

  $start = $end + 2;
}

await @promises;
say $total;

Which runs in about 0.63 seconds.
(I messed with the 2.2 value to find a near minimum time on my computer.)

like image 32
Brad Gilbert Avatar answered Oct 25 '22 11:10

Brad Gilbert