`now` becomes slow in a 10-million-iteration loop

I have a Snowflake ID generator script in Python, which I converted into a Raku module. Calling it 10,000,000 times is very slow (file test.raku):

use IdWorker;

my $worker = IdWorker.new(worker_id => 10, sequence => 0);
my @ids = gather for (1...10000000) { take $worker.get_id() };

my $duration = now - INIT now;
say sprintf("%-8s %-8s %-20s", @ids.elems, Set(@ids).elems, $duration);
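The question does not include the IdWorker module itself. For readers who want something runnable, here is a minimal sketch of a Snowflake-style generator in Raku. It assumes the common layout of a millisecond timestamp, a 10-bit worker id and a 12-bit per-millisecond sequence. The epoch constant, the bit widths and the clock-check behaviour are all assumptions, not the asker's code. Like the original, it reads the clock through `now`.

```raku
# Hypothetical sketch of a Snowflake-style IdWorker; the question's real module is not shown.
# Assumed bit layout: (ms since EPOCH) +< 22 | worker_id +< 12 | sequence.
class IdWorker {
    constant EPOCH           = 1_609_459_200_000;  # assumed custom epoch: 2021-01-01T00:00:00Z in ms
    constant WORKER-BITS     = 10;
    constant SEQUENCE-BITS   = 12;
    constant MAX-WORKER-ID   = (1 +< WORKER-BITS) - 1;
    constant SEQUENCE-MASK   = (1 +< SEQUENCE-BITS) - 1;
    constant WORKER-SHIFT    = SEQUENCE-BITS;
    constant TIMESTAMP-SHIFT = SEQUENCE-BITS + WORKER-BITS;

    has Int $.worker_id is required;
    has Int $.sequence = 0;
    has Int $!last_timestamp = -1;

    submethod TWEAK() {
        die "worker_id must be between 0 and {MAX-WORKER-ID}"
            unless 0 <= $!worker_id <= MAX-WORKER-ID;
    }

    # Milliseconds since the Unix epoch, obtained via `now` (the slow call discussed here).
    method get_timestamp() {
        (now.to-posix[0] * 1000).Int
    }

    method get_id() {
        my $timestamp = self.get_timestamp;
        die "clock moved backwards" if $timestamp < $!last_timestamp;
        if $timestamp == $!last_timestamp {
            # Same millisecond: bump the sequence, wrapping at 12 bits.
            $!sequence = ($!sequence + 1) +& SEQUENCE-MASK;
            # Sequence exhausted: wait until the clock reaches the next millisecond.
            $timestamp = self.get_timestamp
                while $!sequence == 0 && $timestamp <= $!last_timestamp;
        }
        else {
            $!sequence = 0;
        }
        $!last_timestamp = $timestamp;
        # Implicit return of the packed id.
        (($timestamp - EPOCH) +< TIMESTAMP-SHIFT) +| ($!worker_id +< WORKER-SHIFT) +| $!sequence
    }
}
```

Because the timestamp occupies the high bits, ids from a single worker come out strictly increasing as long as the wall clock does not move backwards.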

As @codesections's answer says, it is the calls to `now` that take so much time.

Python takes about 12 seconds, while Raku takes minutes. How can I fix this?

This empty for loop takes about 0.12 seconds:

for (1...10000000) {
    ;
}

But calling get_id() on $worker in the same loop takes minutes:

for (1...10000000) {
    $worker.get_id();
}
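Timing the whole script with `now - INIT now` lumps everything together. A small helper that times one block at a time makes it easier to compare loops like the two above. `time-it` is a hypothetical name; it calls `now` only twice, so the slow `now` does not distort the measurement.

```raku
# Hypothetical helper: run a block once and return the elapsed wall-clock time.
# `now` is called only twice, so its per-call cost does not affect the result.
sub time-it(&code --> Duration) {
    my $start = now;
    code();
    now - $start
}

say 'empty loop: ', time-it({ for 1..1_000_000 { } });

# With the question's $worker in scope, the get_id loop could be timed the same way:
# say 'get_id loop: ', time-it({ $worker.get_id() for 1..1_000_000 });
```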
asked Mar 10 '21 at 13:03 by chenyf


1 Answer

I believe that the issue here does not come from constructing the array but rather from `now` itself, which seems to be oddly slow.

For example, this code:

no worries; # skip printing warning for useless `now`
for ^10_000_000 { now }
say now - INIT now;

also takes minutes to run. This strikes me as a bug, and I'll open an issue. [Edit: I found an existing issue for this, rakudo/rakudo#3620. The good news is that there is already a plan for a fix.] Since your code calls now multiple times in each iteration, this issue affects your loop even more.

Apart from that, there are a few other areas where you could speed this code up:

First, using an implicit return (that is, changing return new_id; to just new_id, and making similar changes everywhere else you use return) is generally slightly faster, because it lets the JIT optimize a bit better.
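To make the change concrete, here are two hypothetical helpers that pack a Snowflake-style id (the names and the 22/12 bit shifts are illustrative assumptions). They return the same value; the second simply relies on the last statement's value being the return value instead of an explicit `return`.

```raku
# Explicit form: ends with `return`, in the style of the question's module.
sub pack-id-explicit(Int $timestamp, Int $worker, Int $sequence) {
    my $new_id = ($timestamp +< 22) +| ($worker +< 12) +| $sequence;
    return $new_id;
}

# Implicit form: the value of the last statement is the return value.
sub pack-id-implicit(Int $timestamp, Int $worker, Int $sequence) {
    ($timestamp +< 22) +| ($worker +< 12) +| $sequence
}

say pack-id-explicit(1, 10, 5);   # 4235269
say pack-id-implicit(1, 10, 5);   # 4235269
```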

Second, the line

my @ids = gather for (1...10000000) { take $worker.get_id() };

is somewhat wasteful: gather/take adds support for lazy lists and is a more complex construct than this loop needs. You can simplify it to

my @ids = (1...10000000).map: { $worker.get_id() };

(This still constructs an intermediate Seq, though.)
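The sketch below checks that the two forms build the same array, using a hypothetical `fake-get-id` in place of `$worker.get_id()`. It also shows a third option that is not from the answer: pushing inside a plain loop, which avoids the mapped Seq. Whether that is actually faster depends on the Rakudo version, so it is worth measuring rather than assuming.

```raku
# Hypothetical stand-in for $worker.get_id(): returns 1, 2, 3, ... on successive calls.
my $counter = 0;
sub fake-get-id() { ++$counter }

constant N = 10;

# 1. Original form: gather/take over a sequence.
$counter = 0;
my @via-gather = gather for (1...N) { take fake-get-id() };

# 2. Suggested form: map over the list (still builds an intermediate Seq).
$counter = 0;
my @via-map = (1..N).map: { fake-get-id() };

# 3. Alternative (not from the answer): push inside a plain loop, no mapped Seq.
$counter = 0;
my @via-push;
@via-push.push(fake-get-id()) for 1..N;

say @via-push;   # [1 2 3 4 5 6 7 8 9 10]
```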

Third, and this one has a bigger performance impact even though it is literally the smallest possible code change: change (1...10000000) to (1..10000000). The difference is that ... is the sequence operator while .. is the range operator. Sequences have some superpowers compared to Ranges (see the docs if you're curious), but they are significantly slower to iterate over in a loop like this.
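A quick illustration of the difference: `..` builds a Range, while `...` builds a Seq whose "superpowers" include inferring arithmetic or geometric patterns from a few seed values. For plain counting, both produce the same numbers, so the cheaper Range is the better fit.

```raku
# `..` is the range operator: it builds a Range, a simple interval.
say (1..5).^name;        # Range
# `...` is the sequence operator: it builds a Seq and can infer a pattern from seeds.
say (1...5).^name;       # Seq
say (1, 3 ... 11);       # (1 3 5 7 9 11)
say (1, 2, 4 ... 64);    # (1 2 4 8 16 32 64)

# For a plain counting loop the Range yields the same values as the sequence.
my $total = 0;
$total += $_ for 1..10;
say $total;              # 55
```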

Again, though, these are minor issues; I believe the performance of now is the largest problem.

The long-term solution is to fix `now` itself (we're working on it!). As a temporary workaround, if you don't mind dipping into a slightly lower level than is generally advisable for user code, you can use nqp::time_n to get the current time as a floating-point number of seconds. Using it, your get_timestamp method would look like this:

method get_timestamp() {
    use nqp;
    # nqp::time_n returns seconds as a float; scale to integer milliseconds
    (nqp::time_n() * 1000).Int;
}

With this workaround and the other refactorings I suggested above, your code now executes in around 55 seconds on my machine – still not nearly as fast as I'd like Raku to be, but well over an order of magnitude better than where we started.

answered Oct 23 '22 at 01:10 by codesections