I have a Snowflake ID generator script written in Python. I converted it to a Raku module (`IdWorker`) and call it 10,000,000 times, and it is very slow (file test.raku):
use IdWorker;
my $worker = IdWorker.new(worker_id => 10, sequence => 0);
my @ids = gather for (1...10000000) { take $worker.get_id() };
my $duration = now - INIT now;
say sprintf("%-8s %-8s %-20s", @ids.elems, Set(@ids).elems, $duration);
As @codesections's answer says, it's `now` that takes so much time.
Python takes about 12 seconds, while Raku takes minutes. How can I fix this?
This empty for loop takes about 0.12 seconds:
for (1...10000000) {
;
}
And calling `get_id()` on `$worker` takes minutes:
for (1...10000000) {
$worker.get_id();
}
I believe that the issue here does not come from constructing the array but rather from `now` itself, which seems to be oddly slow.
For example, this code:
no worries; # skip printing warning for useless `now`
for ^10_000_000 { now }
say now - INIT now;
also takes minutes to run. This strikes me as a bug, and I'll open an issue. [Edit: I found an existing issue for this, rakudo/rakudo#3620. The good news is that there's already a plan for a fix.] Since your code calls `now` multiple times in each iteration, this issue impacts your loop even more.
Apart from that, there are a few other areas where you could speed this code up:
First, using an implicit return (that is, changing `return new_id;` to just `new_id`, and making similar changes wherever else you use `return`) is generally slightly faster and lets the JIT optimize a bit better.
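Since the original `IdWorker` module isn't shown, here is a minimal sketch of what a Snowflake-style `get_id` written with implicit returns might look like. It assumes the common Twitter Snowflake layout (41-bit millisecond timestamp, 10-bit worker id, 12-bit sequence) and Twitter's epoch; the `epoch` attribute and the `!wait_next_millis` helper are hypothetical names, not taken from your code. Keeping the clock behind a `get_timestamp` method means a faster time source can be swapped in later without touching `get_id`.

```raku
# Hypothetical Snowflake-style IdWorker; the original module is not shown.
# Assumed layout: 41-bit ms timestamp | 10-bit worker id | 12-bit sequence.
# A clock moving backwards is NOT handled in this sketch.
class IdWorker {
    has Int $.worker_id = 0;
    has Int $.sequence  = 0;
    has Int $.epoch     = 1288834974657;   # Twitter's Snowflake epoch in ms (assumed)
    has Int $!last_timestamp = -1;

    # Milliseconds since the Unix epoch, via the (slow) `now` term.
    method get_timestamp() {
        (now.to-posix[0] * 1000).Int
    }

    # Spin until the clock moves past $last (used when the sequence wraps).
    method !wait_next_millis(Int $last) {
        my $ts = self.get_timestamp();
        $ts = self.get_timestamp() while $ts <= $last;
        $ts
    }

    method get_id() {
        my $timestamp = self.get_timestamp();
        if $timestamp == $!last_timestamp {
            $!sequence = ($!sequence + 1) +& 0xFFF;   # wrap at 4096
            $timestamp = self!wait_next_millis($!last_timestamp) if $!sequence == 0;
        }
        else {
            $!sequence = 0;
        }
        $!last_timestamp = $timestamp;
        my $new_id = (($timestamp - $!epoch) +< 22) +| ($!worker_id +< 12) +| $!sequence;
        $new_id    # implicit return instead of `return $new_id;`
    }
}
```

Subclassing and overriding `get_timestamp` is also a convenient way to check the bit layout against a fixed clock.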
Second, the line
my @ids = gather for (1...10000000) { take $worker.get_id() };
is somewhat wastefully using `gather`/`take` (which adds support for lazy lists and is a more complex construct). You can simplify this into
my @ids = (1...10000000).map: { $worker.get_id() };
(This still constructs an intermediate `Seq`, though.)
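To see that the two forms are interchangeable, here is a small sketch using a hypothetical `next-id` counter sub as a stand-in for `$worker.get_id()`. Both versions build the same list; the `.map` form simply avoids the extra `gather`/`take` machinery.

```raku
# Hypothetical stand-in for $worker.get_id(): returns 1, 2, 3, ... on each call.
my $counter = 0;
sub next-id() { $counter += 1; $counter }

# gather/take: general-purpose and laziness-capable, but more than is needed here.
my @via-gather = gather for 1..5 { take next-id() };

# .map: the simpler equivalent.
$counter = 0;
my @via-map = (1..5).map: { next-id() };

say @via-gather;   # [1 2 3 4 5]
say @via-map;      # [1 2 3 4 5]
```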
Third, and this one has a much bigger performance impact even though it is literally as small a code change as possible: change the `(1...10000000)` into `(1..10000000)`. The difference is that `...` is the sequence operator while `..` is the range operator. Sequences have some superpowers compared to Ranges (see the docs if you're curious), but are significantly slower to iterate over in a loop like this.
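A quick sketch of the difference: for a plain count both operators yield the same integers, but only `...` can deduce arithmetic or geometric patterns or run a generator closure. The timing part uses a smaller `N` than the question so it finishes quickly; the numbers it prints will depend on your machine and Rakudo version, so treat them as a rough comparison only.

```raku
# Same values for a plain count...
say (1..5).List;              # (1 2 3 4 5)
say (1...5).List;             # (1 2 3 4 5)

# ...but only the sequence operator can deduce patterns or use a generator.
say (1, 2, 4 ... 64).List;    # (1 2 4 8 16 32 64)
say (1, 1, * + * ... *)[^10]; # (1 1 2 3 5 8 13 21 34 55)

# Rough timing of an empty loop over each; results vary by machine and Rakudo version.
constant N = 1_000_000;
my $t0 = now;
for (1...N) { }
say "sequence (...): {now - $t0} s";
$t0 = now;
for (1..N) { }
say "range (..):     {now - $t0} s";
```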
Again, though, these are minor issues; I believe the performance of `now` is the largest problem.
The long-term solution for `now` being slow is for it to be fixed (we're working on it!). As a temporary workaround, though, if you don't mind dipping into a slightly lower level than is generally advisable for user code, you can use `nqp::time_n` to get a floating-point number of seconds for the current time. Using this would make your `get_timestamp` method look like:
method get_timestamp() {
use nqp;
(nqp::time_n() * 1000).Int;
}
With this workaround and the other refactorings I suggested above, your code now executes in around 55 seconds on my machine – still not nearly as fast as I'd like Raku to be, but well over an order of magnitude better than where we started.