On the same computer, using the Rakudo compiler "rakudo-moar-2021.06-01-macos-x86_64-clang.tar" I get 3-10 times speed-ups compared to the timings of my calculations in the original post.
.elems: 100000
.head(3): (id.20444 => 81.95246687507492 id.81745 => 34.859323339828464 id.79973 => 97.33816856420829)
time of .sort({-$_.value}) : 0.764283216
time of .sort(-*.value) : 0.618963783
time of .sort.reverse : 0.584477656
time of .values.sort : 1.68912663
Note that those timings are close to the R timings. So, on those kinds of computations Raku has the performance I expect.
I recently watched a FOSDEM presentation by Elizabeth Mattijsen titled "Raku - Sets without Borders" and decided to adopt Raku Mix objects in some of my computational workflows.
I noticed that sorting (the pairs of) a Mix object is very slow -- I would say 100 to 1000 times slower than what I expect. See the Raku code and output below. (I also provide related R code and output from the same computer.)
Is that slowness expected? Is there a workaround for faster computations?
(To be more specific, I am interested in fast reverse-sort and fast retrieval of the top-K largest elements in a Mix.)
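For example, by "fast retrieval of the top-K largest elements" I mean something along these lines (using the $m0 Mix constructed in the code below; $k is just a placeholder):
my $k   = 10;
my @top = $m0.sort(-*.value).head($k);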
(The timings are from a few-years-old MacBook Pro running macOS 10.15.7, using the latest Rakudo compiler, "rakudo-moar-2021.02.1-01-macos-x86_64-clang.tar.gz".)
#!/usr/bin/env perl6
my @words = Array(1 .. 100_000).map({ 'id.' ~ $_.Str });
my $m0 = Mix(@words.map({ $_ => 100.rand() }));
say '.elems: ', $m0.elems;
say '.head(3): ', $m0.head(3);
my $start = now;
my $m1 = $m0.sort({-$_.value});
say 'time of .sort({-$_.value}): ', now - $start;
$start = now;
my $m2 = $m0.sort(-*.value);
say 'time of .sort(-*.value) : ', now - $start;
$start = now;
my $m3 = $m0.sort.reverse;
say 'time of .sort.reverse : ', now - $start;
$start = now;
my $m4 = $m0.values.sort;
say 'time of .values.sort : ', now - $start;
# .elems: 100000
# .head(3): (id.96239 => 87.89629474533156 id.89110 => 11.661698290245525 id.24795 => 64.80528155838671)
# time of .sort({-$_.value}): 3.64936396
# time of .sort(-*.value) : 4.0388654
# time of .sort.reverse : 4.5783556
# time of .values.sort : 4.3461059
Here is similar data and sorting code in R:
words <- paste0( 'id.', 1:100000)
m <- setNames( runif(n = length(words), min = 0, max = 100), words)
cat( "length(m) : ", length(m), "\n")
cat( "m[1:3]:\n"); print(m[1:3]); cat("\n")
cat( "system.time( sort(m) ) : ", system.time( sort(m) ), "\n")
cat( "system.time( m[order(-m)] ) : ", system.time( m[order(-m)] ), "\n")
cat( "system.time( rev(sort(names(m))) ) : ", system.time( rev(sort(names(m))) ), "\n")
# length(m) : 100000
# m[1:3]:
# id.1 id.2 id.3
# 89.99714 54.31701 11.57415
#
# system.time( sort(m) ) : 0.011 0 0.011 0 0
# system.time( m[order(-m)] ) : 0.011 0 0.011 0 0
# system.time( rev(sort(names(m))) ) : 0.298 0.001 0.3 0 0
Here are answers to questions by @raith:
"Is the m in the R code mutable?"
No, R objects are mostly immutable.
"Does the sort(m) build a new data structure, or just a new index into the existing m structure?"
A new data structure is created. R is a descendant of LISP, so it mostly follows, in spirit, the functional programming paradigm.
"Same question for m[order(-m)]?"order(-m)
gives an integer vector (of indexes.) That vector of indexes is used to retrieve elements of m
into a new object.
"And rev(sort(names(m)))?"names(m)
takes the "keys" of m
's elements. Those keys are sorted and placed into a character vector, and then that character vector is reversed. (I.e. a new object is created.)
"Presumably building just an index could be dramatically faster. Perhaps Raku could have a sort variant for Tuples that produces a Seq that relies
on that approach?"
I assume it is not my place to comment on this, but I would like to mention that:
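A minimal sketch of that index-based idea in current Raku (illustrative only -- this is not an existing sort variant, and the variable names are made up):
# Sort integer positions by value instead of building a new list of Pairs,
# then slice the original pairs with the sorted positions.
my @pairs = ('id.' X~ 1..100_000).map({ $_ => 100.rand });
my @order = (^@pairs).sort({ -@pairs[$_].value });  # indexes, largest value first
say @pairs[@order.head(3)];                         # top-3 pairs via the index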
A few people requested more detailed R benchmarking:
library(microbenchmark)
set.seed(32)
words <- paste0( 'id.', 1:100000)
m <- setNames( runif(n = length(words), min = 0, max = 100), words)
cat( "length(m): ", length(m), "\n")
cat( "m[1:3]:\n"); print(m[1:3]); cat("\n")
microbenchmark::microbenchmark(
sort(m, decreasing = T),
sort(-m),
m[order(-m)],
rev(sort(m)),
rev(sort(names(m))),
unit = "s", times = 100
)
# length(m) : 100000
#
# m[1:3]:
# id.1 id.2 id.3
# 50.58405 59.48084 80.87471
#
# Unit: seconds
# expr min lq mean median uq max neval cld
# sort(m, decreasing = T) 0.006246853 0.007789205 0.009215613 0.008263348 0.009002414 0.02450786 100 a
# sort(-m) 0.006857755 0.008376058 0.010292145 0.008939605 0.010069702 0.02469324 100 a
# m[order(-m)] 0.006658089 0.008257555 0.009726704 0.008718414 0.009811200 0.02294023 100 a
# rev(sort(m)) 0.008975013 0.010939122 0.014965756 0.011692480 0.012571979 0.22022085 100 b
# rev(sort(names(m))) 0.256036106 0.268526455 0.278385866 0.277794917 0.288586351 0.31160492 100 c
#
There was a deficiency in the way comparisons were made with .sort, which resulted in a lot of calls to the Mu.Bool method, which in some cases would make up about 50% of the CPU used. Today I have changed the handling of .sort for non-standard comparisons in such a way that the call to Mu.Bool is no longer made. This resulted in the code of the question running about 2x as fast as before, although @codesections reported a 4x speed increase. This should be in the 2021.03 release.
[EDIT 2021-03-06: thanks to a series of commits over the past ~day (thanks, Liz!), this slowdown is now largely fixed on HEAD; these performance gains should show up in the next monthly release. I'm leaving the answer below as an example of how to dig into this sort of issue, but the specific problems it diagnosed have largely been resolved.]
Building on @Elizabeth Mattijsen's comment: The slow performance here is mostly due to the Rakudo compiler not properly optimizing the generated code (as of 2021-03-05). As the compiler continues to improve, the (idiomatic) code you wrote above should perform much better.
However, for today we can use a few workarounds to speed up this calculation. While it's still true that Raku's performance won't be particularly competitive with R here, some profiling-driven refactoring can make this code nearly an order of magnitude faster.
Here's how we do it:
First, we start by profiling the code. If you run your script with raku --profile=<filename>, then you'll get a profile written to <filename>. By default, this will be an HTML file that allows you to view the profile in your browser. My preference, however, is to specify an .sql extension, which generates an SQL profile. I then view this profile with MoarProf, the revised profiler that Timo Paulssen is building.
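For example (the script name and output file names here are just placeholders):
raku --profile=mix-sort.html mix-sort.raku   # HTML profile, open in a browser
raku --profile=mix-sort.sql  mix-sort.raku   # SQL profile, for viewing with MoarProf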
Looking at this profile shows exactly the issue that Liz mentioned: Calls that should be getting inlined are not. To fix this, let's create our own sorting function, which the JIT compiler will happily optimize:
sub my-reverse($a, $b) {
$a.value > $b.value ?? Order::Less !! Order::More
}
Using this function (with $m0.sort(&my-reverse)) immediately shaves a good 25% off the runtime, but it's still way too high. Back to the profiler!
The next thing that jumps out at me is that we still have far too many calls to Bool. In particular, it looks like Rakudo is currently converting the Order to a Bool. I think this is a bug and plan to look into it after posting this, but in any event, we can save Rakudo the effort:
sub my-reverse1($a, $b) {
$a.value > $b.value ?? False !! True
}
On my machine, this cuts execution time in half again, getting us to ~28% of the original runtime of .sort({-$_.value}). This is getting decent, and would be a fine place to stop.
Let's press on, though: checking the profiler again shows that we're still devoting a very large chunk of our time to calling Bool (even though we're calling it half as often). To fix this at the moment, we'll need to drop down to NQP to compare numbers without constructing a Bool:
sub nqp-reverse($a, $b) {
use nqp;
nqp::isge_n($a.value, $b.value) ?? False !! True
}
This cuts our execution time in half again, and gets us to about the performance that I'd want out of Raku.
Here are the timing results I get, both for the functions I added and the ones in your question, reported in the same format you used:
.elems: 100000
.head(3): (id.195 => 80.81458886131459 id.31126 => 84.25690944480021 id.60237 => 45.63311676798485)
time of .sort(&nqp-reverse): 0.3226533
time of .sort(&my-reverse1): 0.76803384
time of .sort(&my-reverse) : 1.4643238
time of .sort({-$_.value}) : 2.6780952
time of .sort(-*.value) : 1.8549689
time of .sort.reverse : 2.5862973
time of .values.sort : 2.078715
On a slight aside, there are some Raku idioms that can streamline the code a bit, going from...
my @words = Array(1 .. 100_000).map({ 'id.' ~ $_.Str });
my $m0 = Mix(@words.map({ $_ => 100.rand() }));
to
my $m0 = .map(* => 100.rand).Mix given 'id.' X~ 1..100_000;
This employs the 'given' formulation and the X metaoperator with '~' string concatenation.
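If the bare * inside the pair constructor ever trips over WhateverCode currying rules, an explicit-block spelling of the same idea (equivalent in intent) is:
my $m0 = ('id.' X~ 1..100_000).map({ $_ => 100.rand }).Mix;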