On the same computer, using the Rakudo compiler "rakudo-moar-2021.06-01-macos-x86_64-clang.tar" I get 3-10 times speed-ups compared to the timings of my calculations in the original post.
.elems: 100000
.head(3): (id.20444 => 81.95246687507492 id.81745 => 34.859323339828464 id.79973 => 97.33816856420829)
time of .sort({-$_.value}) : 0.764283216
time of .sort(-*.value) : 0.618963783
time of .sort.reverse : 0.584477656
time of .values.sort : 1.68912663
Note that those timings are close to the R timings. So, on those kinds of computations Raku has the performance I expect.
I recently watched a FOSDEM presentation by Elizabeth Mattijsen titled "Raku - Sets without Borders" and decided to adopt Raku Mix objects in some of my computational workflows.
I noticed that sorting (the pairs of) a Mix object is very slow -- I would say 100 to 1000 times slower than what I expect. See the Raku code and output below. (I also provide related R code and output from the same computer.)
Is that slowness expected? Is there a workaround for faster computations?
(To be more specific, I am interested in fast reverse-sort and fast retrieval of the top-K largest elements in a Mix.)
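For example, by "fast retrieval of the top-K largest elements" I mean something along these lines (using the $m0 Mix constructed in the code below; $k is just a placeholder):
my $k   = 10;
my @top = $m0.sort(-*.value).head($k);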
(The timings are from a few-years-old MacBook Pro running macOS 10.15.7, using the latest Rakudo compiler, "rakudo-moar-2021.02.1-01-macos-x86_64-clang.tar.gz".)
#!/usr/bin/env perl6
my @words = Array(1 .. 100_000).map({ 'id.' ~ $_.Str });
my $m0 = Mix(@words.map({ $_ => 100.rand() }));
say '.elems: ', $m0.elems;
say '.head(3): ', $m0.head(3);
my $start = now;
my $m1 = $m0.sort({-$_.value});
say 'time of .sort({-$_.value}): ', now - $start;
$start = now;
my $m2 = $m0.sort(-*.value);
say 'time of .sort(-*.value) : ', now - $start;
$start = now;
my $m3 = $m0.sort.reverse;
say 'time of .sort.reverse : ', now - $start;
$start = now;
my $m4 = $m0.values.sort;
say 'time of .values.sort : ', now - $start;
# .elems: 100000
# .head(3): (id.96239 => 87.89629474533156 id.89110 => 11.661698290245525 id.24795 => 64.80528155838671)
# time of .sort({-$_.value}): 3.64936396
# time of .sort(-*.value) : 4.0388654
# time of .sort.reverse : 4.5783556
# time of .values.sort : 4.3461059
Here is similar data and sorting code in R:
words <- paste0( 'id.', 1:100000)
m <- setNames( runif(n = length(words), min = 0, max = 100), words)
cat( "length(m) : ", length(m), "\n")
cat( "m[1:3]:\n"); print(m[1:3]); cat("\n")
cat( "system.time( sort(m) ) : ", system.time( sort(m) ), "\n")
cat( "system.time( m[order(-m)] ) : ", system.time( m[order(-m)] ), "\n")
cat( "system.time( rev(sort(names(m))) ) : ", system.time( rev(sort(names(m))) ), "\n")
# length(m) : 100000
# m[1:3]:
# id.1 id.2 id.3
# 89.99714 54.31701 11.57415
#
# system.time( sort(m) ) : 0.011 0 0.011 0 0
# system.time( m[order(-m)] ) : 0.011 0 0.011 0 0
# system.time( rev(sort(names(m))) ) : 0.298 0.001 0.3 0 0
Here are answers to questions by @raith:
"Is the m in the R code mutable?"
No, R objects are mostly immutable.
"Does the sort(m) build a new data structure, or just a new index into the existing m structure?"
A new data structure is created. R is a descendant of LISP, so it mostly follows, in spirit, the functional programming paradigm.
"Same question for m[order(-m)]?"order(-m)
gives an integer vector (of indexes.) That vector of indexes is used to retrieve elements of m
into a new object.
"And rev(sort(names(m)))?"names(m)
takes the "keys" of m
's elements. Those keys are sorted and placed into a character vector, and then that character vector is reversed. (I.e. a new object is created.)
"Presumably building just an index could be dramatically faster. Perhaps Raku could have a sort variant for Tuples that produces a Seq that relies
on that approach?"
I assume it is not my place to comment on this, but I would like to mention that:
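A minimal sketch of that index-based idea in current Raku (illustrative only -- this is not an existing sort variant, and the variable names are made up):
# Sort integer positions by value instead of building a new list of Pairs,
# then slice the original pairs with the sorted positions.
my @pairs = ('id.' X~ 1..100_000).map({ $_ => 100.rand });
my @order = (^@pairs).sort({ -@pairs[$_].value });  # indexes, largest value first
say @pairs[@order.head(3)];                         # top-3 pairs via the index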
A few people requested more detailed R benchmarking:
library(microbenchmark)
set.seed(32)
words <- paste0( 'id.', 1:100000)
m <- setNames( runif(n = length(words), min = 0, max = 100), words)
cat( "length(m): ", length(m), "\n")
cat( "m[1:3]:\n"); print(m[1:3]); cat("\n")
microbenchmark::microbenchmark(
sort(m, decreasing = T),
sort(-m),
m[order(-m)],
rev(sort(m)),
rev(sort(names(m))),
unit = "s", times = 100
)
# length(m) : 100000
#
# m[1:3]:
# id.1 id.2 id.3
# 50.58405 59.48084 80.87471
#
# Unit: seconds
# expr min lq mean median uq max neval cld
# sort(m, decreasing = T) 0.006246853 0.007789205 0.009215613 0.008263348 0.009002414 0.02450786 100 a
# sort(-m) 0.006857755 0.008376058 0.010292145 0.008939605 0.010069702 0.02469324 100 a
# m[order(-m)] 0.006658089 0.008257555 0.009726704 0.008718414 0.009811200 0.02294023 100 a
# rev(sort(m)) 0.008975013 0.010939122 0.014965756 0.011692480 0.012571979 0.22022085 100 b
# rev(sort(names(m))) 0.256036106 0.268526455 0.278385866 0.277794917 0.288586351 0.31160492 100 c
#
There was a deficiency in the way comparisons were made with .sort, which resulted in a lot of calls to the Mu.Bool method, which in some cases would make up about 50% of the CPU used. Today I have changed the handling of .sort for non-standard comparisons in such a way that the call to Mu.Bool is no longer made. This resulted in the code of the question running about 2x as fast as before, although @codesections reported a 4x speed increase. This should be in the 2021.03 release.
[EDIT 2021-03-06: thanks to a series of commits over the past ~day (thanks, Liz!), this slowdown is now largely fixed on HEAD; these performance gains should show up in the next monthly release. I'm leaving the answer below as an example of how to dig into this sort of issue, but the specific problems it diagnosed have largely been resolved.]
Building on @Elizabeth Mattijsen's comment: The slow performance here is mostly due to the Rakudo compiler not properly optimizing the generated code (as of 2021-03-05). As the compiler continues to improve, the (idiomatic) code you wrote above should perform much better.
However, for today we can use a few workarounds to speed up this calculation. While it's still true that Raku's performance won't be particularly competitive with R here, some profiling-driven refactoring can make this code nearly an order of magnitude faster.
Here's how we do it:
First, we start by profiling the code. If you run your script with raku --profile=<filename>, then you'll get a profile written to <filename>. By default, this will be an HTML file that allows you to view the profile in your browser. My preference, however, is to specify an .sql extension, which generates an SQL profile. I then view this profile with MoarProf, the revised profiler that Timo Paulssen is building.
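For example (the script name and output file names here are just placeholders):
raku --profile=mix-sort.html mix-sort.raku   # HTML profile, open in a browser
raku --profile=mix-sort.sql  mix-sort.raku   # SQL profile, for viewing with MoarProf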
Looking at this profile shows exactly the issue that Liz mentioned: Calls that should be getting inlined are not. To fix this, let's create our own sorting function, which the JIT compiler will happily optimize:
sub my-reverse($a, $b) {
$a.value > $b.value ?? Order::Less !! Order::More
}
Using this function (with $m0.sort(&my-reverse)) immediately shaves a good 25% off the runtime, but it's still way too high. Back to the profiler!
The next thing that jumps out at me is that we still have far too many calls to Bool. In particular, it looks like Rakudo is currently converting the Order to a Bool. I think this is a bug and plan to look into it after posting this, but in any event, we can save Rakudo the effort:
sub my-reverse1($a, $b) {
$a.value > $b.value ?? False !! True
}
On my machine, this cuts execution time in half again, getting us to ~28% of the original runtime of .sort({-$_.value}). This is getting decent, and would be a fine place to stop.
Let's press on, though: checking the profiler again shows that we're still devoting a very large chunk of our time to calling Bool (even though we're calling it half as often). To fix this at the moment, we'll need to drop down to NQP to compare numbers without constructing a Bool:
sub nqp-reverse($a, $b) {
use nqp;
nqp::isge_n($a.value, $b.value) ?? False !! True
}
This cuts our execution time in half again, and gets us to about the performance that I'd want out of Raku.
Here are the timing results I get, both for the functions I added and the ones in your question, reported in the same format you used:
.elems: 100000
.head(3): (id.195 => 80.81458886131459 id.31126 => 84.25690944480021 id.60237 => 45.63311676798485)
time of .sort(&nqp-reverse): 0.3226533
time of .sort(&my-reverse1): 0.76803384
time of .sort(&my-reverse) : 1.4643238
time of .sort({-$_.value}) : 2.6780952
time of .sort(-*.value) : 1.8549689
time of .sort.reverse : 2.5862973
time of .values.sort : 2.078715
On a slight aside, there are some Raku idioms that can streamline the code a bit, going from...
my @words = Array(1 .. 100_000).map({ 'id.' ~ $_.Str });
my $m0 = Mix(@words.map({ $_ => 100.rand() }));
to
my $m0 = .map(* => 100.rand).Mix given 'id.' X~ 1..100_000;
This employs the 'given' formulation and the X metaoperator with '~' string concatenation.
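If the bare * inside the pair constructor ever trips over WhateverCode currying rules, an explicit-block spelling of the same idea (equivalent in intent) is:
my $m0 = ('id.' X~ 1..100_000).map({ $_ => 100.rand }).Mix;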