Why does Raku perform so bad with multidimensional arrays?

Tags:

I am curious why Raku performs so bad manipulating multidimensional arrays. I have made a quick test initializing a 2 dimension matrix in Python, C# and Raku and the elapsed time is surprisingly high for the later.

For Raku

my @grid[4000;4000] = [[0 xx 4000] xx 4000];
# Elapsed time 42 seconds !!

For Python

table= [ [ 0 for i in range(4000) ] for j in range(4000) ]
# Elapsed time 0.51 seconds

int [,]matrix = new int[4000,4000];
//Just for mimic same behaviour
for(int i=0;i<4000;i++)
   for(int j=0;j<4000;j++)
       matrix[i,j] = 0;
# Elapsed time 0.096 seconds

Am I doing wrong? It seems way too much difference.

765

asked Dec 04 '19 13:12

Javi

1 Answers

An initial direct comparison

I'll begin with code that is much more closely aligned with your Python code than your own translation. I think the Raku code that's most directly equivalent to your Python is:

my \table = [ [ 0 for ^4000 ] for ^4000 ];
say table[3999;3999]; # 0

This code declares a sigil-free identifier¹. It:

Drops "shaping" (the [4000;4000] in my @table[4000;4000]). I dropped it because your Python code isn't doing it. Shaping confers advantages but has performance implications.²
Uses binding instead of assignment. I switched to binding because your Python code is doing binding, not assignment. (Python doesn't distinguish between the two.) While Raku's assignment approach brings fundamental advantages that are worth having for general code, it has performance implications.³

This code I've begun my answer with is still slow.

First, the Raku code, run via a Rakudo compiler from December 2018, is about 5 times slower than your Python code, using a Python interpreter from June 2019, on the same hardware.³

Second, both the Raku code and the Python code are slow, eg compared to your C# code. We can do better...

An idiomatic alternative that's a thousand times faster

The following code is worth considering:

my \table = [ [ 0 xx Inf ] xx Inf ];
say table[ 100_000; 100_000 ]; # 0

Despite this code corresponding to a notional 10 billion element array rather than the mere 16 million element one in your Python and C# code, the wallclock time for running it is less than half that of the Python code, and just 5 times slower than the C# code. That suggests Rakudo is running the Raku code over a thousand times as fast as the equivalent Python code, and a hundred times as fast as the C# code.

The Raku code appears to be so much faster because the table is being initialized lazily by using xx Inf.⁴ The only significant work is done on running the say line. This causes creation of 100,000 first dimension arrays, and then populating just the 100,000th second dimension array with 100,000 elements, so that the say can display the 0 held in that array's last element.

There's More Than One Way To Do It

One issue underlying your question is that there's always more than one way to do it.⁵ If you encounter poor performance for code where speed is critical, coding it differently, as I have done, may make a dramatic difference.⁶

(Another really good option is to ask an SO question...)

The future

Raku is carefully designed to be highly optimizable, i.e. able to one day run much faster given sufficient compiler work over coming years, than, say, Perl 5 or Python 3 can, in theory, ever run, unless they go through a ground-up redesign and years of corresponding compiler work.

A somewhat OK analogy is what happened with Java's performance over the last 25 years. Rakudo/NQP/MoarVM are about halfway through the maturing process that the Java stack has gone through.

Footnotes

¹ I could have written my $table := .... But declarations of the form my \foo ... eliminate consideration of sigils and allow use of = rather than := which would be required with a sigil'd identifier. (As a bonus, "slashing out the sigil" results in a sigil free identifier, familiar to coders in the many languages that don't use sigils which of course includes Python and C#.)

² Shaping may one day result in faster array operations for some code. In the meantime, as mentioned in comments on your question, it clearly currently does the opposite, significantly slowing it down. I imagine that's in large part because every array access is being naively dynamically bounds-checked at the moment, slowly everything down, and there's also been no effort to use the fixed size to help speed things up. In addition, when I tried to come up with a fast workaround for your code, I failed to find one using the fixed size array due to many operations on fixed size arrays being currently unimplemented. Again, these will hopefully be implemented one day but have presumably not been a sufficient pain point for anyone to work on implementing them to date.

³ At the time of writing this, TIO is using Python 3.7.4, from June 2019, and Rakudo v2018.12, from December 2018. Rakudo's performance is currently improving over time significantly faster than the official Python 3 interpreter is, so I would expect the gap between the latest Rakudo and the latest Python, when Rakudo is slower, to be significantly narrower than stated in this answer. In particular, current work is significantly improving the performance of assignments.

⁴xx defaults to lazy processing but some expressions force eager evaluation due either to language semantics or current compiler limitations. In the the year old v2018.12 Rakudo, for an expression of the form [ [ foo xx bar ] xx baz ] to remain lazy and not be forced to evaluate eagerly, both bar and baz must be Inf. In contrast, my \table = [0 xx 100_000 for ^100_000] is lazy with no use of Inf. (The latter code is actually storing 100,000 Seqs in the first dimension rather than 100,000 Arrays -- say WHAT table[0] displays Seq rather than Array -- but most code won't be able to spot the difference -- say table[99_999;99_999] will still display 0.)

⁵ Some folk think that it's a weakness to accept that there's more than one way to think about and code solutions to given problems. In reality it's a strength in at least three regards. First, in general, any given non-trivial problem can be solved by many distinct algorithms with dramatic differences in performance profile. This answer includes an approach already available with a year old Rakudo that will be more than a thousand times faster than Python in practice in some scenarios. Second, a deliberately flexible and multi-paradigm language like Raku allows a coder (or team of coders) to express a solution that they consider elegant and maintainable, or that just gets the job done, based on what they think is best, not what the language imposes. Third, Rakudo's performance as an optimizing compiler is currently notably variable. Fortunately it has a great profiler⁶, so one can see where a bottleneck is, and great flexibility, so one can try alternate coding and this may produce vastly faster code.

⁶ When performance matters, or if you're investigating performance issues, consult the Raku doc page on performance; the page covers a range of options including use of the Rakudo profiler.

answered Sep 27 '22 21:09

raiph

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Why does Raku perform so bad with multidimensional arrays?

Tags:

performance

raku

Javi

People also ask

1 Answers

An initial direct comparison

An idiomatic alternative that's a thousand times faster

There's More Than One Way To Do It

The future

Footnotes

raiph

Recent Activity

Donate For Us

Why does Raku perform so bad with multidimensional arrays?

Tags:

performance

raku

Javi

People also ask

1 Answers

An initial direct comparison

An idiomatic alternative that's a thousand times faster

There's More Than One Way To Do It

The future

Footnotes

raiph

Related questions

Recent Activity

Donate For Us