The performance penalties for types/constraints in Raku?

Question

In contrast with Perl 5, Raku introduced gradual typing. The landscape of gradually typed object-oriented languages is rich and includes: Typed Racket, C#, StrongScript, Reticulated Python.

It's said that "optional gradual type-checking at no additional runtime cost" on the Raku official website. As far as I know, some gradual typing language (like Typed Racket and Reticulated Python) suffered from the serious performance issue due to strategy of enforcing type system soundness. On the other hand, the concrete types in StrongScript performs well thanks to the relatively inexpensive nominal subtype tests. Research on classification of gradual typing (excluding Raku):

C# and concrete types in StrongScript: use run-time subtype tests on type constructors to supplement static typing. While statically typed code executes at native speed, values are dynamically checked at typed-untyped boundaries. Types insert efficient casts and lead to code that can be optimized. They are also sound and has low overheads, but comes at a cost in expressiveness and ability to migrate from untyped to typed.

Typed Racket: monitors values to ensure that they behave in accordance to their assigned types. Instead of checking higher-order and mutable values for static type tags like concrete, wrappers ensure enduring conformance of values to their declared type. It avoids casts in typed code. The price it pays for this soundness, however, is that heavyweight wrappers inserted at typed-untyped boundaries.

Reticulated Python: lies between above two; it adds type casts but does so only for the top level of data structures. The performance of the transient semantics for Reticulated Python is a worst case scenario for concrete types –i.e, there is a cast at almost every call. It checks types at uses, so the act of adding types to a program introduces more casts and may slow the program down (even in fully typed code).

Is Raku's run-time enforcement strategy similar to C# and concrete types in StrongScript, or does it have its own set of strategies to ensure that there is no obvious performance issue like Typed Racket and Reticulated Python? Does it have a sound gradual type system?

Jonathan Worthington · Accepted Answer

Raku mandates that type constraints written into the program are enforced at runtime at latest. How that promise is kept is up to the compiler and runtime implementer. I'll discuss how the Rakudo (compiler) and MoarVM (runtime) pairing does it, because that's what I've worked on.

The initial compilation itself does rather little in terms of analysis to eliminate type checks, and thus the bytecode we produce has a lot of type checks in it. The bet being made here is that analysis takes time, only some of the code will actually find itself on a hot path (or for very short scripts, there is no hot path), so we might as well leave it to the VM to figure out what's hot, and then focus on those bits.

The VM does the typical profiling a modern runtime does, not only recording what code is hot, but also recording statistics on parameter types, return types, lexical types, and so forth. Despite the amount of potential dynamism that could occur, in a given application the reality is that a huge amount of code is monomorphic (only ever sees one type, or for a routine, one argument type tuple). Another bunch is polymorphic (sees a few different types), and a comparatively tiny amount is megamorphic (loads of types).

Based on the data it gets, the runtime produces specializations: versions of the code compiled based on assumptions about what exact types will show up. Guarding against exact types is cheaper than having to care for subtyping relations and so forth. So at this point we've got a version of the code where we have some cheap preconditions up front, and we've used them to eliminate the more costly type checks (as well as some extra guards scattered through the code replacing other type checks). However, this isn't really free...yet.

When calls are made, one of two things can happen:

For small callees, inlining takes place. We inline a specialization of the callee. If the knowledge of types in the caller is already sufficient to prove the type assumptions - which it often is - then there's no need for any guarding. Essentially, type checks in the callee just became free. We can inline multiple levels deep. Furthermore, inlining lets us trace data flows through the callee, which may let us eliminate further guards, for example about return value types in the callee.
For larger callees, we can perform specialization linking - that is, calling a specialization directly and bypassing its guards, because we can use the type knowledge in the caller to prove we're meeting the guard assumptions. Again, the callee parameter type checks thus become free.

But what about type-y things that aren't calls, such as return value type checks and assignments? We compile those as calls too, so we can re-use the same machinery. For example, a return type check, in the case it's monomorphic (often), turns into a guard + a call to the identity function, and whenever we can prove the guard, that just turns into the identity function, which is a trivial inline.

There's more to come yet. Of note:

The mechanisms I've described above are built around various kinds of caches and guard trees and it's not all quite so beautiful as I've made it sound. Sometimes one needs to build ugly to learn enough to know how to build nice. Thankfully, a current bunch of work is folding all of these learnings into a new, unified, guard and dispatch mechanism, which will also take on various aspects of the language that are very poorly optimized today. That's due to land in releases within a couple of months.
The current runtime already does some very limited escape analysis and scalar replacement. This means that it can trace data flows into short-lived objects, and thus find even more type checks to eliminate (on top of having eliminated the memory allocations). There's work underway to make it more powerful, providing partial escape analysis, transitive analysis in order to scalar replace entire object graphs and thus be able to trace data flows, and so types, through them.

Last year, a paper titled Transient typechecks are (almost) free was published. It's not about Raku/Rakudo/MoarVM at all, but it's the closest description I've seen in academic literature to what we're doing. That was the first time I realized that maybe we are doing something kinda innovative in this area. :-)

raiph · Answer

Now jnthn has written an authoritative overview of where things are at for Rakudo and MoarVM as of 2020, I feel OK publishing what amounts to a non-expert writing up some hand wavy historical notes covering 2000 thru 2019 that may be of interest to some readers.

My notes are organized to respond to excerpts from your question:

The performance penalties for types/constraints in Raku?

There aren't supposed to be penalties, but rather the reverse. That is to say Larry Wall wrote, in an early (2001) design doc:

more performance and safety as you give it more type information to work with

(This was 4 years before the term "gradual typing" was introduced at a 2005 academic conference.)

So his intent was that if a dev added a suitable type, the program ran either safer, or faster/leaner, or both.

(And/or was able to be used in interop with foreign languages: "Besides performance and safety, one other place where type information is useful is in writing interfaces to other languages.". A decade later he was saying that the #1 and #2 reasons for types were multiple dispatch and documentation.)

I don't know of any systematic effort to measure the degree to which Rakudo delivers on the design intent that types never slow code down and predictably speed it up if they're native types.

In addition, Rakudo is still relatively rapidly changing, with an overall annual performance improvement in the 2-3x range stretching back a decade.

(While Rakudo is 15 years old, it's been developed as the Raku language has evolved alongside it -- finally settling down in the last few years -- and the overall phasing of Rakudo's development has been a deliberate 1-2-3 of "Make it work, Make it work right, Make it work fast", with the latter only really beginning to kick in in recent years.)

As far as I know, some gradual typing language (like Typed Racket and Reticulated Python) suffered from the serious performance issue due to strategy of enforcing type system soundness.

Gradual Typing from Theory to Practice (2019) summarized a 2015 paper that said:

The first systematic effort to measure [soundness costs] ... revealed substantial performance problems ...

... (presumably the ones you've been reading about) ....

[and that] performance can be significantly improved using JIT compilers, nominal types, representation improvements, and custom-built compilers, among others ...

Now compare their above recipe for performance with the characteristics of Rakudo and Raku:

Rakudo is a 15 year old custom-built compiler with several backends including the custom MoarVM backend with an x86 JIT.
The Raku language has a (gradual) nominal type system.
The Raku language supports representation polymorphism. This is like the mother of all representation improvements, not in the sense of being one, but rather in the sense it abstracts representation from structure so it's possible to improve with the freedom that representation polymorphism brings.
There are other potential type system related contributions to performance; eg I expect native arrays (including multi-dimensional; sparse; etc.) to one day be a significant contributor.

On the other hand, the concrete types in StrongScript performs well thanks to the relatively inexpensive nominal subtype tests

I note jnthn's comment:

Guarding against exact types is cheaper than having to care for subtyping relations and so forth

My guess is that the jury will be out for about another 5 years or so on whether Rakudo is delivering, or will one day deliver, sufficient performance to make its gradual typing generally attractive.

And perhaps one juror (hi Nile) will be the first to draw some tentative conclusions about how Raku(do) compares to other gradually typed languages in the next year or so?

Soundness

Does it have a sound gradual type system?

In the sense there's a mathematical treatment? I'm 99% sure the answer is no.

In the sense that it's thought to be sound? Where the only presumed guarantee is memory safety? I think so. Anything more than that? Good question.

All I can say is that afaik Raku's type system was developed by hackers like Larry Wall and Audrey Tang. (cf her 2005 notes on type inference.)

The performance penalties for types/constraints in Raku?

Tags:

performance

c#

raku

Nile

Video Answer

2 Answers

Jonathan Worthington

Soundness

raiph

Recent Activity

Donate For Us

The performance penalties for types/constraints in Raku?

Tags:

performance

c#

raku

Nile

Video Answer

2 Answers

Jonathan Worthington

Soundness

raiph

Related questions

Recent Activity

Donate For Us