 

What kind of interpreter is the Ruby MRI?

Is it a language interpreter? Or a bytecode interpreter / JIT compiler? Where can I learn more about the implementation (other than by browsing the source code)?

user6245072 asked Dec 10 '22 12:12



2 Answers

Note: the term "MRI" is confusing. It stands for "Matz's Ruby Interpreter" (or "Matz's Reference Implementation"). However, the original MRI has been retired and is no longer developed or maintained.

MRI was a pure AST-walking interpreter, with no compilation involved anywhere.
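
To make "AST-walking" concrete, here is a minimal sketch of the technique. It is not MRI's actual code: the node format (nested arrays such as `[:add, left, right]`) and the `walk` method are hypothetical names invented for this illustration. The point is only that the interpreter recurses directly over the syntax tree and never produces bytecode or machine code.

```ruby
# Toy AST-walking interpreter (illustrative only, not MRI's implementation).
# Nodes are nested arrays: [:num, 1], [:add, left, right], [:mul, left, right].
def walk(node)
  type, *args = node
  case type
  when :num then args.first                     # literal: return its value
  when :add then walk(args[0]) + walk(args[1])  # evaluate both children, then add
  when :mul then walk(args[0]) * walk(args[1])  # evaluate both children, then multiply
  else raise ArgumentError, "unknown node type: #{type.inspect}"
  end
end

# AST for the expression 1 + 2 * 3
ast = [:add, [:num, 1], [:mul, [:num, 2], [:num, 3]]]
result = walk(ast)
puts result
```

Every evaluation re-traverses the tree, which is the main cost a bytecode VM avoids by compiling once up front.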

The confusing thing is: Matz has written a new implementation, but that's called MRuby, not MRI. And the implementation that is now called MRI wasn't written by Matz. So, really, it is best to simply not use that term at all, and be specific about which implementation you are talking about.

The name of the implementation that people now call MRI is actually YARV (for Yet Another Ruby VM), and it was written by Koichi Sasada. It consists of an Ahead-Of-Time compiler which compiles Ruby source code to YARV byte code and an interpreter which interprets said byte code. Thus, it is a completely typical byte code VM, exactly like CPython for Python, Zend Engine for PHP, the Lua VM, older versions of Rubinius, older versions of SpiderMonkey for ECMAScript, and so on.
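
The two stages described above can be observed from Ruby itself through the `RubyVM::InstructionSequence` class. This is a small sketch that assumes you are running on YARV (CRuby); `RubyVM` is implementation-specific and is not available on JRuby, Rubinius, and the others.

```ruby
# CRuby/YARV only: the RubyVM constant does not exist on other implementations.

# Stage 1: compile Ruby source code to a YARV instruction sequence (bytecode).
iseq = RubyVM::InstructionSequence.compile("1 + 2")

# Stage 2: hand the compiled instruction sequence to the bytecode interpreter.
value = iseq.eval

puts value
```

In normal use both stages happen implicitly when a file is loaded; the explicit API is mainly useful for inspection and experimentation.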

There is talk about adding a JIT compiler from YARV bytecode to native machine code to the VM for YARV 3, which would then make the VM a mixed-mode execution engine.

Matz's current implementation, MRuby, is also a bytecoded VM.

For completeness' sake, here are a couple of other Ruby implementations, first the currently production-ready ones, and then a couple of historically interesting ones:

  • Rubinius: compiles Ruby source code to Rubinius bytecode ahead-of-time, then hands that bytecode off to a mixed-mode execution engine consisting of a bytecode interpreter and an LLVM-based JIT compiler. They have recently introduced, or are currently in the process of introducing, a separate Intermediate Representation (IR) for the JIT compiler, so the interpreter works off Rubinius bytecode, but the JIT compiler works off Compiler IR. Rubinius also belongs in the "historically interesting" category, because it was the first successful Ruby implementation with a significant part implemented in Ruby itself; there had been other projects before, but Rubinius was the first to be production-ready.
  • JRuby: the main mode is a mixed-mode execution engine consisting of an AST-walking interpreter, and a JIT compiler that first translates the AST into IR, which it then further compiles to JVM bytecode. The other mode is an AOT compiler which compiles Ruby sourcecode to JVM bytecode ahead-of-time.
  • Opal: an Ahead-Of-Time compiler that compiles Ruby sourcecode to ECMAScript sourcecode.
  • MagLev: an implementation based on the GemStone/S Smalltalk VM. Unfortunately, I don't know much about it. I believe it compiles Ruby sourcecode to GemStone/S bytecode; the GemStone/S VM is then a standard mixed-mode VM with a bytecode interpreter and a JIT compiler.

Some no longer maintained but historically interesting implementations:

  • Topaz: an implementation using the RPython/PyPy VM framework. The PyPy framework is interesting because it includes a tracing JIT compiler that, unlike other JIT compilers, doesn't work alongside the interpreter and compile the user program; instead, it compiles the interpreter while it is interpreting the user program. What this basically means is that the JIT has to be written only once by the PyPy developers, and every language implementor using the PyPy framework only has to write a simple bytecode interpreter and gets an optimizing native JIT compiler for free.
  • XRuby: the first static AOT compiler for Ruby, implemented for the JVM.
  • IronRuby: it started out as a pure JIT compiler without an interpreter, but an interpreter was later added, because it turned out that this actually improved performance (which is contrary to the popular myth that interpreters are slow).
  • unholy: a proof-of-concept AOT compiler that compiles YARV bytecode to CPython bytecode. It was hacked up by _why the lucky stiff when the Google App Engine first came out and only supported Python. The idea was that you could compile your Ruby sourcecode to YARV bytecode using YARV, compile the YARV bytecode to CPython bytecode using unholy, decompile the CPython bytecode to Python sourcecode using decompyle, and then upload the Python sourcecode to GAE to run your shiny new Ruby app.
  • Honorable mentions go to: tinyrb, metaruby, Ruby.NET, Red Sun, HotRuby, BlueRuby, SmallRuby

A couple of interesting current research projects are:

  • JRuby+Truffle: this project is re-implementing JRuby's internals using the Truffle AST interpreter framework from Oracle Labs. When run on a Graal-enabled JVM (another Oracle Labs research project), this version is able to attain performance similar to Java, sometimes even matching (and overtaking) C.
  • Ruby+OMR: IBM has broken up its J9 JVM into independently re-usable, language-independent building blocks for VM implementors and released them under an open source license under the Eclipse umbrella as the Eclipse Open Managed Runtime. It's not an academic project: the Java 8 version of IBM J9 is actually implemented using OMR. The Ruby+OMR project is a proof-of-concept by the OMR developers, replacing YARV's garbage collector with OMR's, and adding OMR's JIT compiler, profiler, and debugger to YARV. It is fairly impressive just how language-independent all of this really is: the entire patch is fewer than 10,000 lines, and that is not just the glue code; it actually includes all the required OMR components as well. (There's also an equivalent Python+OMR project, but that's still non-public.)

Last but not least, you may sometimes hear about "Rite". Rite was used as a codename for a complete re-write of MRI for over a decade. Matz said that when he wrote MRI he didn't actually know anything about language implementation, so he wanted to do it "right" (get it?) a second time. At the same time, there was also a lot of talk about Ruby 2.0, wanting to fix some long-standing design deficiencies in the language. The two were lumped together, so Rite was talked about as the new implementation of Ruby 2.0. However, YARV came along and was so good that Matz decided he didn't need to write his own VM after all, and he basically decided that "YARV is Rite".

However, he has since written his own VM after all, which is why you will sometimes hear MRuby (or its VM component) referred to as "Rite".

Jörg W Mittag answered Jan 20 '23 10:01



It's a bytecode interpreter called YARV, written by Sasada Koichi.

Here's an example of what the bytecode looks like:

puts RubyVM::InstructionSequence.compile("1+1").disasm
== disasm: #<ISeq:<compiled>@<compiled>>================================
0000 trace            1                                               (   1)
0002 putobject_OP_INT2FIX_O_1_C_
0003 putobject_OP_INT2FIX_O_1_C_
0004 opt_plus         <callinfo!mid:+, argc:1, ARGS_SIMPLE>, <callcache>
0007 leave
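
If you would rather inspect the instructions programmatically than read the disassembly text, `RubyVM::InstructionSequence#to_a` returns the same information as plain Ruby data. The sketch below assumes CRuby, and the exact instruction names vary between Ruby versions, so treat the printed list as version-dependent.

```ruby
# CRuby/YARV only. Compile the same expression as in the disassembly above.
iseq = RubyVM::InstructionSequence.compile("1+1")

# The last element of #to_a is the body: a mix of line numbers, labels,
# event markers, and instructions. Each instruction is an Array whose
# first element is the instruction name as a Symbol.
body = iseq.to_a.last
names = body.select { |entry| entry.is_a?(Array) }.map(&:first)

p names
```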

Further reading:

  • YARV instruction set

While MRI doesn't have a JIT yet, there is the Ruby+OMR project, which is trying to add a JIT compiler based on Eclipse OMR:

  • Ruby+OMR JIT Compiler: What’s next?
Michael Kohl answered Jan 20 '23 12:01
