Tutorial/resource for implementing VM

For self-education purposes, I want to implement a simple virtual machine for a dynamic language, preferably in C. Something like the Lua VM, Parrot, or the Python VM, but simpler. Are there any good resources or tutorials on achieving this, apart from reading the code and design documentation of existing VMs?

Edit: why the close vote? I don't understand; is this not about programming? Please comment if there is a specific problem with my question.

539

asked Jan 09 '10 18:01

zaharpopov

2 Answers

I assume you want a virtual machine rather than a mere interpreter. I think they are two points on a continuum. An interpreter works on something close to the original representation of the program. A VM works on more primitive (and self-contained) instructions. This means you need a compilation stage to translate the one to the other. I don't know if you want to work on that first or if you even have an input syntax in mind yet.

For a dynamic language, you want somewhere that stores data (as key/value pairs) and some operations that act on it. The VM maintains the store. The program running on it is a sequence of instructions (including control flow). You need to define the set of instructions. I'd suggest a simple set to start with, like:

* basic arithmetic operations, including arithmetic comparisons, accessing the store
* basic control flow
* built-in print
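To make the store concrete, here is a minimal sketch in C, assuming integer-only values and a small fixed capacity. The names `Store`, `store_slot`, `store_get` and `store_set`, and both size limits, are made up for illustration; a real VM would more likely use a hash table and a tagged value type.

```c
#include <assert.h>
#include <string.h>

#define STORE_MAX    64   /* hypothetical capacity for this sketch */
#define NAME_MAX_LEN 32   /* hypothetical limit on variable name length */

typedef struct {
    char name[NAME_MAX_LEN];
    int  value;
} Entry;

typedef struct {
    Entry entries[STORE_MAX];
    int   count;
} Store;

/* Return a pointer to the value slot for `name`, creating it (as 0) if absent. */
static int *store_slot(Store *s, const char *name) {
    for (int i = 0; i < s->count; i++) {
        if (strcmp(s->entries[i].name, name) == 0)
            return &s->entries[i].value;
    }
    assert(s->count < STORE_MAX);          /* sketch: no growth, just fail */
    assert(strlen(name) < NAME_MAX_LEN);
    Entry *e = &s->entries[s->count++];
    strcpy(e->name, name);
    e->value = 0;
    return &e->value;
}

static void store_set(Store *s, const char *name, int value) {
    *store_slot(s, name) = value;
}

static int store_get(Store *s, const char *name) {
    return *store_slot(s, name);
}
```

Because the keys are ordinary strings, a name computed at runtime (for example, built with `snprintf`) can be passed to `store_get` and `store_set` just like a literal one.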

You may want to use a stack-based approach to arithmetic, as many VMs do. There isn't much that is dynamic in the above yet. To get there we want two things: the ability to compute the names of variables at runtime (this just means string operations), and some treatment of code as data. This might be as simple as allowing function references.
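As a rough illustration of the stack-based approach to arithmetic, here is a small sketch in C. The opcodes (`OP_PUSH`, `OP_ADD`, and so on) and the function `run_stack` are invented for this example and are not the instruction set used later in this answer; operands are single bytes and every binary operator pops two values and pushes one.

```c
#include <assert.h>

/* Hypothetical opcodes for this sketch only. */
enum { OP_PUSH, OP_ADD, OP_SUB, OP_MUL, OP_LT, OP_HALT };

#define STACK_MAX 32

/* Run a stack program and return the value left on top of the stack. */
static int run_stack(const unsigned char *code) {
    int stack[STACK_MAX];
    int sp = 0;                         /* number of values on the stack */
    int ip = 0;                         /* index of the next byte to read */
    for (;;) {
        unsigned char op = code[ip++];
        if (op == OP_HALT)
            break;
        if (op == OP_PUSH) {            /* PUSH n: one-byte immediate operand */
            assert(sp < STACK_MAX);
            stack[sp++] = code[ip++];
            continue;
        }
        assert(sp >= 2);                /* binary operators need two operands */
        int b = stack[--sp];
        int a = stack[--sp];
        switch (op) {
        case OP_ADD: stack[sp++] = a + b; break;
        case OP_SUB: stack[sp++] = a - b; break;
        case OP_MUL: stack[sp++] = a * b; break;
        case OP_LT:  stack[sp++] = a < b; break;   /* comparison yields 0 or 1 */
        default:     assert(0 && "unknown opcode"); return 0;
        }
    }
    assert(sp >= 1);
    return stack[sp - 1];
}
```

For example, the byte sequence `OP_PUSH 2, OP_PUSH 3, OP_ADD, OP_PUSH 4, OP_MUL, OP_HALT` evaluates (2 + 3) * 4. The appeal of this design is that instructions need no operand addresses, which keeps both the bytecode and the compiler simple.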

Input to the VM would ideally be in bytecode. If you haven't got a compiler yet this could be generated from a basic assembler (which could be part of the VM).

The VM itself consists of the loop:

1. Look at the bytecode instruction pointed to by the instruction pointer.
2. Execute the instruction:
   * If it's an arithmetic instruction, update the store accordingly.
   * If it's control flow, perform the test (if there is one) and set the instruction pointer.
   * If it's print, print a value from the store.
3. Advance the instruction pointer to the next instruction.
4. Repeat from 1.

Dealing with computed variable names might be tricky: an instruction needs to specify which variables the computed names are in. This could be done by allowing instructions to refer to a pool of string constants provided in the input.

An example program (in assembly and bytecode):

    offset  bytecode (hex)   source
     0      01 05 0E         //      LOAD 5, .x
     3      01 03 10         // .l1: LOAD 3, .y
     6      02 0E 10 0E      //      ADD .x, .y, .x
    10      03 0E            //      PRINT .x
    12      04 03            //      GOTO .l1
    14      78 00            //      .x: "x"
    16      79 00            //      .y: "y"

The instruction codes implied are:

* "LOAD x, k" (01 x k): Load single byte x as an integer into the variable named by the string constant at offset k.
* "ADD k1, k2, k3" (02 k1 k2 k3): Add the two variables named by string constants k1 and k2 and put the sum in the variable named by string constant k3.
* "PRINT k" (03 k): Print the variable named by string constant k.
* "GOTO a" (04 a): Go to the offset given by byte a.
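Putting the loop and these four instructions together, a VM for the example program might look roughly like the following C sketch. The `VM` struct, `vm_var`, `vm_run` and the `max_steps` limit are my own additions for illustration (the example program loops forever, so something has to stop it). The sketch also assumes well-formed bytecode: operands are not bounds-checked and string constants are assumed to be NUL-terminated.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum { OP_LOAD = 0x01, OP_ADD = 0x02, OP_PRINT = 0x03, OP_GOTO = 0x04 };

#define VARS_MAX 16   /* hypothetical capacity for this sketch */

typedef struct {
    const char *name;             /* points at a string constant in the bytecode */
    int value;
} Var;

typedef struct {
    const unsigned char *code;
    size_t len;
    size_t ip;                    /* instruction pointer (byte offset) */
    Var vars[VARS_MAX];
    int nvars;
} VM;

/* Find (or create, as 0) the variable named by the string constant at offset k. */
static int *vm_var(VM *vm, unsigned char k) {
    assert(k < vm->len);
    const char *name = (const char *)vm->code + k;
    for (int i = 0; i < vm->nvars; i++)
        if (strcmp(vm->vars[i].name, name) == 0)
            return &vm->vars[i].value;
    assert(vm->nvars < VARS_MAX);
    vm->vars[vm->nvars].name = name;
    vm->vars[vm->nvars].value = 0;
    return &vm->vars[vm->nvars++].value;
}

/* Execute at most max_steps instructions; return how many were executed. */
static int vm_run(VM *vm, int max_steps) {
    int steps = 0;
    while (steps < max_steps && vm->ip < vm->len) {
        const unsigned char *in = vm->code + vm->ip;    /* 1. fetch */
        switch (in[0]) {                                /* 2. execute */
        case OP_LOAD:
            *vm_var(vm, in[2]) = in[1];
            vm->ip += 3;                                /* 3. advance */
            break;
        case OP_ADD: {
            int sum = *vm_var(vm, in[1]) + *vm_var(vm, in[2]);
            *vm_var(vm, in[3]) = sum;
            vm->ip += 4;
            break;
        }
        case OP_PRINT:
            printf("%d\n", *vm_var(vm, in[1]));
            vm->ip += 2;
            break;
        case OP_GOTO:
            vm->ip = in[1];                             /* jump instead of advancing */
            break;
        default:
            return steps;                               /* unknown opcode: stop */
        }
        steps++;                                        /* 4. repeat */
    }
    return steps;
}

/* The example program above, byte for byte. */
static const unsigned char example[] = {
    0x01, 0x05, 0x0E,          /*  0      LOAD 5, .x      */
    0x01, 0x03, 0x10,          /*  3 .l1: LOAD 3, .y      */
    0x02, 0x0E, 0x10, 0x0E,    /*  6      ADD .x, .y, .x  */
    0x03, 0x0E,                /* 10      PRINT .x        */
    0x04, 0x03,                /* 12      GOTO .l1        */
    0x78, 0x00,                /* 14 .x:  "x"             */
    0x79, 0x00                 /* 16 .y:  "y"             */
};
```

Initialising a `VM` with `code = example`, `len = sizeof example` and calling `vm_run(&vm, 13)` executes the first LOAD plus three trips around the loop, printing 8, 11 and 14.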

You need variants for when variables are named by other variables, etc. (and the levels of indirection get tricky to reason about). The assembler looks at the arguments like "ADD .x, .y, .x" and generates the correct bytecode for adding from string constants (and not computed variables).
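The assembler side of this can be sketched as a label lookup followed by byte emission. The following C fragment is only an illustration under simplifying assumptions: the label table is already filled in (a real assembler would build it in a first pass over the source), and `Label`, `resolve` and `emit_add` are hypothetical names.

```c
#include <assert.h>
#include <string.h>

typedef struct {
    const char   *name;     /* label text, e.g. ".x" */
    unsigned char offset;   /* byte offset of the label in the output */
} Label;

/* Look up a label's offset; an undefined label is treated as a fatal error. */
static unsigned char resolve(const Label *labels, int n, const char *name) {
    for (int i = 0; i < n; i++)
        if (strcmp(labels[i].name, name) == 0)
            return labels[i].offset;
    assert(0 && "undefined label");
    return 0;
}

/* Emit "ADD a, b, dst" where all three operands are string-constant labels.
   Returns the number of bytes written to out. */
static int emit_add(unsigned char *out, const Label *labels, int n,
                    const char *a, const char *b, const char *dst) {
    out[0] = 0x02;                        /* opcode for ADD on string constants */
    out[1] = resolve(labels, n, a);
    out[2] = resolve(labels, n, b);
    out[3] = resolve(labels, n, dst);
    return 4;
}
```

With the labels from the example (`.l1` at 3, `.x` at 14, `.y` at 16), `emit_add(out, labels, 3, ".x", ".y", ".x")` produces the bytes 02 0E 10 0E. A variant for computed variable names would use a different opcode in `out[0]`.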

150

answered Sep 29 '22 08:09

Edmund

Well, it's not about implementing a VM in C, but since it was the last tab I had open before I saw this question, I feel I need to point out an article about implementing a QBASIC bytecode compiler and virtual machine in JavaScript, using the <canvas> tag for display. It includes all of the source code needed to implement enough of QBASIC to run the "nibbles" game, and it is the first in a series of articles on the compiler and bytecode interpreter. This one describes the VM, and the author promises future articles describing the compiler as well.

By the way, I didn't vote to close your question, but the close vote you got was as a duplicate of a question from last year on how to learn about implementing a virtual machine. I think this question (about a tutorial or something relatively simple) is different enough from that one that it should remain open, but you might want to refer to that one for some more advice.

answered Sep 29 '22 09:09

Brian Campbell
