
Tutorial/resource for implementing VM

For self-education, I want to implement a simple virtual machine for a dynamic language, preferably in C. Something like the Lua VM, Parrot, or the Python VM, but simpler. Are there any good resources or tutorials on achieving this, apart from reading the code and design documentation of existing VMs?

Edit: why the close vote? I don't understand; isn't this programming? Please comment if there is a specific problem with my question.

zaharpopov asked Jan 09 '10 18:01




2 Answers

I assume you want a virtual machine rather than a mere interpreter. I think they are two points on a continuum. An interpreter works on something close to the original representation of the program. A VM works on more primitive (and self-contained) instructions. This means you need a compilation stage to translate the one to the other. I don't know if you want to work on that first or if you even have an input syntax in mind yet.

For a dynamic language, you want somewhere that stores data (as key/value pairs) and some operations that act on it. The VM maintains the store. The program running on it is a sequence of instructions (including control flow). You need to define the set of instructions. I'd suggest a simple set to start with, like:

  • basic arithmetic operations, including arithmetic comparisons, plus instructions for accessing the store
  • basic control flow
  • built-in print
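As a rough illustration of the store, here is a minimal sketch in C. The tagged `Value` union, the fixed capacity, and every name in it (`Store`, `store_set`, `store_get`, `int_val`, `str_val`) are my own assumptions for the example, not something prescribed above. A real VM would more likely use a hash table and copy its keys.

```c
#include <assert.h>
#include <string.h>

#define STORE_MAX 64  /* assumed capacity for this sketch */

/* A dynamically typed value: the tag says which union member is valid. */
typedef enum { VAL_NIL, VAL_INT, VAL_STR } Tag;

typedef struct {
    Tag tag;
    union { int i; const char *s; } as;
} Value;

/* The store: a flat list of key/value pairs, searched linearly. */
typedef struct {
    const char *keys[STORE_MAX];  /* not copied; the caller keeps them alive */
    Value vals[STORE_MAX];
    int count;
} Store;

static Value int_val(int i) { Value v; v.tag = VAL_INT; v.as.i = i; return v; }
static Value str_val(const char *s) { Value v; v.tag = VAL_STR; v.as.s = s; return v; }

/* Bind key to v, overwriting any earlier binding whatever its type. */
static void store_set(Store *st, const char *key, Value v) {
    for (int i = 0; i < st->count; i++) {
        if (strcmp(st->keys[i], key) == 0) {
            st->vals[i] = v;
            return;
        }
    }
    assert(st->count < STORE_MAX);
    st->keys[st->count] = key;
    st->vals[st->count] = v;
    st->count++;
}

/* Look up key; an unbound name yields a VAL_NIL value. */
static Value store_get(const Store *st, const char *key) {
    for (int i = 0; i < st->count; i++)
        if (strcmp(st->keys[i], key) == 0)
            return st->vals[i];
    Value nil;
    nil.tag = VAL_NIL;
    nil.as.i = 0;
    return nil;
}
```

Rebinding a name to a value of a different type simply overwrites the tag, which is most of what "dynamic" means at this level.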

You may want to use a stack-based approach to arithmetic, as many VMs do. So far there isn't much that is dynamic in the above. To get there we want two things: the ability to compute the names of variables at runtime (this just means string operations), and some treatment of code as data. This might be as simple as allowing function references.
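To make the stack-based idea concrete, here is a small sketch that evaluates (2 + 3) * 4. The opcode names (`OP_PUSH`, `OP_ADD`, `OP_MUL`, `OP_HALT`) and the use of an `int` array for the code are assumptions made up for this example.

```c
#include <assert.h>

enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };  /* hypothetical opcodes */

#define STACK_MAX 32  /* assumed stack depth for this sketch */

/* Evaluate a tiny stack program and return the value left on top at OP_HALT. */
static int eval_stack(const int *code) {
    int stack[STACK_MAX];
    int sp = 0;  /* index of the next free stack slot */
    int ip = 0;  /* index of the next code word */
    for (;;) {
        switch (code[ip++]) {
        case OP_PUSH:  /* operand follows the opcode */
            assert(sp < STACK_MAX);
            stack[sp++] = code[ip++];
            break;
        case OP_ADD:   /* pop two values, push their sum */
            assert(sp >= 2);
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_MUL:   /* pop two values, push their product */
            assert(sp >= 2);
            sp--;
            stack[sp - 1] *= stack[sp];
            break;
        case OP_HALT:
            assert(sp >= 1);
            return stack[sp - 1];
        default:
            assert(!"unknown opcode");
            return 0;
        }
    }
}
```

Running it on `{ OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PUSH, 4, OP_MUL, OP_HALT }` returns 20. The arithmetic instructions need no operands of their own, which is the main appeal of the stack design.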

Input to the VM would ideally be in bytecode. If you haven't got a compiler yet this could be generated from a basic assembler (which could be part of the VM).

The VM itself consists of the loop:

1. Look at the bytecode instruction pointed to by the instruction pointer.
2. Execute the instruction:
   • If it's an arithmetic instruction, update the store accordingly.
   • If it's control flow, perform the test (if there is one) and set the instruction pointer.
   • If it's print, print a value from the store.
3. Advance the instruction pointer to the next instruction (unless a control-flow instruction has just set it).
4. Repeat from 1.
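The loop might look roughly like this in C. This is only a sketch: the opcodes (`OP_SET`, `OP_ADDI`, `OP_JNZ`, `OP_PRINT`, `OP_HALT`), the integer-indexed store, and the `OP_HALT` instruction itself are assumptions I added so that the example terminates.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical opcodes; operands follow the opcode in the code array. */
enum {
    OP_SET,    /* OP_SET slot value   : store[slot] = value             */
    OP_ADDI,   /* OP_ADDI slot delta  : store[slot] += delta            */
    OP_JNZ,    /* OP_JNZ slot target  : if store[slot] != 0 goto target */
    OP_PRINT,  /* OP_PRINT slot       : print store[slot]               */
    OP_HALT    /* stop the loop                                         */
};

/* Run code until OP_HALT; returns the number of instructions executed. */
static int run(const int *code, int *store, FILE *out) {
    int ip = 0, steps = 0;
    for (;;) {
        const int *in = &code[ip];  /* 1. look at the instruction at ip */
        steps++;
        switch (in[0]) {            /* 2. execute it */
        case OP_SET:
            store[in[1]] = in[2];
            ip += 3;                /* 3. advance past opcode and operands */
            break;
        case OP_ADDI:
            store[in[1]] += in[2];
            ip += 3;
            break;
        case OP_JNZ:
            if (store[in[1]] != 0)
                ip = in[2];         /* a taken jump sets ip instead of advancing */
            else
                ip += 3;
            break;
        case OP_PRINT:
            fprintf(out, "%d\n", store[in[1]]);
            ip += 2;
            break;
        case OP_HALT:
            return steps;
        default:
            assert(!"unknown opcode");
            return steps;
        }
    }                               /* 4. repeat */
}
```

For instance, the program `{ OP_SET, 0, 3, OP_PRINT, 0, OP_ADDI, 0, -1, OP_JNZ, 0, 3, OP_HALT }` counts down from 3, printing 3, 2 and 1 before halting.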

Dealing with computed variable names might be tricky: an instruction needs to specify which variables the computed names are in. This could be done by allowing instructions to refer to a pool of string constants provided in the input.
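One way to picture the difference between a name given directly by a string constant and a computed name is sketched below. The `indirect` flag, the `StrVar` type, and the function names are hypothetical; they only illustrate one level of indirection, not a complete design.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical string-valued variable, used to hold a computed name. */
typedef struct { const char *name; const char *value; } StrVar;

/* Return the string value of the variable called name, or NULL if unbound. */
static const char *lookup(const StrVar *vars, int n, const char *name) {
    for (int i = 0; i < n; i++)
        if (strcmp(vars[i].name, name) == 0)
            return vars[i].value;
    return NULL;
}

/* Resolve the variable name an operand refers to.
   pool + k is a NUL-terminated string constant from the input.
   indirect == 0: that constant is the variable name itself.
   indirect != 0: the constant names a variable whose string value is the name. */
static const char *effective_name(const char *pool, int pool_len, int k,
                                  int indirect, const StrVar *vars, int n) {
    assert(k >= 0 && k < pool_len);
    const char *constant = pool + k;
    return indirect ? lookup(vars, n, constant) : constant;
}
```

With a pool containing "x" at offset 0 and "key" at offset 2, and a variable `key` whose value is "x", both `effective_name(pool, len, 0, 0, ...)` and `effective_name(pool, len, 2, 1, ...)` resolve to "x".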

An example program (in assembly and bytecode):

offset  bytecode (hex)   source
 0      01 05 0E         //      LOAD 5, .x
 3      01 03 10         // .l1: LOAD 3, .y
 6      02 0E 10 0E      //      ADD .x, .y, .x
10      03 0E            //      PRINT .x
12      04 03            //      GOTO .l1
14      78 00            //      .x: "x"
16      79 00            //      .y: "y"

The instruction codes implied are:

"LOAD x, k" (01 x k) Load single byte x as an integer into variable named by string constant at offset k. "ADD k1, k2, k3" (02 v1 v2 v3) Add two variables named by string constants k1 and k2 and put the sum in variable named by string constant k3. "PRINT k" (03 k) Print variable named by string constant k. "GOTO a" (04 a) Go to offset given by byte a. 

You need variants for when variables are named by other variables, etc. (and the levels of indirection get tricky to reason about). The assembler looks at the arguments like "ADD .x, .y, .x" and generates the correct bytecode for adding from string constants (and not computed variables).
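A very small assembler for the string-constant forms only could be sketched like this. It assumes a first pass has already produced a table of label offsets and stripped any leading label such as ".l1:" from the line. The `Label` type and the function names are hypothetical, and no variants for computed names are handled.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct { const char *name; int offset; } Label;  /* hypothetical label table entry */

/* Resolve an operand: ".name" becomes that label's offset, anything else is parsed as a number. */
static int operand(const char *tok, const Label *labels, int nlabels) {
    if (tok[0] == '.') {
        for (int i = 0; i < nlabels; i++)
            if (strcmp(labels[i].name, tok + 1) == 0)
                return labels[i].offset;
        assert(!"undefined label");
        return 0;
    }
    return atoi(tok);
}

/* Assemble one line such as "ADD .x, .y, .x" into out.
   Returns the number of bytes written, or 0 for an unknown or malformed instruction. */
static int assemble_line(const char *line, const Label *labels, int nlabels,
                         unsigned char *out) {
    static const struct { const char *name; unsigned char code; int argc; } ops[] = {
        { "LOAD", 0x01, 2 }, { "ADD", 0x02, 3 }, { "PRINT", 0x03, 1 }, { "GOTO", 0x04, 1 }
    };
    char mnem[8], a[3][16];
    /* Mnemonic, then up to three operands separated by commas. */
    int got = sscanf(line, "%7s %15[^, \n] , %15[^, \n] , %15[^, \n]",
                     mnem, a[0], a[1], a[2]);
    if (got < 1)
        return 0;
    for (size_t i = 0; i < sizeof ops / sizeof ops[0]; i++) {
        if (strcmp(mnem, ops[i].name) == 0 && got - 1 == ops[i].argc) {
            out[0] = ops[i].code;
            for (int j = 0; j < ops[i].argc; j++)
                out[1 + j] = (unsigned char)operand(a[j], labels, nlabels);
            return 1 + ops[i].argc;
        }
    }
    return 0;
}
```

With labels l1 = 3, x = 14 and y = 16, assembling "ADD .x, .y, .x" yields the bytes 02 0E 10 0E, matching the listing.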

Edmund answered Sep 29 '22 08:09


Well, it's not about implementing a VM in C, but since it was the last tab I had open before I saw this question, I feel like I need to point out an article about implementing a QBASIC bytecode compiler and virtual machine in JavaScript, using the <canvas> tag for display. It includes all of the source code needed to get enough of QBASIC implemented to run the "nibbles" game. It is the first in a series of articles on the compiler and bytecode interpreter; this one describes the VM, and the author is promising future articles describing the compiler as well.

By the way, I didn't vote to close your question, but the close vote you got was as a duplicate of a question from last year on how to learn about implementing a virtual machine. I think this question (about a tutorial or something relatively simple) is different enough from that one that it should remain open, but you might want to refer to that one for some more advice.

Brian Campbell answered Sep 29 '22 09:09