
Designing a virtual machine with JIT

I'm developing a scripting language that compiles to its own virtual machine. The VM is a simple one, with instructions that operate on data such as points, vectors, floats and so on. A memory cell is represented like this:

struct memory_cell
{
    u32 id;
    u8 type;

    union
    {
        u8 b; /* boolean */
        double f; /* float */
        struct { double x, y, z; } v; /* vector */
        struct { double r, g, b; } c; /* color */
        struct { double r, g, b; } cw; /* color weight */
        struct { double x, y, z; } p; /* point variable */
        struct { u16 length; memory_cell **cells; } l; /* list variable */
    };  
};

Instructions are generic and able to work on many different operand types. For example:

ADD dest, src1, src2

can work with floats, vectors, points and colors, setting the type of the destination according to the operands.

The main execution cycle just checks the opcode of the instruction (which is a struct containing unions to define any kind of instruction) and executes it. I used a simplified approach in which there are no registers, just a big array of memory cells.

I was wondering whether JIT compilation could help me get better performance, and if so, how to achieve it.

The best implementation I have reached so far looks something like this:

 void VirtualMachine::executeInstruction(instr i)
 {
     u8 opcode = (i.opcode[0] & (u8)0xFC) >> 2;

     if (opcode >= 1 && opcode <= 17) /* RTL instruction */
     {
        memory_cell *dest;
        memory_cell *src1;
        memory_cell *src2;

        /* fetching destination */
        switch (i.opcode[0] & 0x03)
        {
            /* skip fetching for optimization */
            case 0: { break; }
            case MEM_CELL: { dest = memory[stack_pointer+i.rtl.dest.cell]; break; }
            case ARRAY_VAL: { dest = memory[stack_pointer+i.rtl.dest.cell]->l.cells[i.rtl.dest.index]; break; }
            case ARRAY_CELL: { dest = memory[stack_pointer+i.rtl.dest.cell]->l.cells[(int)i.rtl.dest.value]; break; }
        }

     /* omitted code */

     switch (opcode)
     {
         case ADD:
         {
             if (src1->type == M_VECTOR && src2->type == M_VECTOR)
             {
                 dest->type = M_VECTOR;
                 dest->v.x = src1->v.x + src2->v.x;
                 dest->v.y = src1->v.y + src2->v.y;
                 dest->v.z = src1->v.z + src2->v.z;
              }

      /* omitted code */

Is it easy or convenient to try JIT compilation? I really don't know where to start, which is why I'm asking for advice.

Apart from that, is there any other advice I should consider while developing it?

This virtual machine should be fast enough to compute shaders for a ray tracer, but I still haven't done any kind of benchmark.

Jack asked Dec 06 '22 04:12


2 Answers

Before writing a JIT ("Just-in-time") compiler, you should at least consider how you would write a "Way-ahead-of-time" compiler.

That is, given a program consisting of instructions for your VM, how would you produce a program consisting of x86 (or whatever) instructions, that does the same as the original program? How would you optimise the output for different instruction sets, and different versions of the same architecture? The example opcode you've given has quite a complicated implementation, so which opcodes would you implement "inline" by just emitting code that does the job, and which would you implement by emitting a call to some shared code?
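As a rough illustration of the inline-versus-shared-code choice, here is a minimal sketch of an ahead-of-time translator. It emits C source text rather than x86, purely to keep the example short. `Op`, `Instr`, `translate` and the runtime helper `vm_add` are hypothetical names, not part of the asker's VM, and the inline case assumes the compiler already knows that all operands are floats.

```cpp
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, simplified VM instruction (not the asker's real `instr` layout).
enum class Op : std::uint8_t { AddFloat, AddGeneric };

struct Instr {
    Op op;
    std::uint32_t dest, src1, src2;  // indices into the VM's cell array
};

// Translate a VM program into C source text ahead of time.
// Cheap ops with statically known types are emitted inline; ops whose
// operand types are only known at run time become a call to shared code.
std::string translate(const std::vector<Instr>& program) {
    std::ostringstream out;
    out << "void run(memory_cell **m) {\n";
    for (const Instr& i : program) {
        switch (i.op) {
        case Op::AddFloat:  // inline: a single native addition
            out << "    m[" << i.dest << "]->f = m[" << i.src1
                << "]->f + m[" << i.src2 << "]->f;\n";
            break;
        case Op::AddGeneric:  // shared code: vm_add (assumed) checks types
            out << "    vm_add(m[" << i.dest << "], m[" << i.src1
                << "], m[" << i.src2 << "]);\n";
            break;
        }
    }
    out << "}\n";
    return out.str();
}
```

A real backend would emit machine instructions instead of text, but the decision per opcode (emit the work directly, or emit a call) has the same shape.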

A JIT has to be able to do this, and it also has to make decisions while the VM is running about which code it does it to, when it does it, and how it represents the resulting mixture of VM instructions and native instructions.
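One common way to make the "which code, and when" decision is a call counter with a threshold. The sketch below assumes that approach. `CodeUnit`, `kHotThreshold`, `runUnit` and the `compile` callback are made-up names, and the "native" version is just a stand-in `std::function`, not real machine code.

```cpp
#include <cstddef>
#include <functional>

// Hypothetical unit of code the VM can run (e.g. one shader function).
struct CodeUnit {
    std::function<void()> interpret;  // slow path: interpret VM instructions
    std::function<void()> native;     // fast path: empty until compiled
    std::size_t calls = 0;            // interpreted runs requested so far
};

// Assumed tuning knob: compile after this many interpreted runs.
constexpr std::size_t kHotThreshold = 3;

// Runs the unit, compiling it once it becomes hot. `compile` stands in for
// the real code generator and returns the native version of the unit.
// Returns true if the native version was used for this run.
template <typename Compile>
bool runUnit(CodeUnit& unit, Compile compile) {
    if (!unit.native && ++unit.calls > kHotThreshold) {
        unit.native = compile(unit);
    }
    if (unit.native) {
        unit.native();
        return true;
    }
    unit.interpret();
    return false;
}
```

The VM keeps both representations side by side: cold units stay as VM instructions, and only units that cross the threshold pay the compilation cost.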

If you're not already an assembly-jockey, then I don't recommend writing a JIT. That's not to say "don't do it ever", but you should become an assembly-jockey before you start in earnest.

An alternative would be to write a non-JIT compiler to convert your VM instructions (or the original scripting language) to Java bytecode or LLVM IR, as Jeff Foster suggests. Then let the toolchain for that bytecode do the difficult, CPU-dependent work.
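To give an idea of what targeting LLVM might look like, here is a minimal sketch that emits textual LLVM IR for a float `ADD`. The helper `emitFloatAdd` and the choice of passing operands as plain `double` arguments are assumptions for illustration; a real translator would more likely use LLVM's C++ API and operate on memory cells.

```cpp
#include <string>

// Emits textual LLVM IR for a function that adds two doubles.
// `name` becomes the function's symbol name (hypothetical helper).
std::string emitFloatAdd(const std::string& name) {
    return "define double @" + name + "(double %a, double %b) {\n"
           "entry:\n"
           "  %sum = fadd double %a, %b\n"
           "  ret double %sum\n"
           "}\n";
}
```

Written to a `.ll` file, output like this can be handed to LLVM tools such as `llc` or `clang`, which then take care of instruction selection and optimisation for the target CPU.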

Steve Jessop answered Dec 27 '22 23:12


A VM is a big task to consider. Have you considered basing your VM on something like LLVM?

LLVM will provide a good base to start from and there are plenty of example projects which you can use for understanding.

Jeff Foster answered Dec 27 '22 21:12