Implementing a direct-threaded interpreter in a functional language like OCaml

Question

In C/C++ you can implement a direct threaded interpreter with an array of function pointers. The array represents your program - an array of operations. Each of the operation functions must end in a call to the next function in the array, something like:

void op_plus(size_t pc, uint8_t* data) {
  *data += 1;
  BytecodeArray[pc+1](pc+1, data); //call the next operation in the array
}

The BytecodeArray is an array of function pointers. If we had an array of these op_plus operations then length of the array would determine how ofter we'd be incrementing the contents of data. (of course, you'd need to add some sort of terminating operation as the last operation in the array).

How would one go about implementing something like this in OCaml? I may be trying to translate this code too literally: I was using an OCaml Array of functions as in the C++. The problem with that is that I keep ending up with something like:

let op_plus pc data = Printf.printf "pc: %d, data_i: %d 
" pc data;
                        let f = (op_array.(pc+1)) in         
                        f (pc+1) (data+1) ;;

Where op_array is an Array defined in the scope above and then redefine it later to be filled with a bunch of op_plus functions... however, the op_plus function uses the previous definition of op_array. It's a chicken&egg problem.

rkhayrov · Accepted Answer

Another alternative would be using CPS and avoid explicit function array altogether. Tail call optimization still applies in this case.

I don't know how do you generate the code, but let's make not unreasonable assumption that at some point you have an array of VM instructions you want to prepare for execution. Every instruction is still represented as a function, but instead of program counter it receives continuation function.

Here is the simplest example:

type opcode = Add of int | Sub of int

let make_instr opcode cont =
    match opcode with
    | Add x -> fun data -> Printf.printf "add %d %d
" data x; cont (data + x)
    | Sub x -> fun data -> Printf.printf "sub %d %d
" data x; cont (data - x)

let compile opcodes =
    Array.fold_right make_instr opcodes (fun x -> x)

Usage (look at inferred types):

# #use "cpsvm.ml";;
type opcode = Add of int | Sub of int
val make_instr : opcode -> (int -> 'a) -> int -> 'a = <fun>
val compile : opcode array -> int -> int = <fun>
# let code = [| Add 13; Add 42; Sub 7 |];;
val code : opcode array = [|Add 13; Add 42; Sub 7|]
# let fn = compile code;;
val fn : int -> int = <fun>
# fn 0;;
add 0 13
add 13 42
sub 55 7
- : int = 48

UPDATE:

It's easy to introduce [conditional] branching in this model. if continuation is constructed from two arguments: iftrue-continuation and iffalse-continuation, but has the same type as every other continuation function. The problem is that we don't know what constitutes these continuations in case of backward branching (backward, because we compile from tail to head). That's easy to overcome with destructive updates (though maybe more elegant solution is possible if you are compiling from a high level language): just leave "holes" and fill them later when branch target is reached by the compiler.

Sample implementation (I've made use of string labels instead of integer instruction pointers, but this hardly matters):

type label = string

type opcode =
      Add of int | Sub of int
    | Label of label | Jmp of label | Phi of (int -> bool) * label * label

let make_instr labels opcode cont =
    match opcode with
    | Add x -> fun data -> Printf.printf "add %d %d
" data x; cont (data + x)
    | Sub x -> fun data -> Printf.printf "sub %d %d
" data x; cont (data - x)
    | Label label -> (Hashtbl.find labels label) := cont; cont
    | Jmp label ->
        let target = Hashtbl.find labels label in
        (fun data -> Printf.printf "jmp %s
" label; !target data)
    | Phi (cond, tlabel, flabel) ->
        let tcont = Hashtbl.find labels tlabel
        and fcont = Hashtbl.find labels flabel in
        (fun data ->
            let b = cond data in
            Printf.printf "branch on %d to %s
"
                data (if b then tlabel else flabel);
            (if b then !tcont else !fcont) data)

let compile opcodes =
    let id = fun x -> x in
    let labels = Hashtbl.create 17 in
    Array.iter (function
        | Label label -> Hashtbl.add labels label (ref id)
        | _ -> ())
        opcodes;
    Array.fold_right (make_instr labels) opcodes id

I've used two passes for clarity but it's easy to see that it can be done in one pass.

Here is a simple loop that can be compiled and executed by the code above:

let code = [|
    Label "entry";
    Phi (((<) 0), "body", "exit");
    Label "body";
    Sub 1;
    Jmp "entry";
    Label "exit" |]

Execution trace:

# let fn = compile code;;
val fn : int -> int = <fun>
# fn 3;;
branch on 3 to body
sub 3 1
jmp entry
branch on 2 to body
sub 2 1
jmp entry
branch on 1 to body
sub 1 1
jmp entry
branch on 0 to exit
- : int = 0

UPDATE 2:

Performance-wise, CPS representation is likely to be faster than array-based, because there is no indirection in case of linear execution. Continuation function is stored directly in the instruction closure. In the array-based implementation it has to increment program counter and perform array access (with an extra bounds checking overhead) first.

I've made some benchmarks to demonstrate it. Here is an implementation of array-based interpreter:

type opcode =
      Add of int | Sub of int
    | Jmp of int | Phi of (int -> bool) * int * int
    | Ret

let compile opcodes =
    let instr_array = Array.make (Array.length opcodes) (fun _ data -> data)
    in Array.iteri (fun i opcode ->
        instr_array.(i) <- match opcode with
        | Add x -> (fun pc data ->
            let cont = instr_array.(pc + 1) in cont (pc + 1) (data + x))
        | Sub x -> (fun pc data ->
            let cont = instr_array.(pc + 1) in cont (pc + 1) (data - x))
        | Jmp pc -> (fun _ data ->
            let cont = instr_array.(pc) in cont (pc + 1) data)
        | Phi (cond, tbranch, fbranch) ->
            (fun _ data ->
                let pc = (if cond data then tbranch else fbranch) in
                let cont = instr_array.(pc) in
                cont pc data)
        | Ret -> fun _ data -> data)
        opcodes;
    instr_array

let code = [|
    Phi (((<) 0), 1, 3);
    Sub 1;
    Jmp 0;
    Ret
    |]

let () =
    let fn = compile code in
    let result = fn.(0) 0 500_000_000 in
    Printf.printf "%d
" result

Let's see how it compares to the CPS-based interpreter above (with all debug tracing stripped, of course). I used OCaml 3.12.0 native compiler on Linux/amd64. Each program was run 5 times.

array: mean = 13.7 s, stddev = 0.24
CPS: mean = 11.4 s, stddev = 0.20

So even in tight loop CPS performs considerably better than array. If we unroll loop and replace one sub instruction with five, figures change:

array: mean = 5.28 s, stddev = 0.065
CPS: mean = 4.14 s, stddev = 0.309

It's interesting that both implementations actually beat OCaml bytecode interpreter. The following loop takes 17 seconds to execute on my machine:

for i = 500_000_000 downto 0 do () done

Implementing a direct-threaded interpreter in a functional language like OCaml

Tags:

functional-programming

interpreter

ocaml

aneccodeal

1 Answers

rkhayrov

Recent Activity

Donate For Us

Implementing a direct-threaded interpreter in a functional language like OCaml

Tags:

functional-programming

interpreter

ocaml

aneccodeal

1 Answers

rkhayrov

Related questions

Recent Activity

Donate For Us