I am trying to understand the practical difference during the execution of a program in Decode and dispatch interpretation and Threaded interpretation.
Example of both will really help.
I understand how Java bytecode works and how an assembly language works. But where does DDI and TI fit in?
Context: Virtual machines: versatile platforms for systems and processes
(Note: I'll assume that by "decode and dispatch" you mean a switch-based interpreter.)
The difference between a switch-based and a threaded interpreter at run time is, basically, the number of jumps that are performed.
In a switch-based interpreter, instructions are decoded at some central location, and based on the result of the decoding, a jump is performed to the piece of code that handles the decoded instruction. Once that piece of code has finished interpreting the instruction, it jumps back to the centralized decoding code, which proceeds with the next instruction. This means that (at least) two jumps are performed per interpreted instruction. The following piece of C code illustrates what such an interpreter might look like:
typedef enum {
add, /* ... */
} instruction_t;
void interpret() {
static instruction_t program[] = { add /* ... */ };
instruction_t* pc = program;
int* sp = ...; /* stack pointer */
for (;;) {
switch (*pc++) {
case add:
sp[1] += sp[0];
sp++;
break;
/* ... other instructions */
}
}
}
In a threaded interpreter, the decoding code is not centralized, but rather duplicated at the end of each piece of code that handles an instruction. This means that once an instruction has been interpreted, instead of jumping back to some centralized decoding code, the interpreter decodes the next instruction and immediately jumps to it. Implementing threaded code efficiently in ANSI-C is not really possible, but GCC's "computed goto" extension works very well for that. Here is a threaded version of the previous interpreter:
void interpret() {
void* program[] = { &&l_add, /* ... */ };
int* sp = ...;
void** pc = program;
goto **pc; /* jump to first instruction */
l_add:
sp[1] += sp[0];
++sp;
goto **(++pc); /* jump to next instruction */
/* ... other instructions */
}
Apart from saving a jump, such threaded interpreters are also more efficient because the replicated indirect jump (to the next instruction) can be predicted better by modern CPUs. Anton Ertl has some interesting papers on his home page, especially the one called "The Structure and Performance of Efficient Interpreters", from which the above pieces of code were adapted.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With