What techniques promote efficient opcode dispatch to make a fast interpreter? Are there some techniques that only work well on modern hardware and others that don't work well anymore due to hardware advances? What trade offs must be made between ease of implementation, speed, and portability?
I'm pleased that Python's C implementation is finally moving beyond a simple switch (opcode) {...}
implementation for opcode dispatch to indirect threading as a compile time option, but I'm less pleased that it took them 20 years to get there. Maybe if we document these strategies on stackoverflow the next language will get fast faster.
There are a number of papers on different kinds of dispatch:
M. Anton Ertl and David Gregg, Optimizing Indirect Branch Prediction Accuracy in Virtual Machine Interpreters, in Proceedings of the ACM SIGPLAN 2003 Conference on Programming Language Design and Implementation (PLDI 03), pp. 278-288, San Diego, California, June 2003.
M. Anton Ertl and David Gregg, The behaviour of efficient virtual machine interpreters on modern architectures, in Proceedings of the 7th European Conference on Parallel Computing (Europar 2001), pp. 403-412, LNCS 2150, Manchester, August 2001.
An excellent summary is provided by Yunhe Shi in his PhD thesis.
Also, someone discovered a new technique a few years ago which is valid ANSI C.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With