I'm doing a survey of features in preparation for a research project.
Name a mainstream language or language feature that is hard to optimize, and why the feature is or isn't worth the price paid, or instead, just debunk my theories below with anecdotal evidence. Before anyone flags this as subjective, I am asking for specific examples of languages or features, and ideas for optimization of these features, or important features that I haven't considered. Also, any references to implementations that prove my theories right or wrong.
Top on my list of hard to optimize features and my theories (some of my theories are untested and are based on thought experiments):
1) Runtime method overloading (aka multi-method dispatch or signature based dispatch). Is it hard to optimize when combined with features that allow runtime recompilation or method addition. Or is it just hard, anyway? Call site caching is a common optimization for many runtime systems, but multi-methods add additional complexity as well as making it less practical to inline methods.
2) Type morphing / variants (aka value based typing as opposed to variable based) Traditional optimizations simply cannot be applied when you don't know if the type of someting can change in a basic block. Combined with multi-methods, inlining must be done carefully if at all, and probably only for a given threshold of size of the callee. ie. it is easy to consider inlining simple property fetches (getters / setters) but inlining complex methods may result in code bloat. The other issue is I cannot just assign a variant to a register and JIT it to the native instructions because I have to carry around the type info, or every variable needs 2 registers instead of 1. On IA-32 this is inconvenient, even if improved with x64's extra registers. This is probably my favorite feature of dynamic languages, as it simplifies so many things from the programmer's perspective.
3) First class continuations - There are multiple ways to implement them, and I have done so in both of the most common approaches, one being stack copying and the other as implementing the runtime to use continuation passing style, cactus stacks, copy-on-write stack frames, and garbage collection. First class continuations have resource management issues, ie. we must save everything, in case the continuation is resumed, and I'm not aware if any languages support leaving a continuation with "intent" (ie. "I am not coming back here, so you may discard this copy of the world"). Having programmed in the threading model and the contination model, I know both can accomplish the same thing, but continuations' elegance imposes considerable complexity on the runtime and also may affect cache efficienty (locality of stack changes more with use of continuations and co-routines). The other issue is they just don't map to hardware. Optimizing continuations is optimizing for the less-common case, and as we know, the common case should be fast, and the less-common cases should be correct.
4) Pointer arithmetic and ability to mask pointers (storing in integers, etc.) Had to throw this in, but I could actually live without this quite easily.
My feelings are that many of the high-level features, particularly in dynamic languages just don't map to hardware. Microprocessor implementations have billions of dollars of research behind the optimizations on the chip, yet the choice of language feature(s) may marginalize many of these features (features like caching, aliasing top of stack to register, instruction parallelism, return address buffers, loop buffers and branch prediction). Macro-applications of micro-features don't necessarily pan out like some developers like to think, and implementing many languages in a VM ends up mapping native ops into function calls (ie. the more dynamic a language is the more we must lookup/cache at runtime, nothing can be assumed, so our instruction mix is made up of a higher percentage of non-local branching than traditional, statically compiled code) and the only thing we can really JIT well is expression evaluation of non-dynamic types and operations on constant or immediate types. It is my gut feeling that bytecode virtual machines and JIT cores are perhaps not always justified for certain languages because of this.
I welcome your answers.
A couple of comments:
All languages with dynamic dispatch, even based on a single object, seem pretty hard to implement efficiently. Look at all the effort that has gone into run-time optimization of Self (or more recently, JavaScript, with SpiderMonkey).
Don't overlook delimited continuations. The jury is still out, but they are significantly easier to optimize than classic undelimited continuations. Read the paper by Gasbichler and Sperber.
It is my gut feeling that bytecode virtual machines and JIT cores are perhaps not always justified for certain languages because of this.
Didn't IronPython get written to prove that a virtual machine could not do as well as the native implementation of the language (Python). And then the author of IronPython got rather a shock when they found out that IronPython performed really well for a dynamic language on a bytecode VM?
Microsoft's own .Net internals group are on record as stating that they think ultimately the JITter will outperform a "normal" compiler/linker (for say C/C++).
I think the jury is still out on this. Hard to call it either way. Choose the language that best fits the job...
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With