What are the benefits/drawbacks of compiling a program to machine code instead of simply constructing the AST from the source and executing operations as you traverse the tree?
Are there certain reasons you would want to do one over the other?
Interpreting an AST is normally much slower than running machine code that does the same thing. A factor of 20 is typical.
The advantage is that an AST is quicker to produce: building one takes less time than most compilers need to generate machine code. AST interpreters also tend to be simpler than compilers, because the whole code generation phase is absent.
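To make this concrete, here is a minimal sketch of a tree-walking evaluator for arithmetic expressions (the node classes and names are invented for illustration, not taken from any particular system). There is no code generation step at all, but every visit pays for dynamic dispatch and recursion:

```python
# Minimal tree-walking interpreter sketch; node classes are illustrative.

class Num:
    def __init__(self, value):
        self.value = value

class Add:
    def __init__(self, left, right):
        self.left = left
        self.right = right

def evaluate(node):
    """Execute the tree directly by recursing over it."""
    if isinstance(node, Num):
        return node.value
    if isinstance(node, Add):
        return evaluate(node.left) + evaluate(node.right)
    raise TypeError(f"unknown node type: {type(node).__name__}")

# (1 + 2) + 3 -- the AST is built and run; no machine code is produced.
print(evaluate(Add(Add(Num(1), Num(2)), Num(3))))  # 6
```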
So if you have a program that doesn't do heavy computation, it will be up and running faster with an interpreter. On the other hand, code that runs often or continuously in an environment where cycles are scarce is better off compiled.
Some programming environments (for example many lisps) include an interpreter for developing code because it supports rapid debugging cycles and a compiler for producing speedy code when development is complete. Some of these systems allow free mixing of interpreted and compiled code, which is interesting in its own right.
Compiling to bytecode is a middle ground: quicker to compile than machine code, but faster to execute than an AST. Moreover, modern bytecode interpreters often compile to native code "just in time" as your program runs. This, for example, is the source of the name of Sun's HotSpot JVM: it compiles the "hot spots" in the Java bytecode to native code to speed up programs at runtime.
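As a toy illustration of the bytecode idea (the opcodes and layout below are invented for this sketch, not any real VM's instruction set): instead of recursing over a tree, execution becomes a flat dispatch loop over an instruction array:

```python
# Toy stack-machine bytecode interpreter; opcodes are invented for this sketch.
PUSH, ADD, PRINT = 0, 1, 2

def run(code):
    stack, pc = [], 0
    while pc < len(code):
        op = code[pc]
        if op == PUSH:            # push the literal that follows the opcode
            pc += 1
            stack.append(code[pc])
        elif op == ADD:           # pop two values, push their sum
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == PRINT:         # pop and print the top of the stack
            print(stack.pop())
        pc += 1

# Bytecode for print((1 + 2) + 3)
run([PUSH, 1, PUSH, 2, ADD, PUSH, 3, ADD, PRINT])  # prints 6
```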
Response to questions in comments
There was a question about the factor of 20 mentioned above. References to support this number are old, because few modern language systems use pure AST interpreters. (A notable exception is command shells, but most of them were developed long ago, and speed benchmarks for them are not common.) They're just too slow. My context is lisp interpreters; I've implemented a couple. Here, for example, is one set of Scheme benchmarks. The columns corresponding to AST interpreters are pretty easy to pick out. I can post more of a similar kind from the ACM Digital Library archive if there is demand.
Another rough benchmark: Perl uses a heavily optimized AST interpreter. To add 10 million floats in a tight loop on my machine requires about 7 seconds. Compiled C (gcc -O1) takes about 1/20th of a second.
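For anyone who wants to reproduce this kind of rough measurement, here is a sketch of the same tight loop (in Python here rather than Perl; the absolute numbers will of course differ by language and machine):

```python
# Rough micro-benchmark sketch: add 10 million floats in a tight loop.
# Absolute timings vary by machine and language; this only shows the method.
import time

start = time.perf_counter()
total = 0.0
for _ in range(10_000_000):
    total += 1.0
elapsed = time.perf_counter() - start
print(f"added 10 million floats in {elapsed:.2f} s")
```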
The commenter posed the addition of 4 variables as an example, but that analysis overlooked the cost of lookups. One clear dividing line between interpreter and compiler is whether addresses or frame offsets for symbols are precomputed. In a "pure" interpreter there are none, so adding 4 numbers requires 4 lookups in the runtime environment, normally a hash table: at least 100 instructions. In good compiled code, adding 4 integers held in registers on an x86 needs 3 add instructions, plus one more to store the result.
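A sketch of what those lookups cost (the variable names and environment layout are illustrative): in a pure interpreter every reference hashes a string, while the compiled equivalent has resolved each name ahead of time:

```python
# "Pure" interpreter: every variable reference is a hash-table lookup.
env = {"a": 1, "b": 2, "c": 3, "d": 4}

# Four hash lookups happen before any arithmetic does; each one costs
# many machine instructions (hashing, probing, comparing keys).
result = env["a"] + env["b"] + env["c"] + env["d"]

# Good compiled code has already resolved names to registers/addresses,
# roughly (x86, illustrative):
#   mov eax, [a]
#   add eax, [b]
#   add eax, [c]
#   add eax, [d]
#   mov [result], eax
print(result)  # 10
```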
There are many shades between "pure" AST interpreters and compiled machine code. Depending on the language, it may be possible to compile symbol offsets into the AST; this technique is sometimes called "fast links" and typically speeds things up by a factor of 2 or more (see the sketch below). Then there are "compile-to-bytecode and go" systems like Python, PHP, Perl, and Ruby 1.9+. Their bytecode is effectively threaded code (a single opcode can cause very complicated things to happen), so they are closer to ASTs than to machine code. Then there are the JIT bytecode interpreters I mentioned above.
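A sketch of the fast-link idea (names and layout are illustrative): resolve each symbol to a frame offset once, so execution indexes an array instead of hashing a string on every reference:

```python
# "Fast links": symbol-to-offset resolution happens once, ahead of time.

# Compile-time step: assign each symbol a slot in the frame.
slots = {"a": 0, "b": 1, "c": 2, "d": 3}
i_a, i_b, i_c, i_d = (slots[s] for s in "abcd")   # resolved once

# Run-time step: the frame is a flat list; variable access is indexing.
frame = [1, 2, 3, 4]
result = frame[i_a] + frame[i_b] + frame[i_c] + frame[i_d]  # no hashing
print(result)  # 10
```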
The point is that the factor-of-20 pure AST interpreter is one bookend and machine code is the other. In between there are many variants, each with its own advantages and disadvantages.