I've been looking at compiler design. I did a one-semester course on it at university and have been reading Modern Compiler Design by Grune et al. The book seems to advocate an annotated abstract syntax tree (AST) as the intermediate code, and this is what we used in the course.
My question is: what are the benefits of this approach versus producing some kind of stack-machine language or low-level pseudocode, particularly with regard to having a compiler that can target many machines?
Is it a good idea to simply target an existing low-level representation such as LLVM and use that as the intermediate representation?
An intermediate representation (IR) is the data structure or code used internally by a compiler or virtual machine to represent source code. An IR is designed to be conducive to further processing, such as optimization and translation.
Types of intermediate representations: flat, tuple-based (generally three-address code, i.e. quadruples); flat, stack-based.
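To make the difference between the two flat forms concrete, here is a minimal sketch that lowers the same small expression tree to quadruples and to stack code. The node classes, the (op, arg1, arg2, result) tuple layout, and the push/load instruction names are assumptions made up for this illustration; they are not taken from Grune et al. or from any particular compiler.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Num:
    value: int


@dataclass
class Var:
    name: str


@dataclass
class BinOp:
    op: str
    left: "Expr"
    right: "Expr"


Expr = Union[Num, Var, BinOp]


def to_three_address(expr):
    """Lower an expression tree to quadruples (op, arg1, arg2, result).

    Returns the instruction list and the operand holding the final value.
    """
    code = []
    counter = 0

    def walk(e):
        nonlocal counter
        if isinstance(e, Num):
            return str(e.value)
        if isinstance(e, Var):
            return e.name
        left = walk(e.left)
        right = walk(e.right)
        counter += 1
        temp = f"t{counter}"  # fresh temporary for each intermediate result
        code.append((e.op, left, right, temp))
        return temp

    result = walk(expr)
    return code, result


def to_stack_code(expr):
    """Lower the same tree to stack-machine code via a post-order walk."""
    if isinstance(expr, Num):
        return [("push", expr.value)]
    if isinstance(expr, Var):
        return [("load", expr.name)]
    # Operands are pushed first; the operator pops two values and pushes one.
    return to_stack_code(expr.left) + to_stack_code(expr.right) + [(expr.op,)]


# a + b * 2
tree = BinOp("+", Var("a"), BinOp("*", Var("b"), Num(2)))
quads, result = to_three_address(tree)
stack = to_stack_code(tree)
```

For this tree the quadruples name every intermediate value explicitly (t1, t2), while the stack code leaves them implicit on the operand stack. Both are flat sequences derived from the same tree, so a compiler can keep the AST as its front-end representation and still emit either form later.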
Intermediate code sits between the source program and the machine program. It is generated because translating source directly into machine code in a single step is hard to do well. The compiler therefore first converts the source program into intermediate code, which makes the later generation of efficient machine code easier.
If your language is complicated enough, you'd end up having a sequence of slightly different intermediate representations anyway. And it does not really matter which representation is your final target: LLVM, C, native code, CLR, JVM, whatever. It should not affect the design and architecture of your compiler.
And, from my personal experience, the more intermediate steps you have, with transforms in between as trivial as possible, the better your compiler's architecture is.
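As a rough illustration of that "many trivial steps" style, here is a toy pipeline in which each pass does one small thing and only the last pass knows anything about the target. The tuple-based tree encoding, the pass names, and the stack instruction names are all hypothetical, chosen only to keep the sketch short.

```python
def desugar(e):
    """Pass 1: rewrite unary negation ('neg', x) as the binary ('-', 0, x)."""
    if not isinstance(e, tuple):
        return e
    if e[0] == "neg":
        return ("-", 0, desugar(e[1]))
    op, left, right = e
    return (op, desugar(left), desugar(right))


def fold_constants(e):
    """Pass 2: evaluate binary operations whose operands are both integers."""
    if not isinstance(e, tuple):
        return e
    op, left, right = e
    left, right = fold_constants(left), fold_constants(right)
    if isinstance(left, int) and isinstance(right, int):
        return {"+": left + right, "-": left - right, "*": left * right}[op]
    return (op, left, right)


def lower(e):
    """Pass 3: the only target-specific step; emits toy stack-machine code."""
    if isinstance(e, int):
        return [("push", e)]
    if isinstance(e, str):
        return [("load", e)]
    op, left, right = e
    return lower(left) + lower(right) + [(op,)]


# The compiler is just the list of passes applied in order.
PASSES = [desugar, fold_constants, lower]


def compile_expr(e):
    for run_pass in PASSES:
        e = run_pass(e)
    return e


# x + (-3) * 4
source = ("+", "x", ("*", ("neg", 3), 4))
program = compile_expr(source)
```

Because desugar and fold_constants never mention the output format, retargeting this sketch would mean replacing only the final lower pass, which is the sense in which the choice of final target need not shape the rest of the design.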