, where I define IR as a 3-address code type representation (I realize that one can mean by it an AST representation as well).
It is my understanding that, when writing a best-practice compiler for an imperative language, code optimization happens both on the AST (probably best using a Visitor Pattern), and on the IR produced from the AST.
(a) Is that correct?
(b) Which type of optimization steps are best handled on the AST before even producing an IR? (reference to an article/a list online welcome too as long as it deals with an imperative language)
The compiler I'm working on is for Decaf (which some might know) which has a fairly deep CFG up to (single) class inheritance; I'll add features not part of it such as type coercion. It will be completely hand-coded (using no tools whatsoever). This is not homework; writing it for fun.
A code optimizing process must follow the three rules given below: The output code must not, in any way, change the meaning of the program. Optimization should increase the speed of the program and if possible, the program should demand less number of resources.
Typical interprocedural optimizations are: procedure inlining, interprocedural dead-code elimination, interprocedural constant propagation, and procedure reordering. As usual, the compiler needs to perform interprocedural analysis before its actual optimizations.
The AST is often used to generate an intermediate representation (IR), sometimes called an intermediate language, for the code generation.
Once in a while C++ code will not work when compiled with some level of optimization. It may be compiler doing optimization that breaks the code or it may be code containing undefined behavior which allows the compiler to do whatever it feels.
(a) Yes.
(b) Constant folding is one example; CSE is another; in fact almost anything to do with expression evaluation. IR-phase optimizations are more about what results from flow analysis.
IR is a form of an AST (often it is "flattened", but there are deep tree IRs as well), it may not be easy to distinguish one from another, especially if compiler is implemented as a sequence of very small rewrites from an original AST all the way down to a final IR suitable for instruction selection.
Optimisations may happen anywhere on this chain, but some representations are more suitable for a wide range of optimisations, most notably, an SSA form, used by most of the modern compilers to do nearly all the optimisations.
It's never too early to optimise (to coin a phrase). So there are optimisations performed before and during AST creation, on the AST itself, on the IR (if you have one) and on the code as it is generated. In C-like languages and those that compile to machine code, the effort goes into the later stages. In compilers targeting a VM I think there is less room for improvements at that stage.
Some early optimisations obviously work better than others. I don't know much about Decaf, but there are the obvious things like constant folding and constant expression evaluation. If you get the whole program in tree form before you have to generate any code you can find common subexpressions, do code migration, eliminate dead code/dead stores, hoist invariants, eliminate tail recursion and some kinds of strength reduction.
A lot of it depends on how hard you want to work and what your target is. You didn't say much about that.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With