AST or bytecode. Which is easier to optimise?

Tags:

c

optimization

compiler-optimization

programming-languages

interpreter

So I am making a little toy programming language interpreter, and I would like to try and optimise the code so that the bytecode is slightly smaller. I'm not looking to do very complex optimisations such as loop hoisting, but more simple ones such as constant folding.

My question is: is it better to generate an AST first, optimise that, and then convert it to bytecode, or to go straight to bytecode and then try to optimise that?

If anyone has any examples, or knows of programming languages that use either of these methods, it would be greatly appreciated.

Thanks in advance.

404

asked Aug 13 '18 17:08

dangee1705

1 Answer

Both approaches are possible. tinycc, for example, is a C compiler that started as a toy program for the IOCCC. It generates executable code directly in one pass, with no AST, but still performs on-the-fly optimisations at the code generator level.
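To illustrate what on-the-fly folding in a one-pass code generator can look like, here is a minimal sketch. It is not tinycc's actual code: the opcodes, the `Gen` struct and the `gen_*` functions are all invented for this example. The idea is to delay emitting constants, so that an operator whose two operands are still pending constants can be computed at compile time instead of being emitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stack-machine opcodes, invented for this sketch. */
enum { OP_PUSH, OP_LOAD, OP_ADD, OP_MUL };

typedef struct {
    int code[64];      /* emitted bytecode */
    size_t len;
    /* Compile-time shadow of the operand stack. An entry with
       is_const set is a constant that has NOT been emitted yet. */
    int is_const[16];
    int value[16];
    size_t depth;
} Gen;

static void emit(Gen *g, int word) {
    assert(g->len < 64);
    g->code[g->len++] = word;
}

/* Emit every pending constant, bottom of the stack first. */
static void flush_all(Gen *g) {
    for (size_t i = 0; i < g->depth; i++) {
        if (g->is_const[i]) {
            emit(g, OP_PUSH);
            emit(g, g->value[i]);
            g->is_const[i] = 0;
        }
    }
}

/* A literal: remember it, emit nothing yet. */
void gen_const(Gen *g, int v) {
    assert(g->depth < 16);
    g->is_const[g->depth] = 1;
    g->value[g->depth] = v;
    g->depth++;
}

/* A variable read: pending constants below it must be emitted first
   so that the runtime stack order stays correct. */
void gen_load(Gen *g, int slot) {
    assert(g->depth < 16);
    flush_all(g);
    emit(g, OP_LOAD);
    emit(g, slot);
    g->is_const[g->depth++] = 0;
}

/* A binary operator (OP_ADD or OP_MUL): fold if both operands are
   still pending constants, otherwise emit the instruction.
   Integer overflow is ignored in this sketch. */
void gen_binop(Gen *g, int op) {
    assert(g->depth >= 2);
    size_t a = g->depth - 2, b = g->depth - 1;
    if (g->is_const[a] && g->is_const[b]) {
        g->value[a] = (op == OP_ADD) ? g->value[a] + g->value[b]
                                     : g->value[a] * g->value[b];
        g->depth--;
        return;
    }
    flush_all(g);
    emit(g, op);
    g->depth--;
    g->is_const[g->depth - 1] = 0;
}

/* End of expression: emit whatever is still pending. */
void gen_finish(Gen *g) { flush_all(g); }
```

With these helpers, compiling `x + 2 * 3` as `gen_load`, `gen_const(2)`, `gen_const(3)`, `gen_binop(OP_MUL)`, `gen_binop(OP_ADD)`, `gen_finish` emits `LOAD 0; PUSH 6; ADD`. The limitation is that only operands that are adjacent on the stack at emit time get folded: `2 + x + 3` is not simplified.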

Another example: Wren is an elegant small scripting language whose compiler generates bytecode directly, without an AST. It performs some optimisations on the bytecode, mostly peephole optimisations.
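As a rough idea of what a peephole pass over bytecode involves, here is a small sketch that folds `PUSH a; PUSH b; ADD` (or `MUL`) into a single `PUSH`. The instruction set is hypothetical and is not Wren's; the pass also assumes straight-line code. With jumps, a fold must not cross a jump target and jump offsets have to be patched afterwards, which is where most of the real work lies.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes: PUSH and LOAD carry one operand word,
   ADD and MUL carry none. */
enum { OP_PUSH, OP_LOAD, OP_ADD, OP_MUL };

static size_t insn_len(int op) {
    return (op == OP_PUSH || op == OP_LOAD) ? 2 : 1;
}

/* Rewrites code[0..n) in place and returns the new length.
   Checking the tail of the OUTPUT (not the input) lets folds cascade:
   PUSH 1; PUSH 2; ADD; PUSH 3; MUL becomes PUSH 9.
   Integer overflow is ignored in this sketch. */
size_t peephole(int *code, size_t n) {
    size_t starts[256];       /* start offset of each kept instruction */
    size_t kept = 0, out = 0;

    for (size_t in = 0; in < n; ) {
        int op = code[in];
        size_t len = insn_len(op);
        assert(in + len <= n);

        if ((op == OP_ADD || op == OP_MUL) && kept >= 2 &&
            code[starts[kept - 1]] == OP_PUSH &&
            code[starts[kept - 2]] == OP_PUSH) {
            int a = code[starts[kept - 2] + 1];
            int b = code[starts[kept - 1] + 1];
            /* Store the result in the first PUSH, drop the second. */
            code[starts[kept - 2] + 1] = (op == OP_ADD) ? a + b : a * b;
            out = starts[kept - 1];
            kept--;
            in += len;
            continue;
        }

        /* No pattern matched: keep the instruction as it is. */
        assert(kept < 256);
        starts[kept++] = out;
        for (size_t i = 0; i < len; i++)
            code[out++] = code[in++];
    }
    return out;
}
```

Note that the pass has to know the length of every instruction just to walk the code, and it only sees patterns that happen to be adjacent in the instruction stream.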

More advanced optimisations are feasible at the bytecode level, and I am currently working on a good example that should be published soon, but it seems easier to construct an AST, perform a higher-level analysis of the code on it, and generate even better code from that.
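For comparison, constant folding on an AST can be sketched as a short recursive function. The node layout below is an assumption made for this example (the names `Node`, `node` and `fold` are hypothetical), and integer overflow is ignored. Because the operands of an operator are simply its children, the pass needs no instruction decoding, and an algebraic rewrite such as `x * 1` to `x` is one extra branch.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical node kinds for a tiny expression language. */
typedef enum { N_NUM, N_VAR, N_ADD, N_MUL } Kind;

typedef struct Node {
    Kind kind;
    int value;               /* N_NUM: the literal, N_VAR: variable slot */
    struct Node *lhs, *rhs;  /* operands of N_ADD / N_MUL, else NULL */
} Node;

Node *node(Kind kind, int value, Node *lhs, Node *rhs) {
    Node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->kind = kind;
    n->value = value;
    n->lhs = lhs;
    n->rhs = rhs;
    return n;
}

/* Folds bottom-up and returns the (possibly replaced) subtree root. */
Node *fold(Node *n) {
    if (n->kind != N_ADD && n->kind != N_MUL)
        return n;                         /* leaves are already minimal */

    n->lhs = fold(n->lhs);
    n->rhs = fold(n->rhs);

    if (n->lhs->kind == N_NUM && n->rhs->kind == N_NUM) {
        /* Both operands are literals: turn this node into a literal. */
        int a = n->lhs->value, b = n->rhs->value;
        free(n->lhs);
        free(n->rhs);
        n->value = (n->kind == N_ADD) ? a + b : a * b;
        n->kind = N_NUM;
        n->lhs = n->rhs = NULL;
    } else if (n->kind == N_MUL &&
               n->rhs->kind == N_NUM && n->rhs->value == 1) {
        /* x * 1 simplifies to x: return the left operand directly. */
        Node *keep = n->lhs;
        free(n->rhs);
        free(n);
        return keep;
    }
    return n;
}
```

After folding, the bytecode generator walks the simplified tree as usual, so `(2 + 3) * x` is emitted with a single constant `5`.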

From a theoretical standpoint, bytecode and an AST are two representations of the same information, but the AST seems the more practical of the two to optimise.

199

answered Oct 05 '22 23:10

chqrlie

