
What are the key design choices to make a wicked fast compiler?

I want to know how to design a compiler that compiles very, very quickly.

First, let me head off some obvious misunderstandings of my question:

  1. I am not talking about the speed of the code produced by the compiler. There are already many resources available for learning how to optimize generated code. What I'm having trouble finding is information on making the compiler fast.

  2. I'm also not interested in a discussion of why C++ compilers are generally slower than Java compilers (for example). I'm interested in what techniques can be used to speed up the compiler for any given language.

  3. I also don't want to hear about distributed compilation systems like Microsoft's Incredibuild or Unix's distcc. Those systems don't give you faster compilers; they just give you more compilers. Which is certainly useful, but that's not the question I'm asking. I want to know how to design a fast compiler for a single CPU.

  4. Nor is ccache the answer I'm looking for. That's a system that allows you to avoid using the compiler at all, but it doesn't make the compiler faster. Again, that's useful; again, that's not the question I'm asking.

I hope my question is now crystal clear. But perhaps some history will make it even clearer.

C compilers used to be really slow. Then, in 1986, THINK Technologies introduced Lightspeed C for Macintosh, and it compiled programs almost instantaneously. Lightspeed C was so much faster than all the other C compilers that there was hardly any comparison. (Perhaps Lightspeed C wasn't the first of the new generation of lightning-fast compilers, but it was the first in my experience. Turbo Pascal came earlier [1983] but I had no experience with it, so I don't know how it compared, speed-wise.)

Since then, many fast compilers have been available. It seems that there was some kind of quantum leap in compiler technology in the 1980s, and that in particular is what I'm trying to understand. What was the breakthrough?

The answer may be this simple: with IDEs like Lightspeed and Turbo, the integrated editor already has the source code in RAM. If the compiler operates off that data, it eliminates disk I/O, which is the slowest part of any compiler. That's probably a very important contributor to the speed improvement, if the source code size is small relative to the memory size. (In those days, RAM sizes were much smaller, but then so were typical program sizes.)

Is that it? Or were there other important innovations involved? And have there been important improvements in compiler speed since then?

asked Sep 13 '10 by Thom Boyer


2 Answers

  • Simple syntax that can be parsed in a single pass.
  • Simple target code. If you don't target machine code directly, you can get away with many things.
  • Not compiling at all. If you don't need fast execution, or you're designing mostly for one-off scripts, you don't need to waste time analyzing the code.
  • Don't, I repeat, do not try to out-smart your OS's disk/cache management. mmap() the whole damn file and read it as if it were already in RAM (see the sketch after this list). If you don't have virtual memory, fast compilation is the least of your worries.
  • Avoid building XML-DOM-like bloated data structures for the AST. You don't need to animate your operator precedences. Keep pointers into the mmapped data instead of copying stuff around.
  • Profile your code if you want it fast. Always.
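
A minimal sketch of the mmap-and-point-into-it approach, in plain C (the Token layout and the token classes here are invented for illustration, not taken from any real compiler):

    /* Sketch: map the source file and lex it in place.
       Tokens record offsets into the mapped buffer; nothing is copied. */
    #include <ctype.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    typedef struct {
        int kind;          /* hypothetical token class: 0 = ident, 1 = number */
        size_t start, len; /* offsets into the mapped source */
    } Token;

    int main(int argc, char **argv)
    {
        if (argc < 2) { fprintf(stderr, "usage: %s file\n", argv[0]); return 1; }
        int fd = open(argv[1], O_RDONLY);
        struct stat st;
        if (fd < 0 || fstat(fd, &st) < 0) { perror(argv[1]); return 1; }

        const char *src = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (src == MAP_FAILED) { perror("mmap"); return 1; }

        /* One forward pass over the buffer: classify identifiers and numbers. */
        for (size_t i = 0; i < (size_t)st.st_size; ) {
            if (isalpha((unsigned char)src[i])) {
                Token t = { 0, i, 0 };
                while (i < (size_t)st.st_size && isalnum((unsigned char)src[i])) i++;
                t.len = i - t.start;
                printf("ident  %.*s\n", (int)t.len, src + t.start);
            } else if (isdigit((unsigned char)src[i])) {
                Token t = { 1, i, 0 };
                while (i < (size_t)st.st_size && isdigit((unsigned char)src[i])) i++;
                t.len = i - t.start;
                printf("number %.*s\n", (int)t.len, src + t.start);
            } else {
                i++; /* everything else is ignored in this toy */
            }
        }
        munmap((void *)src, st.st_size);
        close(fd);
        return 0;
    }

The point is the shape: the lexer walks the mapped bytes directly, and a token is just (kind, offset, length), so the OS's page cache does all the I/O management for you.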

Addition:

  • Learn different ways to parse. If you are not extremely confident in your parser-writing skills, use proven parser/lexer generator tools like ANTLR, Lemon, etc. (a toy example follows below).
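
As one small illustration of leaning on a generator, here is a toy flex specification (flex emits a C scanner; the token names are made up for the example):

    %option noyywrap
    %{
    #include <stdio.h>
    %}
    %%
    [0-9]+                  { printf("NUMBER %s\n", yytext); }
    [A-Za-z_][A-Za-z0-9_]*  { printf("IDENT  %s\n", yytext); }
    [ \t\r\n]+              { /* skip whitespace */ }
    .                       { printf("OTHER  %c\n", yytext[0]); }
    %%
    /* Build: flex tokens.l && cc lex.yy.c -o tokens */
    int main(void) { yylex(); return 0; }

The generated scanner is table-driven and typically plenty fast; the win is that you don't spend your own development time (or introduce your own bugs) hand-rolling it.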
answered Nov 03 '22 by artificialidiot


One issue is what you emit for generated code. You can sink pretty much as much compile time into optimization as you like. Straightforward generation, maybe even looking sort of dumb, will save you time. Of course, back when I used Turbo Pascal and Lightspeed C, the neat part was getting an executable conveniently, not how well optimized it was. Desktop compiler technology, at the time, was seriously behind mainframe compiler technology.
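
To make "straightforward, maybe even dumb-looking, generation" concrete, here is a hedged sketch in C: a single walk over a tiny expression tree that emits unoptimized x86-64 stack code (the Node type and the instruction selection are invented for illustration):

    /* Sketch: one-pass, zero-optimization code generation.
       Every node leaves its result on the stack; no register
       allocation, no peephole pass -- just one walk over the tree. */
    #include <stdio.h>

    typedef enum { NUM, ADD, MUL } Kind;

    typedef struct Node {
        Kind kind;
        long value;             /* used when kind == NUM */
        struct Node *lhs, *rhs; /* used for ADD / MUL */
    } Node;

    static void gen(const Node *n)
    {
        if (n->kind == NUM) {
            printf("    push $%ld\n", n->value);
            return;
        }
        gen(n->lhs);
        gen(n->rhs);
        printf("    pop %%rdi\n");
        printf("    pop %%rax\n");
        printf("    %s %%rdi, %%rax\n", n->kind == ADD ? "add" : "imul");
        printf("    push %%rax\n");
    }

    int main(void)
    {
        /* (2 + 3) * 4 */
        Node two   = { NUM, 2, NULL, NULL };
        Node three = { NUM, 3, NULL, NULL };
        Node four  = { NUM, 4, NULL, NULL };
        Node sum   = { ADD, 0, &two, &three };
        Node prod  = { MUL, 0, &sum, &four };
        gen(&prod);
        printf("    pop %%rax\n"); /* result ends up in %rax */
        return 0;
    }

The emitted code is clearly worse than what an optimizing backend would produce, but the generator itself is a single linear-time tree walk, which is exactly the trade-off in question.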

Another thing about Turbo Pascal and Lightspeed C was the integration. Particularly in the days before multitasking home computers, this was great. Unlike the first C compiler I owned (for CP/M), I didn't have to make changes in an editor, close that, do the compile, do the link, and then execute. This may have been part of what you saw: fast execution of components, without the need to type in complicated commands. I can duplicate this now by running multiple terminals on a Gnome desktop: one for vim, one to run gcc, and one to execute in.

Aside from that, reducing I/O is good. Fast lexical analysis is essentially a solved problem nowadays, but it wasn't necessarily back then. I'm not sure about parsing; I last delved into that twenty years ago, so somebody else will have to guide you on fast parsing and such.

answered Nov 02 '22 by David Thornley