
Learning to write a compiler [closed]

People also ask

Is it hard to write a compiler?

Writing a compiler requires knowledge of a lot of areas of computer science - regular expressions, context-free grammars, syntax trees, graphs, etc. It can help you see how to apply the theory of computer science to real-world problems.

How do you write a compiler?

If each language has a set of grammar rules, and those rules define all the legal expressions, then there are primarily two parts to building a compiler. Be able to read a file, parse it, then build and validate an Abstract Syntax Tree from that grammar.

How long does it take to learn compiler design?

Nearly every student who takes an undergraduate (or graduate) compiler construction class implements a simple compiler for a “toy” language in a semester (3–4 months), starting with roughly no knowledge of how to do it.


Big List of Resources:

  • A Nanopass Framework for Compiler Education ¶
  • Advanced Compiler Design and Implementation $
  • An Incremental Approach to Compiler Construction ¶
  • ANTLR 3.x Video Tutorial
  • Basics of Compiler Design
  • Building a Parrot Compiler
  • Compiler Basics
  • Compiler Construction $
  • Compiler Design and Construction $
  • Crafting a Compiler with C $
  • Crafting Interpreters
  • Compiler Design in C ¶
  • Compilers: Principles, Techniques, and Tools $ — aka "The Dragon Book"; widely considered "the book" for compiler writing.
  • Engineering a Compiler $
  • Essentials of Programming Languages
  • Flipcode Article Archive (look for "Implementing A Scripting Engine by Jan Niestadt")
  • Game Scripting Mastery $
  • How to build a virtual machine from scratch in C# ¶
  • Implementing Functional Languages
  • Implementing Programming Languages (with BNFC)
  • Implementing Programming Languages using C# 4.0
  • Interpreter pattern (described in Design Patterns $) specifies a way to evaluate sentences in a language
  • Language Implementation Patterns: Create Your Own Domain-Specific and General Programming Languages $
  • Let's Build a Compiler by Jack Crenshaw — The PDF ¶ version (examples are in Pascal, but the information is generally applicable)
  • Linkers and Loaders $ (Google Books)
  • Lisp in Small Pieces (LiSP) $
  • LLVM Tutorial
  • Modern Compiler Implementation in ML $ — There is a Java $ and C $ version as well - widely considered a very good book
  • Object-Oriented Compiler Construction $
  • Parsing Techniques - A Practical Guide
  • Project Oberon ¶ - Look at chapter 13
  • Programming a Personal Computer $
  • Programming Languages: Application and Interpretation
  • Rabbit: A Compiler for Scheme ¶
  • Reflections on Trusting Trust — A quick guide
  • Roll Your Own Compiler for the .NET framework — A quick tutorial from MSDN
  • Structure and Interpretation of Computer Programs
  • Types and Programming Languages
  • Want to Write a Compiler? - a quick guide
  • Writing a Compiler in Ruby Bottom Up
  • Compiling a Lisp — compile directly to x86-64

Legend:

  • ¶ Link to a PDF file
  • $ Link to a printed book

This is a pretty vague question, I think, simply because of the depth of the topic involved. A compiler can be decomposed into two separate parts, however: a top half and a bottom half. The top half generally takes the source language and converts it into an intermediate representation, and the bottom half takes care of the platform-specific code generation.

Nonetheless, one idea for an easy way to approach this topic (the one we used in my compilers class, at least) is to build the compiler in the two pieces described above. Specifically, you'll get a good idea of the entire process by just building the top-half.

Just doing the top half lets you get the experience of writing the lexical analyzer and the parser and go on to generating some "code" (that intermediate representation I mentioned). So it will take your source program, convert it to another representation, and do some optimization (if you want), which is the heart of a compiler. The bottom half will then take that intermediate representation and generate the bytes needed to run the program on a specific architecture. For example, the bottom half will take your intermediate representation and generate a PE executable.
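To make the top-half/bottom-half split concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the nested-tuple syntax tree, the three-address IR format, and the names `front_end`, `back_end_pseudo_asm`, and `back_end_python` are hypothetical, and the "assembly" is for a made-up register machine rather than any real architecture. The point is only that one IR can feed several interchangeable bottom halves.

```python
# Top half: flatten a syntax tree into a simple intermediate representation (IR).
# Bottom half: turn that IR into output for one particular target.
# All names and formats here are illustrative assumptions.

def lower(node, code, counter):
    """Append three-address IR for `node` to `code`; return where its value lives."""
    if isinstance(node, int):
        return str(node)              # literals are used directly as operands
    op, left, right = node
    a = lower(left, code, counter)
    b = lower(right, code, counter)
    counter[0] += 1
    dest = f"t{counter[0]}"           # fresh temporary for this result
    code.append((dest, op, a, b))
    return dest

def front_end(ast):
    """Top half: syntax tree in, (IR, name of the result temporary) out."""
    code = []
    result = lower(ast, code, [0])
    return code, result

def back_end_pseudo_asm(code):
    """One bottom half: text for a made-up three-operand register machine."""
    mnemonics = {"+": "ADD", "-": "SUB", "*": "MUL"}
    return [f"{mnemonics[op]} {dest}, {a}, {b}" for dest, op, a, b in code]

def back_end_python(code):
    """Another bottom half: Python assignment statements from the same IR."""
    return [f"{dest} = {a} {op} {b}" for dest, op, a, b in code]

# 1 + 2 * 3, written as a hand-built tree so that no parser is needed here.
ast = ("+", 1, ("*", 2, 3))
ir, result = front_end(ast)

print(ir)                        # [('t1', '*', '2', '3'), ('t2', '+', '1', 't1')]
print(back_end_pseudo_asm(ir))   # ['MUL t1, 2, 3', 'ADD t2, 1, t1']
print(back_end_python(ir))       # ['t1 = 2 * 3', 't2 = 1 + t1']
```

A real bottom half would also have to deal with register allocation, calling conventions, and the executable file format, which is exactly the part you can put off while you learn the top half.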

One book on this topic that I found particularly helpful was Compilers: Principles, Techniques, and Tools (or the Dragon Book, due to the cute dragon on the cover). It's got some great theory and definitely covers context-free grammars in a really accessible manner. Also, for building the lexical analyzer and parser, you'll probably use the *nix tools lex and yacc. And uninterestingly enough, the book called "lex and yacc" picked up where the Dragon Book left off for this part.
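If you want a feel for what a lexical analyzer does before picking up lex, the rough idea is a table of regular expressions that gets matched against the input to produce tokens. The sketch below is a hand-written Python analogue, not lex itself; the token names and the `tokenize` function are assumptions chosen for this example, and it uses only the standard `re` module.

```python
import re

# Rule table: (token kind, regular expression). Earlier rules win on a tie.
# The token kinds are made up for this sketch.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=()]"),
    ("SKIP",   r"\s+"),          # whitespace is matched, then discarded
    ("ERROR",  r"."),            # anything else is a lexical error
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{kind}>{pattern})" for kind, pattern in TOKEN_SPEC)
)

def tokenize(source):
    """Return a list of (kind, text) pairs for the given source string."""
    tokens = []
    for match in TOKEN_RE.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == "SKIP":
            continue
        if kind == "ERROR":
            raise SyntaxError(
                f"unexpected character {text!r} at offset {match.start()}"
            )
        tokens.append((kind, text))
    return tokens

print(tokenize("x = 3 + 42"))
# [('IDENT', 'x'), ('OP', '='), ('NUMBER', '3'), ('OP', '+'), ('NUMBER', '42')]
```

The parser then consumes this token list instead of raw characters, which is the same division of labor you get between lex and yacc.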


I think Modern Compiler Implementation in ML is the best introductory compiler writing text. There's a Java version and a C version too, either of which might be more accessible given your language background. The book packs a lot of useful basic material (scanning and parsing, semantic analysis, activation records, instruction selection, RISC and x86 native code generation) and various "advanced" topics (compiling OO and functional languages, polymorphism, garbage collection, optimization and static single assignment form) into relatively little space (~500 pages).

I prefer Modern Compiler Implementation to the Dragon Book because Modern Compiler Implementation surveys less of the field; instead, it has really solid coverage of all the topics you would need to write a serious, decent compiler. After you work through this book you'll be ready to tackle research papers directly for more depth if you need it.

I must confess I have a serious soft spot for Niklaus Wirth's Compiler Construction. It is available online as a PDF. I find Wirth's programming aesthetic simply beautiful; however, some people find his style too minimal (for example, Wirth favors recursive descent parsers, but most CS courses focus on parser generator tools; Wirth's language designs are fairly conservative). Compiler Construction is a very succinct distillation of Wirth's basic ideas, so whether you like his style or not, I highly recommend reading this book.
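For anyone who hasn't seen a recursive descent parser, the idea is one function per grammar rule, with each function calling the others the way the rules refer to each other. The sketch below is in Python (Wirth's own examples are not) and is not taken from his book; the tiny arithmetic grammar, the `Parser` class, and the tuple-shaped syntax tree are all assumptions made up for illustration.

```python
import re

# Grammar assumed for this sketch:
#   expr   := term   (('+' | '-') term)*
#   term   := factor (('*' | '/') factor)*
#   factor := NUMBER | '(' expr ')'

class Parser:
    def __init__(self, text):
        # Crude tokenizer: it silently skips characters it does not recognize.
        self.tokens = re.findall(r"\d+|[-+*/()]", text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            node = (op, node, self.term())   # loop makes the operator left-associative
        return node

    def term(self):
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            node = (op, node, self.factor())
        return node

    def factor(self):
        token = self.advance()
        if token is None:
            raise SyntaxError("unexpected end of input")
        if token.isdigit():
            return int(token)
        if token == "(":
            node = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {token!r}")

def parse(text):
    parser = Parser(text)
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree

print(parse("1 + 2 * 3"))     # ('+', 1, ('*', 2, 3))
print(parse("(1 + 2) * 3"))   # ('*', ('+', 1, 2), 3)
```

Operator precedence falls out of which rule calls which: `expr` calls `term`, so multiplication binds tighter than addition without any precedence table.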


I concur with the Dragon Book reference; IMO, it is the definitive guide to compiler construction. Get ready for some hardcore theory, though.

If you want a book that is lighter on theory, Game Scripting Mastery might be a better book for you. If you are a total newbie at compiler theory, it provides a gentler introduction. It doesn't cover more practical parsing methods (opting for non-predictive recursive descent without discussing LL or LR parsing), and as I recall, it doesn't even discuss any sort of optimization theory. Plus, instead of compiling to machine code, it compiles to a bytecode that is supposed to run on a VM that you also write.

It's still a decent read, particularly if you can pick it up for cheap on Amazon. If you only want an easy introduction into compilers, Game Scripting Mastery is not a bad way to go. If you want to go hardcore up front, then you should settle for nothing less than the Dragon Book.
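To give a rough idea of what "compile to a bytecode and run it on a VM you also write" means, here is a minimal Python sketch. It is not the design from Game Scripting Mastery; the instruction set (`PUSH`, `ADD`, `SUB`, `MUL`), the tuple-shaped syntax tree, and the function names are all assumptions invented for this example.

```python
# A tiny stack-machine "bytecode" and the VM that executes it.
# The opcodes below are made up for this sketch.
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL"}

def compile_expr(node, code):
    """Emit bytecode for a nested-tuple syntax tree using a post-order walk."""
    if isinstance(node, int):
        code.append(("PUSH", node))
    else:
        op, left, right = node
        compile_expr(left, code)           # operands first...
        compile_expr(right, code)
        code.append((OPCODES[op], None))   # ...then the operator
    return code

def run(code):
    """Execute the bytecode on a simple operand stack and return the result."""
    stack = []
    for opcode, arg in code:
        if opcode == "PUSH":
            stack.append(arg)
            continue
        right = stack.pop()
        left = stack.pop()
        if opcode == "ADD":
            stack.append(left + right)
        elif opcode == "SUB":
            stack.append(left - right)
        elif opcode == "MUL":
            stack.append(left * right)
        else:
            raise ValueError(f"unknown opcode {opcode!r}")
    return stack.pop()

# 1 + 2 * 3 as a hand-built tree
bytecode = compile_expr(("+", 1, ("*", 2, 3)), [])
print(bytecode)
# [('PUSH', 1), ('PUSH', 2), ('PUSH', 3), ('MUL', None), ('ADD', None)]
print(run(bytecode))   # 7
```

The appeal for a first project is that the VM is a short loop you fully control, so you never have to touch a real instruction set or executable format.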