
What is the initial step in writing a bootstrapping compiler for a new language?

Suppose you are creating a compiler for a new language, say Big-Lang. To bootstrap a compiler for Big-Lang, you would first write a compiler for Big-Lang-lite, the smallest possible subset of Big-Lang. What I want to know is: when building a bootstrap compiler, can the compiler for Big-Lang-lite be written in Big-Lang itself, or do we have to use some other language?

Surya asked Dec 16 '22 05:12



2 Answers

Here's the usual way to bootstrap.

  1. Design Language X
  2. Write a compiler for the kernel of X in a different language.
  3. Write a compiler for the kernel of X in the kernel of X.
  4. Compile the compiler from step 3 with the compiler from step 2. (You're mostly bootstrapped.)
  5. Write a full X compiler in kernel X.
  6. Compile the full compiler from step 5 with the step 3 compiler (as built in step 4).
  7. You're bootstrapped!
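
The chain of steps above can be modeled as a small sketch. This is only an illustration of which compiler builds which: the `Program` record, the `build` helper, and the language names `Y`, `kernel-X` and `X` are hypothetical, and no real compilation happens.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Program:
    """A compiler, either as source code or as a runnable binary (hypothetical model)."""
    name: str
    accepts: frozenset  # languages this compiler can translate
    form: str           # language its own code is written in; "machine" means runnable


def build(source: Program, using: Program) -> Program:
    """Compile `source` with the runnable compiler `using` and return the binary."""
    if using.form != "machine":
        raise ValueError(f"{using.name} is not runnable yet")
    if source.form not in using.accepts:
        raise ValueError(f"{using.name} cannot compile code written in {source.form}")
    return Program(source.name, source.accepts, "machine")


KERNEL = frozenset({"kernel-X"})
FULL = frozenset({"kernel-X", "X"})  # kernel X is a subset of X

# Assumption: a working compiler for the "different language" Y already exists.
y_compiler = Program("y-compiler", frozenset({"Y"}), "machine")

# Step 2: kernel-X compiler written in Y, built with the existing Y toolchain.
step2 = build(Program("kernel-in-Y", KERNEL, "Y"), using=y_compiler)

# Steps 3 and 4: kernel-X compiler written in kernel X, compiled by the step 2 compiler.
step3_src = Program("kernel-in-kernel", KERNEL, "kernel-X")
step4 = build(step3_src, using=step2)

# Steps 5 and 6: full X compiler written in kernel X, compiled by the step 4 binary.
step5_src = Program("full-in-kernel", FULL, "kernel-X")
step6 = build(step5_src, using=step4)

# Step 7: the full compiler can now rebuild itself without any help from Y.
step7 = build(step5_src, using=step6)
print(step7)
```

Trying `build(step5_src, using=step3_src)` raises a `ValueError` in this model, which mirrors why step 2 is needed: a compiler that exists only as kernel-X source cannot compile anything until something else has compiled it.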

Technically, you could skip the kernel language and implement the full language directly. I don't recommend this because it is generally slower (by orders of magnitude).

Daniel Gratzer answered Apr 16 '23 09:04



(A very interesting question, but perhaps borderline for Stack Overflow; https://softwareengineering.stackexchange.com/ is probably a better place for it.)

In addition to Jozefg's answer, I would add that in practice the steps form a loop:

  1. you design a "small" language X
  2. you code a poor compiler PC (a primordial compiler, conceptually used only once) for a subset of the above X language in a different language Y; this compiler PC should be a quick-and-dirty effort, because you'll conceptually use it once and you'll be its single user: you don't care about good diagnostics (so aborting on the first error is ok); you don't care about performance; and you don't care much about the target machine (you could have PC emit some poor generated C++ code, or whatever you like). By the way, PC could be an interpreter.

  3. you code a better compiler BC in X to compile X (here the target language T is important). At this point, you will probably realize that your design of the X language is lacking some features; in that case, go to step 1 (by enhancing the design of X)

  4. You also add more features to BC, notably better diagnostics, better generated code, etc. Again, you might go to step 3 (improve the code of BC) or even go to step 1 (design a better X) and then to step 3 (improve the code of BC, in particular to let it handle the new features of X, and later on use them)
  5. You test your work by compiling BC with BC
  6. You probably want to rewrite some code in BC to take advantage of new features of X, again iterating on steps 1 and 3.
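
The self-compilation test in step 5 is often run as a fixed-point check: build BC with PC, rebuild BC with the result, rebuild once more, and compare the last two outputs. Here is a toy sketch of that idea. Everything in it is an assumption made for illustration: the toy language X is Python with `fn` in place of `def`, the target language is plain Python, and the names `BC_SOURCE`, `primordial_compile`, `compile_x` and `load` are hypothetical.

```python
# Source of the "better compiler" BC, written in the toy language X.
BC_SOURCE = r'''fn compile_x(src):
    out = []
    for line in src.splitlines():
        stripped = line.lstrip()
        if stripped.startswith("fn "):
            indent = line[: len(line) - len(stripped)]
            line = indent + "def " + stripped[3:]
        out.append(line)
    return "\n".join(out) + "\n"
'''


def primordial_compile(src: str) -> str:
    """PC: quick and dirty, written in Python (playing the role of language Y).

    It only understands unindented `fn`, which is enough to compile BC once.
    """
    lines = [("def " + l[3:]) if l.startswith("fn ") else l for l in src.splitlines()]
    return "\n".join(lines) + "\n"


def load(target_code: str):
    """Turn generated target code into a callable compiler."""
    namespace = {}
    exec(target_code, namespace)
    return namespace["compile_x"]


stage1_code = primordial_compile(BC_SOURCE)  # BC compiled by PC
stage1 = load(stage1_code)

stage2_code = stage1(BC_SOURCE)              # BC compiled by BC (as built by PC)
stage2 = load(stage2_code)

stage3_code = stage2(BC_SOURCE)              # BC compiled by BC (as built by BC)

# Once BC is stable, compiling it with itself should reproduce the same output.
assert stage2_code == stage3_code
```

In this toy all three stages happen to produce identical code. With a real PC the stage 1 output will usually differ from stage 2, because PC and BC generate different code; stage 2 and stage 3 are the pair that should match.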

Alternatively, you could bootstrap a language by first writing an interpreter for it and then coding a compiler in the language itself (initially run on that interpreter).

You may (or may not) want to work on various versions of PC and BC. In particular, it may happen that the current version of BC cannot be compiled by BC (the same or the immediately previous version); you then have to juggle several versions temporarily, or even add temporary hacks inside BC.

Once you have a BC able to compile itself, you can throw PC away.

The whole point is that designing and implementing the language is circular work (by implementing your language, you understand better which features you want in it and how to implement them).

Of course, you need to keep a working version of BC. This means that, e.g., you back up (or even put under version control) a snapshot of the "target" code of BC compiled by BC. In OCaml it is the bytecode file bootstrap/ocamlc (and the fact that OCaml has a portable bytecode VM helps a lot); a similar approach is taken by Scheme 48 and perhaps Chicken Scheme; in MELT (a Lisp-like domain-specific language I'm working on to extend and customize the GCC compiler), it is the generated C++ files melt/generated/*.cc. Both OCaml and MELT keep the "compiled compiler" under version control (and distribute it...).

In MELT the primordial compiler PC was a Common Lisp program (accepting a quite small subset of what the current MELT language is), and BC is the bootstrapped MELT compiler (files melt/warmelt*.melt for the MELT source code and melt/generated/warmelt*.cc for the generated C++ code). Feel free to ask questions about the MELT bootstrap on its Google Groups list.

The Rust language takes a slightly different approach: its bootstrap build fetches an executable file (of some older version of the compiler) from the Web.
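
One way to keep such a snapshot safe is to replace it only when the freshly built compiler reproduces itself. The sketch below shows that policy under stated assumptions; it is not how OCaml or MELT actually manage their snapshots. The function `refresh_snapshot`, the callback `run_compiler(artifact, source)` and the file names are all hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Callable

# Hypothetical hook: run the compiler stored in `artifact` on `source`
# and return the newly generated target code as text.
RunCompiler = Callable[[str, str], str]


def refresh_snapshot(snapshot: Path, source: Path, run_compiler: RunCompiler) -> bool:
    """Rebuild BC from the checked-in snapshot; promote the result only if it is stable."""
    known_good = snapshot.read_text()
    src = source.read_text()

    stage1 = run_compiler(known_good, src)  # new BC built by the last good snapshot
    stage2 = run_compiler(stage1, src)      # new BC built by itself
    stage3 = run_compiler(stage2, src)      # and once more, for comparison

    if stage2 != stage3:
        return False                        # unstable: keep the old snapshot untouched
    snapshot.write_text(stage2)             # stable: this becomes the new snapshot
    return True


# Tiny demonstration with a fake compiler whose output depends only on the source.
with tempfile.TemporaryDirectory() as workdir:
    snap = Path(workdir) / "bc-snapshot.txt"
    src_file = Path(workdir) / "bc-source.x"
    snap.write_text("old generated code")
    src_file.write_text("new BC source")
    promoted = refresh_snapshot(snap, src_file, lambda artifact, source: "compiled:" + source)
    print(promoted, snap.read_text())
```

The point of the design is that a broken BC can never overwrite the last version known to work, so there is always something to rebuild from.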

Bootstrapping a language is more an art than a science. In my experience, it is usually worthwhile to proceed in small steps. You might want to read J. Pitrat's Artificial Beings: The Conscience of a Conscious Machine (J. Pitrat's favorite theme is that strong artificial intelligence is a bootstrapping process: you need strong AI to implement strong AI; see also this paper). You could also read C. Queinnec's Lisp in Small Pieces (if you read French, read the latest French version), which explains how Lisp-like implementations can be bootstrapped.

additional reference

You definitely should also read J.Pitrat's blog (since J.Pitrat worked half of his career on bootstrapping artificial intelligence).

Basile Starynkevitch answered Apr 16 '23 08:04
