
How are GCC and g++ bootstrapped?

This has been bugging me for a while. How do GCC and g++ compile themselves?

I'm guessing that every revision gets compiled with a previously built revision. Is this true? And if it is, does it mean that the oldest g++ and GCC versions were written in assembly?

user1010005 asked Feb 24 '12 10:02




2 Answers

The oldest version of GCC was compiled using another C compiler, since others already existed when it was written. The very first C compiler (ca. 1973, IIRC) was implemented either in PDP-11 assembly or in the B programming language that preceded C; in any case, the B compiler was written in assembly. Similarly, the first C++ compiler (CPre/Cfront, 1979-1983) was probably first implemented in C, then rewritten in C++.

When you compile GCC or any other self-hosting compiler, the full order of building is:

  1. Build the new version of GCC with an existing C compiler.
  2. Rebuild the new version of GCC with the one you just built.
  3. (Optional) Repeat step 2 for verification purposes.

This process is called bootstrapping. It tests the compiler's capability of compiling itself and makes sure that the resulting compiler is built with all the optimizations that it itself implements.
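
As a rough illustration of why the second and third builds should come out identical, here is a toy Python model (not GCC's actual build machinery). The `Binary` class, `compile_with`, `bootstrap`, and the version strings are all made up for this sketch. The only assumption is that a compiler binary is determined by the source it was built from and by the code generator of the compiler that built it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Binary:
    """A toy compiler binary (hypothetical model, not a real GCC artifact)."""
    implements: str  # which compiler source this binary was built from
    codegen: str     # which compiler's code generator emitted its machine code


def compile_with(compiler: Binary, source: str) -> Binary:
    # The result behaves like `source`, but its machine code was emitted
    # by the code generator that the compiling binary implements.
    return Binary(implements=source, codegen=compiler.implements)


def bootstrap(host: Binary, source: str):
    stage1 = compile_with(host, source)    # step 1: built by the existing compiler
    stage2 = compile_with(stage1, source)  # step 2: built by the new compiler
    stage3 = compile_with(stage2, source)  # step 3: rebuilt for verification
    return stage1, stage2, stage3


if __name__ == "__main__":
    host = Binary(implements="old-cc", codegen="vendor-cc")  # hypothetical names
    stage1, stage2, stage3 = bootstrap(host, "new-gcc")
    print(stage1 == stage2)  # False: stage1 still carries the old compiler's codegen
    print(stage2 == stage3)  # True: the build has reached a fixed point
```

In this model, stage 1 has the new compiler's behaviour but the old compiler's code quality, which is why step 2 is needed; step 3 only confirms that nothing changes any more.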

EDIT: Drew Dormann, in the comments, points to Bjarne Stroustrup's account of the earliest implementation of C++. It was written in C++ and translated from C++ to C by what Stroustrup calls a "preprocessor"; that is not a full compiler by his definition, but C++ was still bootstrapped from C.

Fred Foo answered Sep 30 '22 17:09



If you want to replicate the bootstrap process of GCC in a modern environment (x86 Linux), you can use the tools developed by the bootstrappable project:

  • We can start with the hex0 assembler (on x86 it is a 357-byte binary), which does roughly what the following two commands do:

    sed 's/[;#].*$//g' hex0_x86.hex0 | xxd -r -p > hex0
    chmod +x hex0
    

    I.e., it translates the ASCII (hexadecimal) representation of a binary program into binary code, but it is itself written in hex0.

    Basically, the hex0 source code is in one-to-one correspondence with its binary code.

  • hex0 can be used to build a slightly more powerful hex1 assembler that supports a few more features (one-character labels and offset calculation). hex1 is written in hex0 assembly.

  • hex1 can be used to build hex2 (an even more advanced assembler that supports multi-character labels).

  • hex2 can then be used to build a macro assembler (where programs use macros instead of hex opcodes).

  • You can then use this macro assembler to build cc_x86, which is a "C compiler" written in assembly. cc_x86 only supports a small subset of C, but that's an impressive start.

  • You can use cc_x86 to build M2-Planet (Macro Platform Neutral Transpiler), which is a C compiler written in C. M2-Planet is self-hosting and can build itself.

  • You can then use M2-Planet to build GNU Mes, which is a small Scheme interpreter.

  • mes can be used to run mescc, which is a C compiler written in Scheme that lives in the same repository as mes.

  • mescc can be used to rebuild mes and also to build the Mes C library.

  • Then mescc can be used to build a slightly patched Tiny C Compiler (TCC).

  • Then you can use that patched TCC to build a newer version, TCC 0.9.27.

  • GCC 4.0.4 and the musl C library can be built with TCC 0.9.27.

  • Then you can build a newer GCC using an older GCC, e.g. GCC 4.0.4 -> GCC 4.7.4 -> modern GCC.
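
To make the first step of this list more concrete, the sed/xxd pipeline shown for hex0 can be approximated in Python. This is only a sketch of the idea, not the real hex0 (which is a tiny hand-written binary). The function name `hex0_assemble` is hypothetical, and the sketch assumes that comments start with `;` or `#` and that everything else is hex digits separated by whitespace.

```python
def hex0_assemble(source: str) -> bytes:
    """Turn commented hex text into raw bytes (rough equivalent of sed | xxd -r -p)."""
    digits = []
    for line in source.splitlines():
        # Drop everything from the first ';' or '#' to the end of the line.
        for marker in ";#":
            line = line.split(marker, 1)[0]
        # Keep only the hex digits; whitespace is just formatting.
        digits.append("".join(line.split()))
    # Each pair of hex digits becomes exactly one output byte.
    return bytes.fromhex("".join(digits))


if __name__ == "__main__":
    example = "7F 45 4C 46 ; start of an ELF header\n# a full-line comment\n01 02\n"
    print(hex0_assemble(example))
```

Because every output byte comes from exactly one pair of hex digits in the input, the "source" can be audited against the binary by hand, which is the point of starting the chain there.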

TL;DR:

hex0 -> hex1 -> hex2 -> M0 -> cc_x86 -> M2-Planet -> mes -> mescc -> TCC -> GCC.
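
The property that makes this a bootstrap chain is that every tool is built (or run) only with something produced earlier in the chain, starting from the hex0 seed. A small Python sketch can check that property. The `CHAIN` list is transcribed from the steps above, while the function name `first_unbootstrapped` and the data layout are assumptions made for illustration.

```python
# (tool, what it is built or run with); None marks the binary seed.
CHAIN = [
    ("hex0", None),
    ("hex1", "hex0"),
    ("hex2", "hex1"),
    ("M0", "hex2"),              # the macro assembler
    ("cc_x86", "M0"),
    ("M2-Planet", "cc_x86"),
    ("mes", "M2-Planet"),
    ("mescc", "mes"),            # mescc runs on the mes interpreter
    ("patched TCC", "mescc"),
    ("TCC 0.9.27", "patched TCC"),
    ("GCC 4.0.4", "TCC 0.9.27"),
    ("GCC 4.7.4", "GCC 4.0.4"),
    ("modern GCC", "GCC 4.7.4"),
]


def first_unbootstrapped(chain):
    """Return the first tool whose builder is not available yet, or None."""
    available = set()
    for tool, builder in chain:
        if builder is not None and builder not in available:
            return tool
        available.add(tool)
    return None


if __name__ == "__main__":
    print(first_unbootstrapped(CHAIN))                      # None: the chain is closed
    print(first_unbootstrapped([("modern GCC", "modern GCC")]))  # modern GCC
```

The second call shows the chicken-and-egg situation from the question: a compiler that can only be built by itself has no valid starting point unless an earlier tool is supplied.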

Andrius Štikonas answered Sep 30 '22 15:09
