This has been bugging me for a while. How do GCC and g++ compile themselves?
I'm guessing that every revision gets compiled with a previously built revision. Is this true? And if it is, does it mean that the oldest g++ and GCC versions were written in assembly?
Since October 2019, Guix bootstraps by using MesCC—the small C compiler that comes with Mes—to build TinyCC, which is used to build GCC 2.95.
Difference between g++ and gcc: g++ is used to compile C++ programs; gcc is used to compile C programs.
In computer science, bootstrapping is the technique for producing a self-compiling compiler — that is, a compiler (or assembler) written in the source programming language that it intends to compile.
g++ is a program that calls GCC and automatically specifies linking against the C++ library. It treats .
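To make the linking difference concrete, here is a small sketch, assuming a Linux system with GCC installed and libstdc++ as the C++ library. The file name hello.cpp and the output names are made up for illustration. It builds the same source once with g++ and once with gcc plus an explicit -lstdc++.

```shell
# Hypothetical one-file C++ program, written to the current directory.
cat > hello.cpp <<'EOF'
#include <iostream>
int main() { std::cout << "hello\n"; return 0; }
EOF

# g++ links against the C++ standard library automatically.
g++ hello.cpp -o hello_gxx

# gcc selects the C++ front end from the .cpp extension, but the
# C++ library has to be requested explicitly at link time.
gcc hello.cpp -lstdc++ -o hello_gcc

# Both binaries should behave the same.
./hello_gxx
./hello_gcc
```

For a program this small the two commands should give equivalent results; in practice g++ is the safer choice for C++ code because it takes care of the library for you.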
The oldest version of GCC was compiled using another C compiler, since others already existed when it was written. The very first C compiler ever (ca. 1973, IIRC) was implemented either in PDP-11 assembly or in the B programming language that preceded it; in either case, the B compiler was written in assembly. Similarly, the first ever C++ compiler (CPre/Cfront, 1979-1983) was probably first implemented in C, then rewritten in C++.
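When you compile GCC or any other self-hosting compiler, the full order of building is:
1. Build the new version of GCC with an existing C compiler.
2. Rebuild the new version of GCC with the one you just built.
3. (Optional) Repeat step 2 for verification purposes.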
This process is called bootstrapping. It tests the compiler's capability of compiling itself and makes sure that the resulting compiler is built with all the optimizations that it itself implements.
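A compiler that was built by itself and then rebuilds itself from the same source should produce identical output both times, and GCC's own bootstrap performs a comparison of this kind between its later stages. The sketch below only illustrates the idea and is not GCC's actual build machinery: the directory names stage2 and stage3 and the function compare_stages are made up, and the "object files" are stand-ins created with printf.

```shell
# compare_stages DIR_A DIR_B
# Succeeds only if every regular file in DIR_A has a byte-identical
# counterpart in DIR_B. (Hypothetical helper, not part of GCC.)
compare_stages() {
    status=0
    for f in "$1"/*; do
        [ -f "$f" ] || continue
        name=$(basename "$f")
        if ! cmp -s "$f" "$2/$name"; then
            echo "mismatch: $name"
            status=1
        fi
    done
    return $status
}

# Stand-in "object files" so the sketch runs without building a compiler.
mkdir -p stage2 stage3
printf 'object code A' > stage2/a.o
printf 'object code A' > stage3/a.o

if compare_stages stage2 stage3; then
    echo "stage2 and stage3 match"
else
    echo "stage2 and stage3 differ"
fi
```

If the two stages differ, either the build is not deterministic or the compiler miscompiles itself, which is exactly what the repeated build is meant to catch.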
EDIT: Drew Dormann, in the comments, points to Bjarne Stroustrup's account of the earliest implementation of C++. It was implemented in C++ but translated by what Stroustrup calls a "preprocessor" from C++ to C; not a full compiler by his definition, but still C++ was bootstrapped in C.
If you want to replicate the bootstrap process of GCC in a modern environment (x86 Linux), you can use the tools developed by the bootstrappable project:
We can start with the hex0 assembler (on x86 it is a 357-byte binary), which does roughly what the following two commands do:
sed 's/[;#].*$//g' hex0_x86.hex0 | xxd -r -p > hex0
chmod +x hex0
I.e., it translates an ASCII representation of a binary program into binary code, but it is itself written in hex0.
In other words, hex0's source code is in one-to-one correspondence with its binary code.
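To see that translation on something smaller than hex0 itself, here is a sketch that applies the same sed and xxd pipeline to a toy input. The file hello.hex0 is made up for illustration (it holds text bytes rather than machine code), and xxd is assumed to be installed.

```shell
# Toy hex0-style input: hex byte pairs, with ';' or '#' starting a comment.
cat > hello.hex0 <<'EOF'
48 65 6C 6C 6F   ; the letters H e l l o
0A               # newline
EOF

# Strip the comments, then turn the remaining hex digit pairs into raw bytes.
sed 's/[;#].*$//g' hello.hex0 | xxd -r -p > hello.bin

# Print the decoded bytes.
cat hello.bin
```

Every byte of the output corresponds to exactly one hex pair in the input, which is what makes a hex0 source file easy to audit against its binary.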
hex0 can be used to build a slightly more powerful hex1 assembler that supports a few more features (one-character labels and offset calculation).
hex1 is written in hex0 assembly.
hex1 can be used to build hex2 (an even more advanced assembler that supports multi-character labels).
hex2 can then be used to build a macro assembler (where programs use macros instead of hex opcodes).
You can then use this macro assembler to build cc_x86, which is a "C compiler" written in assembly. cc_x86 only supports a small subset of C, but that's an impressive start.
You can use cc_x86 to build M2-Planet (Macro Platform Neutral Transpiler), which is a C compiler written in C. M2-Planet is self-hosting and can build itself.
You can then use M2-Planet to build GNU Mes, which is a small Scheme interpreter.
mes can be used to run mescc, which is a C compiler written in Scheme that lives in the same repository as mes.
mescc can be used to rebuild mes and also to build the Mes C library.
Then mescc can be used to build a slightly patched Tiny C Compiler (TCC).
Then you can use it to build a newer version, TCC 0.9.27.
GCC 4.0.4 and the musl C library can be built with TCC 0.9.27.
Then you can build a newer GCC using an older GCC, e.g. GCC 4.0.4 -> GCC 4.7.4 -> modern GCC.
TL;DR:
hex0 -> hex1 -> hex2 -> M0 -> M2-Planet -> Mes -> Mescc -> TCC -> GCC.