I am trying to understand the concept of Linux From Scratch and would like to know why there are multiple passes for building binutils, gcc, etc.
Why do we need pass1 and pass2 separately? Why can't we build the tools in pass 1 and then use them to build gcc, glibc, libstdc++, etc.?
The goal is to ensure that your build is consistent, no matter which compiler you're using to compile your compiler (and thus which bugs that compiler has).
Let's say you're building gcc 4.1 with gcc 3.2 (I'm going to call that gcc 3.2 "stage-0"). The folks who did QA for gcc 4.1 didn't test that it works correctly when built with any compiler other than gcc 4.1. Hence the need to first build a stage-1 gcc, and then use that stage-1 to compile a stage-2 compiler, so that bugs in the stage-0 compiler don't affect the final result.
Then, the default compile process for gcc uses the stage-2 compiler to build a stage-3 compiler, and compares the two binaries. Both were built from the same source by compilers that should behave identically, so any difference between them is evidence of a bug.
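To make the comparison idea concrete, here is a minimal sketch in Python. It is not how gcc's build system actually does it (the real comparison step is more involved); it only assumes two directory trees of build outputs and reports files whose contents differ. The directory names, file names, and the `compare_stages` helper are all made up for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def list_files(root: Path) -> set[str]:
    """Collect the relative paths of all regular files under root."""
    return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}


def compare_stages(stage2: Path, stage3: Path) -> list[str]:
    """Return relative paths that differ between the trees or exist in only one."""
    files2, files3 = list_files(stage2), list_files(stage3)
    mismatches = set(files2 ^ files3)  # present in only one tree
    for rel in files2 & files3:
        if file_digest(stage2 / rel) != file_digest(stage3 / rel):
            mismatches.add(rel)
    return sorted(mismatches)


def demo() -> list[str]:
    """Build two tiny fake stage trees (hypothetical contents) and compare them."""
    with tempfile.TemporaryDirectory() as tmp:
        stage2, stage3 = Path(tmp) / "stage2", Path(tmp) / "stage3"
        stage2.mkdir()
        stage3.mkdir()
        # Identical object in both stages: what a healthy bootstrap produces.
        (stage2 / "cc1.o").write_bytes(b"same bytes")
        (stage3 / "cc1.o").write_bytes(b"same bytes")
        # Differing object: the kind of mismatch that points at a bug.
        (stage2 / "expr.o").write_bytes(b"built by stage 1")
        (stage3 / "expr.o").write_bytes(b"built by stage 2")
        return compare_stages(stage2, stage3)


if __name__ == "__main__":
    print(demo())
```

An empty list would mean the two stages are byte-identical under this simplified check; any listed file is one worth investigating.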
(Of course, this only guards against unintended bugs; see Ken Thompson's classic paper "Reflections on Trusting Trust" for a discussion of how intended bugs can survive this kind of measure.)
This goes beyond gcc into the entire toolchain because the same principles apply throughout. Suppose the result of building glibc-x.y differs between a system running glibc-x.y and a system running glibc-x.(y-1). If you don't do an extra pass to ensure that you're building in an environment that matches your target, then reproducing any resulting bugs (and testing proposed fixes) becomes far more difficult than it would otherwise be: nobody who lacks your (typically undisclosed) build environment can reliably recreate the bug!
I know this query is a bit old, but I have something to add to the answers: a clarification of the meaning of 'bootstrap'.
The primary reason for the multi-stage build is to eliminate every vestige of the build host's programs/config/libs from the resultant software. It's not enough to have fresh software compiled. You also have to avoid any and all references to the host's libraries, the host's kernel interfaces (kernel headers), the host's pkg versions, and all other such dependencies on the host system.
Suppose you happened to be a masochist and wanted to build Debian 4 on Fedora 27 (it should be possible). Simply building the software would pull in references to Fedora 27's libraries and other things, and the resulting system would not run, because those things are not available once the final system is installed.
LFS eases the process somewhat by building simple x86-to-x86 binutils and gcc cross tools in Stage 1, then installing the headers for the kernel to be used in the final system, then glibc. Stage 2 (binutils and gcc) is built using the cross tools, which guarantees that the host's programs/libs/config are not used at all. The rest of the toolchain (I call it Stage 3) is built using the tools from Stage 2. Now the final stage can be built (with a few small adjustments) with the assurance that no part of the build host will be referenced or used, and that no part of the toolchain will be referenced or used. The final stage is built using a path much like PATH=/bin:/usr/bin:/tools/bin; thus as the final tools are built, they will be used instead of those in the toolchain.
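The effect of that search order can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the directories are created under a temporary root rather than the real `/bin`, `/usr/bin`, and `/tools/bin`, the `resolve` helper is hypothetical, and it ignores execute permissions (a real shell checks them).

```python
import tempfile
from pathlib import Path
from typing import Optional


def resolve(name: str, search_dirs: list[Path]) -> Optional[Path]:
    """Return the first match for name, scanning directories in PATH order."""
    for directory in search_dirs:
        candidate = directory / name
        if candidate.is_file():
            return candidate
    return None


def demo() -> tuple[str, str]:
    """Show a freshly built final tool shadowing the temporary toolchain copy."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Stand-ins for /bin, /usr/bin and /tools/bin, in the order from the text.
        order = [root / "bin", root / "usr" / "bin", root / "tools" / "bin"]
        for directory in order:
            directory.mkdir(parents=True)

        # Early in the final stage only the toolchain copy exists.
        (order[2] / "gcc").write_text("toolchain gcc\n")
        before = resolve("gcc", order)

        # Once the final gcc is installed, it comes first in the search order.
        (order[1] / "gcc").write_text("final gcc\n")
        after = resolve("gcc", order)

        return (
            before.relative_to(root).as_posix(),
            after.relative_to(root).as_posix(),
        )


if __name__ == "__main__":
    print(demo())
```

Because the toolchain directory is last in the search order, nothing needs to be removed or reconfigured as the build progresses: each final tool takes over the moment it is installed.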
Building a toolchain is not for the impatient. It took me months to update Smoothwall Express's build system and the packages it uses, because building a toolchain is fraught with peril. I battled many dragons, balrogs, and dwarfs. I referenced LFS often to figure out how they did it. The result is an automated, re-entrant build system that builds the entire distro with no references to the host system. I primarily build it on Debian 8, but it's been known to build on Gentoo, and it is supposed to be able to build on itself.