After researching a bit on compilers and how they work, I learned that the process is often broken up into four steps: preprocessor, compiler, assembler and linker. I envisioned each of these steps as its own separate program: a preprocessor program, a compiler program, an assembler program and a linker program. However, it turns out that sometimes generating assembly code and producing object files is all handled by the compiler program, and sometimes it's not. It seems to depend very much on the context and the programming language used. My question, then, is: how is the translation process typically broken up when translating C++ source code into machine code?
Side note: My question is different from other C++ compiler threads because I'm asking not only how a compiler works, but also whether certain other stages, such as linking, are their own executable programs or are typically built into the compiler program.
A compiler is needed because a computer cannot execute source code directly; it only understands machine code. Source code is a human-readable format, so the compiler acts as the intermediary between the human-readable and machine-readable forms.
The compiler checks the source code for syntactic or structural errors and, if the source is error-free, generates object code. In other words, the C compilation process takes source code as input and converts it into object code or machine code.
Modern compilers (at least gcc and clang, and I doubt others are much different) implement preprocessing and compilation in one executable. This is mainly because the compiler wants to be able to generate good error messages [that point to the right line and column and, when macros are involved, can say "Called from macro FOO(x)"]. Working out "what file we're in" is easier when the compiler has the actual source code to look at, rather than pre-processed code.
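As a rough illustration of the position tracking involved, here is a minimal sketch in which a macro records where it was used. The names SourceLocation, HERE, first_use and second_use are made up for this example; only __FILE__ and __LINE__ are standard.

```cpp
#include <string>

// Hypothetical helper for illustration: stores where a macro was expanded.
struct SourceLocation {
    std::string file;
    int line;
};

// __FILE__ and __LINE__ are replaced during preprocessing at the point
// where the macro is *used*, not where it is defined.
#define HERE() SourceLocation{__FILE__, __LINE__}

// Two uses on consecutive lines therefore report consecutive line numbers.
inline SourceLocation first_use()  { return HERE(); }
inline SourceLocation second_use() { return HERE(); }
```

Calling first_use() and second_use() from main and printing the results should show the same file name and two adjacent line numbers, which is the kind of information a diagnostic needs in order to point back through a macro expansion.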
The linker is typically a separate program. In LLVM, which is the compiler I know best, a standalone assembler is not needed: assembly handling is an integrated part of the compiler, used for inline assembly code, and otherwise the compiler generates machine code directly. So out of the compiler comes a fully formed object file. [GCC, by contrast, writes out assembly text and hands it to a separate assembler program to produce the object file.]
With the right options (linking is the default for gcc and clang unless you ask them to stop earlier), the linker will then be called. It is a separate executable, which links the object file together with the runtime library and the start-up code that runs "before main" (global object construction and similar, as well as "preparing to call main"). This produces the executable file.
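The "before main" work can be observed from within a program. The following is a minimal sketch, assuming a mainstream toolchain that runs global constructors before entering main; startup_log, Announcer and global_announcer are hypothetical names invented for the example.

```cpp
#include <string>
#include <vector>

// A function-local static avoids depending on the initialisation order
// of globals: the log is created the first time it is requested.
inline std::vector<std::string>& startup_log() {
    static std::vector<std::string> log;
    return log;
}

// Hypothetical type whose constructor leaves a trace when it runs.
struct Announcer {
    explicit Announcer(const std::string& name) {
        startup_log().push_back(name + " constructed");
    }
};

// A global object with a non-trivial constructor. The start-up code that
// the linker adds to the executable arranges for this constructor to run
// before the first statement of main on common implementations.
Announcer global_announcer{"global_announcer"};
```

If main inspects startup_log() as its very first statement, it should already find one entry, even though main itself never constructed anything.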
With other options, the compiler will produce just an object file (the -c option), or an assembly listing of the generated machine code in symbolic form (the -S option).
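To make the stages concrete, here is a small sketch of one source file together with the driver options that stop at each stage. The file names square.cpp, main.o and app are assumptions for illustration, as are PLUS_ONE and square_plus_one; the -E, -S and -c options themselves are standard for both g++ and clang++.

```cpp
// square.cpp (hypothetical file name, used only for this illustration)
//
// Each command stops the same compiler driver at a different stage:
//   g++ -E square.cpp -o square.ii   preprocess only: macros are expanded
//   g++ -S square.cpp -o square.s    stop after generating assembly text
//   g++ -c square.cpp -o square.o    stop after producing an object file
//   g++ square.o main.o -o app       link (main.o is assumed to define main)

#define PLUS_ONE(x) ((x) + 1)  // replaced textually during preprocessing

// After preprocessing, the body reads: return ((n * n) + 1);
int square_plus_one(int n) {
    return PLUS_ONE(n * n);
}
```

Looking at the output of each command in turn is probably the quickest way to see which stages your own toolchain runs inside one executable and which it delegates to another program.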
The backend part of the compiler, which is responsible for code-generation, also typically deals with the optimisation and various code-transformations to help the optimisation stages - for example, Clang + LLVM will produce "uniform" loops, no matter whether you used while, for or goto to make a loop.
This helps the more advanced stages to not have to identify many different forms of loops, and allows the compiler to generate "good" code regardless of how the programmer formed the loop. [Of course, if you make it complicated enough, the compiler will probably not quite figure out how your loop works, and not optimise quite so well, but for straightforward conversion between the basic forms, it will do the same final code-generation regardless of what the source looked like].
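As a sketch of what "the basic forms" means here, the three hypothetical functions below (sum_while, sum_for and sum_goto are names made up for this example) compute the same sum with different loop syntax. With optimisation enabled, a compiler such as Clang may well generate the same code for all three, but that is worth checking on your own toolchain rather than assuming.

```cpp
// Each function returns 0 + 1 + ... + (n - 1).

int sum_while(int n) {
    int total = 0;
    int i = 0;
    while (i < n) {
        total += i;
        ++i;
    }
    return total;
}

int sum_for(int n) {
    int total = 0;
    for (int i = 0; i < n; ++i) {
        total += i;
    }
    return total;
}

int sum_goto(int n) {
    int total = 0;
    int i = 0;
loop:
    if (i < n) {
        total += i;
        ++i;
        goto loop;  // jump back to the condition, like a while loop
    }
    return total;
}
```

One way to compare them is to compile with something like clang++ -O2 -S and look at the three function bodies in the resulting assembly listing.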