Do programming language compilers first translate to assembly or directly to machine code?

People also ask

How does a compiler translate programming language?

A compiler takes the program code (source code) and converts the source code to a machine language module (called an object file). Another specialized program, called a linker, combines this object file with other previously compiled object files (in particular run-time modules) to create an executable file.

How is assembly code translated to machine code?

To convert assembly language into machine code, it must be translated by an assembler. The assembler converts each statement into the specific machine code for the hardware the program will run on.

How is assembly language translated into machine language?

An assembler converts assembly language into machine language. A disassembler converts machine language into assembly.
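
As a rough illustration of what an assembler (and a disassembler) does, here is a minimal Python sketch for a tiny subset of 32-bit x86. The opcode bytes it uses (90 for nop, C3 for ret, 50+r for push, 58+r for pop, B8+r followed by a little-endian 32-bit immediate for mov) are real x86 encodings, but the helper names assemble, assemble_line and disassemble are hypothetical, and a real assembler also has to handle labels, addressing modes, relocations and object-file formats.

```python
import struct

# x86 register numbers 0-7, in encoding order (a toy subset: 32-bit registers only).
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def assemble_line(line):
    """Translate one Intel-syntax statement into its machine-code bytes."""
    mnemonic, _, rest = line.strip().partition(" ")
    operands = [op.strip() for op in rest.split(",")] if rest else []
    if mnemonic == "nop" and not operands:
        return b"\x90"
    if mnemonic == "ret" and not operands:
        return b"\xc3"
    if mnemonic == "push" and len(operands) == 1:
        return bytes([0x50 + REGS.index(operands[0])])  # opcode 50+rd
    if mnemonic == "pop" and len(operands) == 1:
        return bytes([0x58 + REGS.index(operands[0])])  # opcode 58+rd
    if mnemonic == "mov" and len(operands) == 2:
        reg = REGS.index(operands[0])
        imm = int(operands[1], 0)
        # opcode B8+rd, then the 32-bit immediate in little-endian order
        return bytes([0xB8 + reg]) + struct.pack("<I", imm & 0xFFFFFFFF)
    raise ValueError(f"unsupported statement: {line!r}")


def assemble(source):
    """Assemble every non-blank line and concatenate the resulting bytes."""
    return b"".join(assemble_line(ln) for ln in source.splitlines() if ln.strip())


def disassemble(code):
    """Turn bytes produced by assemble() back into statements."""
    out, i = [], 0
    while i < len(code):
        op = code[i]
        if op == 0x90:
            out.append("nop")
            i += 1
        elif op == 0xC3:
            out.append("ret")
            i += 1
        elif 0x50 <= op <= 0x57:
            out.append(f"push {REGS[op - 0x50]}")
            i += 1
        elif 0x58 <= op <= 0x5F:
            out.append(f"pop {REGS[op - 0x58]}")
            i += 1
        elif 0xB8 <= op <= 0xBF:
            (imm,) = struct.unpack_from("<I", code, i + 1)
            out.append(f"mov {REGS[op - 0xB8]}, {imm}")
            i += 5
        else:
            raise ValueError(f"unknown opcode 0x{op:02x} at offset {i}")
    return out
```

For example, assemble("push ebp\nmov eax, 42\npop ebp\nret") yields the bytes 55 b8 2a 00 00 00 5d c3, and disassemble turns them back into the same four statements.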

gcc actually produces assembly code and assembles it using the as assembler. Not all compilers do this: the MS compilers produce object code directly, though you can make them generate assembly output. Translating assembly to object code is a pretty simple process, at least compared with compilation.

Some compilers produce code in another high-level language as their output. For example, cfront, the first C++ compiler, produced C as its output, which was then compiled by a C compiler.
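
cfront itself was far more involved, but the idea of a compiler whose target is another high-level language can be sketched in a few lines. The Python sketch below (translate_function and expr_to_c are hypothetical names) parses a one-line arithmetic expression with Python's ast module and prints an equivalent C function; it assumes int-only arithmetic with +, - and *.

```python
import ast

# Only int operators whose meaning is the same in Python and C (ignoring overflow).
C_OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*"}


def expr_to_c(node):
    """Recursively turn a Python expression AST into a fully parenthesized C expression."""
    if isinstance(node, ast.BinOp) and type(node.op) in C_OPS:
        return f"({expr_to_c(node.left)} {C_OPS[type(node.op)]} {expr_to_c(node.right)})"
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Constant) and type(node.value) is int:
        return str(node.value)
    raise NotImplementedError(f"unsupported construct: {ast.dump(node)}")


def translate_function(name, params, body):
    """Emit C source for a function `int name(int p1, ...)` that returns `body`."""
    tree = ast.parse(body, mode="eval")
    args = ", ".join(f"int {p}" for p in params) or "void"
    return f"int {name}({args}) {{ return {expr_to_c(tree.body)}; }}"
```

translate_function("poly", ["x"], "x * x + 3 * x + 1") returns int poly(int x) { return (((x * x) + (3 * x)) + 1); }, which a C compiler can then take the rest of the way. Fully parenthesizing every subexpression means the translator never has to reason about C's precedence rules.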

Note that neither direct compilation nor assembly actually produces an executable. That is done by the linker, which takes the various object files produced by compilation/assembly, resolves all the names they contain, and produces the final executable binary.
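
Name resolution is the heart of what the linker does. The sketch below is a toy model, not how any real linker or object format works: ObjectFile and link are hypothetical, each object is a byte string plus a table of the symbols it defines and a list of 4-byte slots (relocations) that must be patched with the absolute address of a named symbol. The code bytes are placeholders rather than meaningful instructions, and real formats such as ELF or COFF have sections, many relocation types, and much more.

```python
import struct
from dataclasses import dataclass, field


@dataclass
class ObjectFile:
    name: str
    code: bytes
    symbols: dict = field(default_factory=dict)      # symbol name -> offset within code
    relocations: list = field(default_factory=list)  # (offset of a 4-byte slot, symbol name)


def link(objects, base=0x1000, entry="main"):
    """Lay objects out back to back from `base`, resolve names, patch relocations."""
    # Pass 1: assign addresses and build the global symbol table.
    symtab, addr = {}, base
    for obj in objects:
        for sym, off in obj.symbols.items():
            if sym in symtab:
                raise ValueError(f"duplicate symbol {sym!r} in {obj.name}")
            symtab[sym] = addr + off
        addr += len(obj.code)

    # Pass 2: copy each object's code and write resolved addresses into its slots.
    image = bytearray()
    for obj in objects:
        code = bytearray(obj.code)
        for off, sym in obj.relocations:
            if sym not in symtab:
                raise ValueError(f"undefined reference to {sym!r} in {obj.name}")
            struct.pack_into("<I", code, off, symtab[sym])  # 4-byte little-endian absolute address
        image += code

    if entry not in symtab:
        raise ValueError(f"entry symbol {entry!r} is not defined")
    return bytes(image), symtab[entry]
```

Linking a main.o that defines main and references helper at offset 1 with a util.o that defines helper (at base 0x1000, with a 6-byte main.o) places helper at 0x1006 and patches that address into main's code. Leaving util.o out raises an "undefined reference" error, much as a real linker would complain.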

Almost all compilers, including gcc, produce assembly code because it's easier, both to produce and for debugging the compiler. The major exceptions are usually just-in-time compilers or interactive compilers, whose authors don't want the performance overhead or the hassle of forking a whole process to run the assembler. Some interesting examples include:

Standard ML of New Jersey, which runs interactively and compiles every expression on the fly.
The tinycc compiler, which is designed to be fast enough to compile, load, and run a C script in well under 100 milliseconds, and therefore doesn't want the overhead of calling the assembler and linker.

What these cases have in common is a desire for "instantaneous" response. Assemblers and linkers are plenty fast, but not quite good enough for interactive response. Yet.

There is also a large family of languages, such as Smalltalk, Java, and Lua, that compile to bytecode rather than assembly code, but whose implementations may later translate that bytecode directly to machine code without the benefit of an assembler.
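
To illustrate the bytecode route, here is a Python sketch with a made-up three-instruction stack bytecode (PUSH n, ADD, RET), a reference interpreter, and a template-based translator that appends x86-64 machine-code bytes straight into a buffer, with no assembly text and no assembler in between. The byte encodings are real x86-64 (push imm32, pop rcx, pop rax, add rax, rcx, push rax, ret), but the bytecode, the function names, and the one-template-per-opcode strategy are assumptions for illustration; production JITs for Java or Lua do register allocation and much more. The sketch only produces the bytes; actually running them would require mapping executable memory, which is platform-specific and omitted here.

```python
import struct

# Hypothetical bytecode: ("PUSH", n), ("ADD",), ("RET",)


def interpret(bytecode):
    """Reference semantics: run the bytecode on a Python list used as a stack."""
    stack = []
    for instr in bytecode:
        op = instr[0]
        if op == "PUSH":
            stack.append(instr[1])
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "RET":
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode {op!r}")
    raise ValueError("bytecode ended without RET")


def translate_to_x86_64(bytecode):
    """Emit one fixed x86-64 byte template per bytecode instruction."""
    out = bytearray()
    for instr in bytecode:
        op = instr[0]
        if op == "PUSH":
            out += b"\x68" + struct.pack("<i", instr[1])  # push imm32 (sign-extended to 64 bits)
        elif op == "ADD":
            out += bytes([0x59,              # pop rcx
                          0x58,              # pop rax
                          0x48, 0x01, 0xC8,  # add rax, rcx
                          0x50])             # push rax
        elif op == "RET":
            out += bytes([0x58, 0xC3])       # pop rax (return value); ret
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return bytes(out)
```

For [("PUSH", 2), ("PUSH", 3), ("ADD",), ("RET",)], interpret returns 5, and translate_to_x86_64 returns 18 bytes that leave the same value in rax.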

(Footnote: in the early 1990s, Mary Fernandez and I wrote the New Jersey Machine Code Toolkit, whose code is online; it generates C libraries that compiler writers can use to bypass the standard assembler and linker. Mary used it to roughly double the speed of her optimizing linker when generating a.out. If you don't write to disk, speedups are even greater...)

According to chapter 2 of Introduction to Reverse Engineering Software (by Mike Perry and Nasko Oskov), both gcc and cl.exe (the MSVC++ compiler driver) can output the assembly each compiler produces. With gcc the switch is -S; with cl.exe the equivalent is /FA.

You can also run gcc in verbose mode (gcc -v) to list the commands it executes behind the scenes.
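
If you want to script this, a small Python sketch follows. It assumes a gcc executable on PATH and uses standard gcc options (-S to stop after generating assembly, -c to compile and assemble without linking, -v for verbose output, -x c - to read C source from stdin, and -o to choose the output); the function names are hypothetical. On a typical Linux install, the -v log shows the compiler proper (cc1) followed by a separate as invocation; on macOS the gcc command is usually clang, whose integrated assembler means you may not see a separate as step.

```python
import os
import shutil
import subprocess
import tempfile

C_SOURCE = "int answer(void) { return 42; }\n"


def assembly_for(source, compiler="gcc"):
    """Return the assembly text the compiler generates for `source`."""
    # -S: stop after generating assembly; -x c -: read C from stdin; -o -: write to stdout.
    result = subprocess.run(
        [compiler, "-S", "-x", "c", "-", "-o", "-"],
        input=source, capture_output=True, text=True, check=True,
    )
    return result.stdout


def driver_log(source, compiler="gcc"):
    """Compile `source` to an object file with -v and return the driver's verbose log."""
    with tempfile.TemporaryDirectory() as tmp:
        # -c: compile and assemble but do not link; -v: print each command the driver runs.
        result = subprocess.run(
            [compiler, "-v", "-c", "-x", "c", "-", "-o", os.path.join(tmp, "answer.o")],
            input=source, capture_output=True, text=True, check=True,
        )
    return result.stderr  # gcc writes the -v output to stderr


if __name__ == "__main__":
    if shutil.which("gcc") is None:
        print("gcc not found on PATH")
    else:
        print(assembly_for(C_SOURCE))
        print(driver_log(C_SOURCE))
```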

Compilers, in general, parse the source code into an Abstract Syntax Tree (AST), then lower it into some intermediate language. Only then, usually after some optimizations, do they emit the target language.
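
A toy version of that pipeline, as a Python sketch: it parses an arithmetic expression into an AST with Python's ast module, lowers it to a three-address intermediate form, runs one optimization (constant folding), and emits text in a made-up three-operand assembly syntax. All function names, the temporary names t1, t2, ..., and the target syntax are hypothetical; gcc's real intermediate representations (GIMPLE and RTL) are far richer.

```python
import ast

# Map Python AST operator classes to mnemonics of a hypothetical target.
OPS = {ast.Add: "add", ast.Sub: "sub", ast.Mult: "mul"}
FOLD = {"add": lambda a, b: a + b, "sub": lambda a, b: a - b, "mul": lambda a, b: a * b}


def to_ir(source):
    """Front end + lowering: source -> AST -> list of (dest, op, lhs, rhs)."""
    tree = ast.parse(source, mode="eval")
    ir = []

    def lower(node):
        if isinstance(node, ast.Constant) and type(node.value) is int:
            return node.value
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            lhs, rhs = lower(node.left), lower(node.right)
            dest = f"t{len(ir) + 1}"  # fresh temporary for each instruction
            ir.append((dest, OPS[type(node.op)], lhs, rhs))
            return dest
        raise NotImplementedError(ast.dump(node))

    result = lower(tree.body)
    return ir, result


def fold_constants(ir):
    """Optimization pass: evaluate instructions whose operands are all known constants."""
    known, kept = {}, []
    for dest, op, lhs, rhs in ir:
        lhs, rhs = known.get(lhs, lhs), known.get(rhs, rhs)  # substitute folded temporaries
        if type(lhs) is int and type(rhs) is int:
            known[dest] = FOLD[op](lhs, rhs)
        else:
            kept.append((dest, op, lhs, rhs))
    return kept, known


def emit(ir, result, known):
    """Back end: print the remaining IR in a made-up three-operand assembly syntax."""
    lines = [f"  {op} {dest}, {lhs}, {rhs}" for dest, op, lhs, rhs in ir]
    lines.append(f"  ret {known.get(result, result)}")
    return "\n".join(lines)
```

For "x * (2 + 3)", to_ir produces t1 = add 2, 3 and t2 = mul x, t1; folding evaluates t1 to 5, and emit prints mul t2, x, 5 followed by ret t2.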

As for gcc, it can compile to a wide variety of targets. I don't know whether it compiles to assembly first for x86, but I have given you some insight into compilers, and you asked for that too.

Tags: gcc, assembly, compilation, compiler-construction
