
Compiling high-level language to machine code

After reading some answers on this site and looking at a few other sources, I came to believe that a compiler converts a high-level language (C++, for example) directly to machine code, since the computer itself has no need for assembly. As I understood it, the compiler only produces assembly so that the user can inspect the code and have more control over it if needed.

However, the slide below comes from one of my lecture sheets and says otherwise, so I would appreciate it if someone could explain further and tell me whether I am wrong or the slide in the screenshot is.

[Image: screenshot of the lecture slide]

Karim K. asked Jul 25 '14 21:07




3 Answers

Your slide is mostly wrong...

There is essentially a 1-to-1 mapping between assembly instructions and machine code instructions. Assembly is a textual representation of the information, and machine code is a binary representation.
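To make the text-versus-binary point concrete, here is a minimal sketch of a two-way lookup for a handful of single-byte x86 instructions. The opcode values are the real x86 encodings, but the names `Encoding`, `kTable`, `assemble` and `disassemble` are made up for this illustration, and a real assembler also has to deal with operands, labels and multi-byte encodings.

```cpp
#include <cstdint>
#include <optional>
#include <string_view>

// One row per instruction: the same information in two representations.
struct Encoding {
    std::string_view mnemonic;  // textual form (assembly)
    std::uint8_t opcode;        // binary form (machine code)
};

// A few x86 instructions that take no operands and encode as one byte.
constexpr Encoding kTable[] = {
    {"nop", 0x90},
    {"ret", 0xC3},
    {"int3", 0xCC},
    {"hlt", 0xF4},
};

// "Assemble": look up the byte for a mnemonic (hypothetical helper).
std::optional<std::uint8_t> assemble(std::string_view mnemonic) {
    for (const Encoding& e : kTable)
        if (e.mnemonic == mnemonic) return e.opcode;
    return std::nullopt;  // not in this tiny table
}

// "Disassemble": look up the mnemonic for a byte (hypothetical helper).
std::optional<std::string_view> disassemble(std::uint8_t opcode) {
    for (const Encoding& e : kTable)
        if (e.opcode == opcode) return e.mnemonic;
    return std::nullopt;  // not in this tiny table
}
```

Going from text to byte and back loses nothing, which is what the mapping between the two forms amounts to for these instructions.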

Some processors, however, support additional instructions. Which instructions end up in the generated assembly is still determined at compile time, not at run time. Generally speaking, the available instruction set is determined by the processor in the system (Intel, AMD, TI, NVIDIA, etc.), not by the manufacturer that you purchase the whole system from.

Bill Lynch answered Oct 16 '22 12:10


This slide is confusing bytecode with textual assembly. Assembly is a human-readable version of either bytecode or machine code. Machine code is what the hardware can run directly. Bytecode is low level but generic, and must be further compiled to machine code.

Some languages use bytecode, which is translated at run time into even lower-level machine code. One example is Java, where class files will sometimes be compiled to machine code as a run-time optimization. Another is CUDA, where different NVIDIA GPUs can have different instruction sets, but the CUDA compiler generates bytecode (PTX) that the CUDA driver for each GPU can then translate.
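As a rough illustration of what "low level but generic" means, here is a sketch of a toy stack-based bytecode and a loop that executes it. The opcodes (`PUSH`, `ADD`, `MUL`, `HALT`) and the `run` function are invented for this example; they are not JVM or PTX instructions, and this sketch interprets the bytecode rather than translating it to machine code as a JIT or GPU driver would.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical opcodes for a toy stack machine.
enum Op : std::uint8_t { PUSH, ADD, MUL, HALT };

// Executes the bytecode and returns the value on top of the stack at HALT.
int run(const std::vector<std::uint8_t>& code) {
    std::vector<int> stack;
    std::size_t pc = 0;  // index of the next byte to decode
    while (pc < code.size()) {
        const std::uint8_t op = code[pc++];
        switch (op) {
        case PUSH:  // the next byte is an immediate operand
            if (pc >= code.size()) throw std::runtime_error("truncated PUSH");
            stack.push_back(code[pc++]);
            break;
        case ADD:
        case MUL: {  // pop two values, push the result
            if (stack.size() < 2) throw std::runtime_error("stack underflow");
            const int b = stack.back(); stack.pop_back();
            const int a = stack.back(); stack.pop_back();
            stack.push_back(op == ADD ? a + b : a * b);
            break;
        }
        case HALT:
            if (stack.empty()) throw std::runtime_error("empty stack at HALT");
            return stack.back();
        default:
            throw std::runtime_error("unknown opcode");
        }
    }
    throw std::runtime_error("missing HALT");
}
```

For example, the program `{PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT}` computes (2 + 3) * 4. The same byte sequence runs on any machine for which `run` has been compiled, which is the sense in which bytecode is independent of the CPU.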

Another possibility is that the lecturer is talking about how Intel processors translate machine code at run time into internal micro-operations and then run those. This is completely invisible to software, though, including the OS.

tohava answered Oct 16 '22 14:10


The slide is badly wrong in many ways.

A greatly simplified version of what actually happens in the example given in the slide (compiling C++) would explain that there are four phases of compilation to produce an executable from a source code file:

  1. Preprocessing
  2. Compilation “proper”
  3. Assembly
  4. Linking

In the preprocessing phase, preprocessor directives such as #include and #define are fully expanded and comments are stripped by the preprocessor, creating “postprocessed” C++. The slide omits this entirely.
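A small sketch of what this phase does to a source file follows. The names `BUFFER_SIZE`, `CLAMP` and `clamp_to_buffer` are made up for the example. With GCC or Clang, the `-E` option stops after preprocessing, so the expanded text can be inspected directly.

```cpp
#include <algorithm>  // the preprocessor replaces this line with the header's text

#define BUFFER_SIZE 64                       // object-like macro
#define CLAMP(x) std::min((x), BUFFER_SIZE)  // function-like macro

/* This comment does not survive preprocessing. */
int clamp_to_buffer(int requested) {
    return CLAMP(requested);
    // After preprocessing, the statement above reads:
    //     return std::min((requested), 64);
}
```

The later phases never see `CLAMP` or `BUFFER_SIZE`; they only see the expanded text.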

In the compilation “proper” phase, the postprocessed text from the previous phase is converted into assembly language by the compiler. It's unfortunate that we use the same term — compilation — for both the whole four-step procedure and this one step, but that's the way it is.

Contrary to the slide, assembly language statements are not “readable by the OS” nor are they converted to machine code at run-time. Rather, they are readable by the assembler, which does its job (next paragraph) at compile-time.

In the assembly phase, the assembly language statements from the previous phase are converted into object code (binary machine code instructions that the CPU understands, combined with metadata that the OS and the linker understand) by the assembler.

In the linking phase, the object code from the previous phase is linked with other object code files and common/system libraries to form an executable.
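The four phases can be run one at a time with GCC, which may help to see that assembly is produced and consumed entirely at build time. The sketch below assumes a file named `add.cpp` and a second object file `main.o` that defines `main`; both names are hypothetical. The options `-E`, `-S` and `-c` are the standard GCC options for stopping after each phase.

```cpp
// add.cpp (hypothetical file name)
//
// Running each phase separately with g++:
//
//   g++ -E add.cpp -o add.ii    # 1. preprocess only; output is still C++ text
//   g++ -S add.ii  -o add.s     # 2. compile "proper"; output is assembly text
//   g++ -c add.s   -o add.o     # 3. assemble; output is binary object code
//   g++ add.o main.o -o app     # 4. link with other objects and libraries
//
// main.o is assumed to be built the same way from another source file.

int add(int a, int b) {
    return a + b;
}
```

Opening `add.s` in a text editor shows the assembly that the assembler reads in step 3. Nothing in `app` contains assembly text any more.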

At runtime, the OS — in particular the loader — reads the executable into memory and performs run-time linking, where references to common/system libraries are resolved and those libraries are loaded into memory (if they're not already) so that your executable is able to use them.

A further error is that different brands of machine do not have their “own machine codes”. What determines which machine codes a machine understands is the CPU. If two machines have the same CPU (e.g. a Dell laptop and a Toshiba laptop with the same Intel i7-3610QM CPU), then they understand the same machine codes. Moreover, two CPUs with the same ISA (instruction set architecture) understand the same machine codes. Also, newer CPUs are generally backward-compatible with older CPUs in the same series. For example, a newer Intel i7 CPU understands all of the instructions that an older Intel Pentium 4 understands, but not vice versa.
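One way to see that the target is an ISA rather than a brand of machine: compilers predefine macros that name the instruction set being compiled for, and none of them mention who built the computer. The sketch below uses macros predefined by GCC, Clang and MSVC; the function name `target_isa` and the returned strings are made up for the example, and the list of ISAs is far from complete.

```cpp
#include <string_view>

// Reports the ISA this translation unit is being compiled for.
// The answer is fixed at compile time; nothing is detected at run time.
constexpr std::string_view target_isa() {
#if defined(__x86_64__) || defined(_M_X64)
    return "x86-64";
#elif defined(__i386__) || defined(_M_IX86)
    return "x86 (32-bit)";
#elif defined(__aarch64__) || defined(_M_ARM64)
    return "AArch64";
#elif defined(__arm__) || defined(_M_ARM)
    return "ARM (32-bit)";
#elif defined(__riscv)
    return "RISC-V";
#else
    return "unknown";
#endif
}
```

A binary built where `target_isa()` returns "x86-64" contains x86-64 machine code, and it contains the same bytes whether it later runs on a Dell or a Toshiba.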

Hopefully, I've struck a somewhat better balance between simplicity and correctness than the slide, above, which fails miserably.

Emmet answered Oct 16 '22 13:10