Are all programs eventually converted to assembly instructions?

From what I understand, processor circuitry varies greatly from chip to chip and therefore may require different low-level instructions to execute the same high-level code. Are all programs eventually converted to assembly language before becoming raw machine code, or is this step no longer necessary?

If so, at what point does the processor begin to execute its own unique set of instructions? This is the lowest level of code, so is it at this point that the program instructions are executed by the processor, bit by bit?

Finally, do all architectures have/need an assembly language?

asked Sep 26 '13 18:09

geg

3 Answers

Assembly language is, so to speak, a human-readable way of expressing the instructions a processor executes (which are binary data and very hard for a human to manage). So, if the machine instructions are not written by a human, an assembly step is not necessary, though it is sometimes used for convenience. If a program is compiled from a language such as C++, the compiler may generate machine code directly, without going through the intermediate stage of assembly code. Still, many compilers provide an option to generate assembly code so that a human can more easily inspect what gets generated.

Many modern languages, for example Java and C#, are compiled into so-called bytecode. This is not code that the CPU executes directly, but rather an intermediate form, which may get compiled to machine code just-in-time (JIT-ted) when the program is executed. In such a case, CPU-dependent machine code gets generated, but usually without going through human-readable assembly code.
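
To make the bytecode idea concrete, here is a minimal sketch using Python rather than Java or C# (my choice for illustration, not something the answer above relies on). CPython also compiles source to bytecode, and the standard `dis` module lists the resulting instructions. The function `add` is a made-up example, and the exact opcode names vary between Python versions.

```python
import dis

def add(a, b):
    return a + b

# The compiled bytecode is stored on the function's code object as raw bytes.
raw_bytecode = add.__code__.co_code

# dis.get_instructions decodes those bytes into named instructions.
opnames = [ins.opname for ins in dis.get_instructions(add)]

print(raw_bytecode.hex())
print(opnames)
```

None of these instructions is something the CPU understands directly; the Python virtual machine interprets them.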

answered Oct 30 '22 15:10

Michał Kosmulski

Assembly language is simply a human-readable, textual representation of the raw machine code. It exists for the benefit of the (human) programmers. It's not at all necessary as an intermediate step to generate machine code. Some compilers do generate assembly and then call an assembler to convert that to machine code. But since omitting that step results in faster compilation (and is not that hard to do), compilers will (broadly speaking) tend to evolve towards generating machine code directly. It is useful to have the option of compiling to assembly though, to inspect the results.

For your last question, assembly language is a human convenience, so no architecture truly needs it. You could create an architecture without one if you really wanted to. But in practice, all architectures have an assembly language. First, it's very easy to create a new assembly language: give a textual name to each of your machine opcodes and registers, add some syntax to represent the different addressing modes, and you're already mostly done. And even if all code were converted from a higher-level language directly to machine language, you would still want an assembly language, if only as a way of disassembling and visualizing machine code when hunting for compiler bugs, etc.
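
As a rough illustration of how little is needed, here is a toy assembler and disassembler in Python. Everything about the machine is hypothetical: the opcode values, the two registers `A` and `B`, the mnemonics `MOVI`, `ADD` and `HALT`, and the fixed three-byte instruction format are all invented for this sketch.

```python
# Hypothetical machine: every instruction is 3 bytes (opcode, operand, operand).
OPCODES = {"MOVI": 0x10, "ADD": 0x20, "HALT": 0xFF}
REGISTERS = {"A": 0, "B": 1}
MNEMONICS = {code: name for name, code in OPCODES.items()}
REG_NAMES = {num: name for name, num in REGISTERS.items()}

def assemble(source: str) -> bytes:
    """Turn textual assembly into machine code, one instruction per line."""
    out = bytearray()
    for line in source.strip().splitlines():
        mnemonic, _, rest = line.strip().partition(" ")
        operands = [tok.strip() for tok in rest.split(",") if tok.strip()]
        if mnemonic == "MOVI":    # MOVI reg, immediate
            out += bytes([OPCODES["MOVI"], REGISTERS[operands[0]], int(operands[1])])
        elif mnemonic == "ADD":   # ADD dst_reg, src_reg
            out += bytes([OPCODES["ADD"], REGISTERS[operands[0]], REGISTERS[operands[1]]])
        elif mnemonic == "HALT":  # no operands, padded with zeros
            out += bytes([OPCODES["HALT"], 0, 0])
        else:
            raise ValueError(f"unknown mnemonic: {mnemonic}")
    return bytes(out)

def disassemble(code: bytes) -> list[str]:
    """Turn machine code back into text, the inverse of assemble()."""
    lines = []
    for i in range(0, len(code), 3):
        op, x, y = code[i:i + 3]
        name = MNEMONICS[op]
        if name == "MOVI":
            lines.append(f"MOVI {REG_NAMES[x]}, {y}")
        elif name == "ADD":
            lines.append(f"ADD {REG_NAMES[x]}, {REG_NAMES[y]}")
        else:
            lines.append("HALT")
    return lines

program = "MOVI A, 2\nMOVI B, 3\nADD A, B\nHALT"
machine_code = assemble(program)
print(machine_code.hex(" "))      # 10 00 02 10 01 03 20 00 01 ff 00 00
print(disassemble(machine_code))  # the original four lines, recovered
```

The round trip works because each line of text maps to exactly one encoded instruction. A real assembler adds labels, addressing modes and variable-length encodings, but the principle is the same.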

answered Oct 30 '22 16:10

Christian Hudon

Every general-purpose CPU has its own instruction set. That is, certain sequences of bytes, when executed, have a well-known, documented effect on registers and memory. Assembly language is a convenient way of writing down those instructions, so that humans can read and write them and understand what they do without having to look up commands all the time. It's fairly safe to say that for every modern CPU, an assembly language exists.
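
To illustrate "sequences of bytes with a documented effect on registers", here is a small Python sketch of a fetch-decode-execute loop. The machine is hypothetical: two 8-bit registers, three invented opcodes (`0x10` load immediate, `0x20` add, `0xFF` halt) and a fixed three-byte instruction format. A real CPU does this in hardware, not in a loop like this one.

```python
def run(code: bytes) -> dict[str, int]:
    """Execute machine code for the hypothetical 2-register machine."""
    regs = [0, 0]   # registers A (index 0) and B (index 1)
    pc = 0          # program counter: offset of the next instruction
    while True:
        op, x, y = code[pc:pc + 3]   # fetch one 3-byte instruction
        pc += 3
        if op == 0x10:      # load immediate: regs[x] = y
            regs[x] = y
        elif op == 0x20:    # add: regs[x] += regs[y], wrapping at 8 bits
            regs[x] = (regs[x] + regs[y]) & 0xFF
        elif op == 0xFF:    # halt
            break
        else:
            raise ValueError(f"illegal opcode: {op:#x}")
    return {"A": regs[0], "B": regs[1]}

# Load 2 into A, load 3 into B, add B to A, halt.
program = bytes([0x10, 0, 2,  0x10, 1, 3,  0x20, 0, 1,  0xFF, 0, 0])
result = run(program)
print(result)  # {'A': 5, 'B': 3}
```

Note that the loop only ever sees bytes. Names for the opcodes exist purely for the human reading the code.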

Now, about whether programs are converted to assembly. Let's start by saying that the CPU does not execute assembly code. It executes machine code, but there's an essentially one-to-one correspondence between machine code instructions and assembly lines. As long as you keep that distinction in mind, you can say things like "and now the CPU executes a MOV, then an ADD" and so on. The CPU executes the machine code that corresponds to a MOV instruction, of course.

That said, if your language compiles to native code, your program is, indeed, converted to machine code before execution. Some compilers (not all) do that by emitting assembly sources and letting the assembler do the final step. This step, when present, is typically well hidden. The assembly representation only exists for a brief time during the compilation process, unless you tell the compiler to keep it intact.

Other compilers don't use an assembly step, but emit assembly if asked to. Microsoft C++, for example, takes the /FA option, which emits an assembly listing along with the object file.

If it's an interpreted language, then there's no explicit conversion to machine code. The source lines are executed by the language interpreter. The bytecode-oriented languages (Java, Visual Basic) live somewhere in between; they're compiled to code that is not the same as machine code, but is much easier to interpret than the high-level source. For those, it's also fair to say they're not converted to machine code ahead of time, though a JIT compiler may do so at run time.
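
Here is a rough sketch of why bytecode is easier to interpret than source. It is written in Python, and the stack-machine operations `PUSH`, `ADD` and `MUL` are invented for the example; they are not Java or Visual Basic bytecode. The hard part (parsing, operator precedence) happens once in `compile_expr`, which leans on Python's standard `ast` module. After that, `interpret` is a trivial loop.

```python
import ast

def compile_expr(source: str) -> list[tuple]:
    """Translate an arithmetic expression into ops for a hypothetical stack machine."""
    ops = []

    def emit(node):
        if isinstance(node, ast.Constant):
            ops.append(("PUSH", node.value))
        elif isinstance(node, ast.BinOp) and isinstance(node.op, (ast.Add, ast.Mult)):
            emit(node.left)    # operands first ...
            emit(node.right)
            # ... then the operator that consumes them
            ops.append(("ADD",) if isinstance(node.op, ast.Add) else ("MUL",))
        else:
            raise ValueError("unsupported expression")

    emit(ast.parse(source, mode="eval").body)
    return ops

def interpret(ops: list[tuple]):
    """Run the ops on a stack; no parsing or precedence handling is needed here."""
    stack = []
    for op in ops:
        if op[0] == "PUSH":
            stack.append(op[1])
        elif op[0] == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op[0] == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
    return stack.pop()

bytecode = compile_expr("2 + 3 * 4")
print(bytecode)
print(interpret(bytecode))  # 14
```

At no point is machine code for the expression generated; the interpreter's own machine code does the work on the program's behalf.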

answered Oct 30 '22 15:10

Seva Alekseyev

Tags: cpu-architecture, assembly, compilation
