
Is .exe made of pure machine code only?

When a program written in a high-level programming language is compiled, it is first translated to object code. A linker then links the object files together to make an executable file.

  1. Since object code is basically machine code, does that mean an .exe is pure machine code?

  2. If so, and if you know which instruction set the .exe was built for, is it possible to convert the .exe's machine code to assembly, and then to a high-level language (the source code)?

Karim K. asked Jul 26 '14 18:07



3 Answers

To answer your first question: no. An executable file typically does not contain only machine code. It also contains metadata that helps the operating system locate the program's dependencies (assuming the program uses external libraries), as well as static data embedded in the file.

Typically an executable consists of several sections, each designated for metadata, static data, or executable code. Keep in mind, though, that what an "executable" is depends on the platform and operating system.
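As a rough illustration of the "metadata before machine code" point, here is a minimal Python sketch that reads the first header fields of a Windows PE file. The field layout (the MZ signature, the offset stored at 0x3C, the "PE\0\0" signature and the 20-byte COFF header) follows the published PE format. The function names and the synthetic test buffer are made up for this example, and the buffer is not a runnable program.

```python
import struct

# Two common values of the COFF "Machine" field.
MACHINE_NAMES = {0x014C: "x86", 0x8664: "x86-64"}


def build_minimal_pe_stub(machine=0x8664, num_sections=3):
    """Build synthetic bytes with just enough PE structure to parse.

    Hypothetical test data only: this is not a valid, loadable .exe.
    """
    dos_header = bytearray(64)
    dos_header[0:2] = b"MZ"                        # DOS signature
    struct.pack_into("<I", dos_header, 0x3C, 64)   # offset of the PE signature
    coff_header = struct.pack(
        "<HHIIIHH",
        machine,        # Machine
        num_sections,   # NumberOfSections
        0,              # TimeDateStamp
        0,              # PointerToSymbolTable
        0,              # NumberOfSymbols
        0xF0,           # SizeOfOptionalHeader
        0x0022,         # Characteristics
    )
    return bytes(dos_header) + b"PE\0\0" + coff_header


def read_pe_summary(data):
    """Return the target machine and section count from PE header bytes."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    machine, num_sections = struct.unpack_from("<HH", data, pe_offset + 4)
    return {
        "machine": MACHINE_NAMES.get(machine, hex(machine)),
        "sections": num_sections,
    }


if __name__ == "__main__":
    print(read_pe_summary(build_minimal_pe_stub()))
```

To try it on a real file, you could pass the bytes from `open(path, "rb").read()` instead of the synthetic stub. None of the bytes parsed here are instructions for the CPU.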

To answer your second question: yes, it is possible to convert your executable into assembly. At least part of your executable contains pure machine code, which has a 1:1 mapping to the relevant assembly language. Converting it to a higher-level language, however, is much harder (though an intelligent application could perhaps produce something of a guess). You will often find debuggers that can go into your EXE and show you which file and line is currently being executed. This is only possible because of additional metadata in the executable that maps an instruction offset to a file and line in the source code.

On a Linux system you can typically inspect some of this metadata with the readelf and objdump tools. Equivalents may be available for other platforms.

Mark Nunberg answered Jan 31 '23 21:01


A standard Windows .EXE file contains mostly x86 or x86-64 machine code, but it also includes a header. It is possible to disassemble the machine code within that file into assembly. It's incredibly hard to convert x86 or x86-64 machine code to a higher-level language, and I don't know of any programs that do that in a foolproof manner. IDA Disassembler, or a plugin for it, comes closest, but as far as I can remember it doesn't produce compilable C code. In fact, it doesn't even use a standard assembly syntax to display its disassembly, meaning you can't export the output to a file and use an assembler to create a new .EXE from it.

It's hard to disassemble an .EXE in a foolproof manner because you can't just start from an arbitrary position in the file. Instructions are of variable length and take a variable number of operands, so a given position could be an opcode, an operand of an instruction, data stored in the .EXE for access by other instructions, diagnostic data injected into the .EXE, part of the header, or even entirely unused (I'm sure I'm forgetting some possibilities). By following the program flow you can generally determine a large part of what the program uses for what, but from what I understand certain things can only be determined by simulating a run of the program.

Also of note: some .EXE files contain almost entirely CIL code rather than native machine code (used by the .NET Framework and by Mono).
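To make the "you can't start from an arbitrary position" point concrete, here is a toy Python decoder. It is only a sketch: it understands three real x86 encodings (0x90 is nop, 0xC3 is ret, and 0xB8 through 0xBF followed by four bytes is mov r32, imm32) and prints everything else as raw data. The function name and the sample bytes are made up for the example, and a real disassembler handles far more.

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode(code, start):
    """Linearly decode `code` from byte offset `start` with a tiny opcode table."""
    listing = []
    pos = start
    while pos < len(code):
        opcode = code[pos]
        if opcode == 0x90:
            listing.append((pos, "nop"))
            pos += 1
        elif opcode == 0xC3:
            listing.append((pos, "ret"))
            pos += 1
        elif 0xB8 <= opcode <= 0xBF and pos + 5 <= len(code):
            # One opcode byte followed by a 4-byte little-endian immediate.
            imm = int.from_bytes(code[pos + 1:pos + 5], "little")
            listing.append((pos, f"mov {REG32[opcode - 0xB8]}, 0x{imm:08x}"))
            pos += 5
        else:
            # Unknown to this toy table: show the byte as data.
            listing.append((pos, f"db 0x{opcode:02x}"))
            pos += 1
    return listing


# Six bytes: mov eax, 0x90909090 followed by ret.
CODE = bytes.fromhex("b890909090c3")

if __name__ == "__main__":
    for start in (0, 1):
        print(f"decoding from offset {start}:")
        for offset, text in decode(CODE, start):
            print(f"  {offset:02x}: {text}")
```

Decoding from offset 0 gives one mov and one ret. Decoding the same bytes from offset 1 gives four nops and a ret, because the immediate operand happens to look like valid opcodes. Both readings are well-formed, which is why a disassembler needs to know where instructions really begin.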

Eagle-Eye answered Jan 31 '23 19:01


This has been asked many times before.

Object files and "binary" files (exe, coff, elf, etc.) are mostly machine code, but generally not all machine code. There is usually some information in the file that describes where to load the binary blobs, as well as debug info such as labels, if you built that in.

It is not always possible to determine which language, compiler, or assembler was used to create an executable, or an object file for that matter. There may be some metadata in there that indicates this, but it can easily be faked. With time and experience you may be able to pick out code sequences that are particular to a compiler or a compiler version, but those could also be hand-written code or coincidence.

On the way from a high-level language to a binary, information is removed at each stage: original variable names are dropped, dead code is eliminated, optimizations rearrange the code, and so on. So if your interest is decompiling, you cannot really get back to what you started with. It is a lossy process.
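A small way to see this lossiness, using CPython's own bytecode compiler as a stand-in for a native compiler (an analogy chosen for this example, not something from the answer): two differently written sources compile to identical instructions, so nothing in the output says which one you started with. The helper name `bytecode_fingerprint` is made up here.

```python
def bytecode_fingerprint(source):
    """Compile `source` and return the parts of the code object that get executed."""
    code = compile(source, "<example>", "exec")
    return code.co_code, code.co_consts, code.co_names


# Same program, once with comments and spacing, once stripped down.
original = "# Price in cents, see the billing spec\nprice   =   42   # includes tax\n"
stripped = "price=42\n"

if __name__ == "__main__":
    same = bytecode_fingerprint(original) == bytecode_fingerprint(stripped)
    print("identical compiled output:", same)
```

The comments and layout are gone after compilation. Python bytecode still keeps the variable name, whereas a native compiler building without debug info typically discards local names as well, so the loss there is usually greater.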

For some languages, the binaries are not machine code for the target but some intermediate form. Java is one example, as is the binary format of any JIT compiler. That form is interpreted at runtime, or compiled and assembled into native machine code. Even in those cases, the file format usually is not all code.

When you are dealing with processor booting, such as on microcontrollers or the boot flash of a PC, you need a PROM image. In that case a pure binary file is sometimes created, either because that is what a bootloader needs or because a PROM programmer tool needs it. Over time, though, bootloaders and PROM programmers have started to accept other file formats.

old_timer answered Jan 31 '23 19:01