I am looking for a brief description of how an assembler is used to produce machine code. I know that assembly is a 1:1 translation of machine code, but I am confused about object code and linkers and how they fit into the process. I don't need a complex answer; a simple one will do fine.

How does an assembler work?

1 Answer

Both an assembler and a compiler translate source files into object files.

Object files are effectively an intermediate step before the final executable output (generated by the linker).

The linker takes the specified object files and libraries (which are packages of object files) and resolves relocation (or 'fixup') records.

These relocation records are made when the compiler/assembler doesn't know the address of a function or variable used in the source code, and generates a reference for it by name, which can be resolved by the linker.
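As a rough illustration of the assembler's side of this, here is a minimal Python sketch (a toy model, not how any real assembler is implemented). The `Relocation` class, the `assemble_call` helper, and the symbol name `some_function` are hypothetical names made up for this sketch. It emits an x86-64 `call` with four placeholder bytes and records where the linker must later patch in the real value.

```python
from dataclasses import dataclass

@dataclass
class Relocation:
    offset: int   # where in the section the placeholder bytes start
    kind: str     # how the linker should compute the value
    symbol: str   # name to look up at link time
    addend: int   # constant folded into the computation

def assemble_call(section: bytearray, relocs: list, symbol: str) -> None:
    """Emit `call rel32` to a symbol whose address is not yet known (toy model)."""
    section.append(0xE8)                 # opcode for call rel32
    # The 4-byte field starts right after the opcode, so record that offset.
    relocs.append(Relocation(len(section), "R_X86_64_PC32", symbol, -4))
    section.extend(b"\x00\x00\x00\x00")  # placeholder for the linker to fill in

text = bytearray()
relocs = []
assemble_call(text, relocs, "some_function")  # hypothetical symbol name

print(text.hex(" "))   # e8 00 00 00 00
print(relocs[0])
```

The placeholder bytes and the record travel together in the object file: the bytes reserve the space, and the record tells the linker what belongs there.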

For example, say you want a program that prints a message to the screen, separated into two source files that you assemble separately and then link (this example targets x86-64 Linux and uses the legacy int 0x80 system-call interface) -

main.asm :

bits 64
section .text
extern do_message
global _start
_start:
    call do_message     ; address is unknown until link time
    mov rax, 1          ; sys_exit in the int 0x80 ABI
    int 0x80

message.asm :

bits 64
section .text
global do_message
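do_message:
    mov rdi, message    ; rdi = start of the string
    mov rcx, dword -1   ; rcx = -1, so the scan is effectively unlimited
    xor rax, rax        ; al = 0, the byte to search for
    repnz scasb         ; advance rdi until just past the terminating 0
    sub rdi, message    ; rdi = length, including the terminator
    dec rdi             ; leave the terminator out of the output
    mov rax, 4          ; sys_write in the int 0x80 ABI
    mov rbx, 1          ; file descriptor 1 (stdout)
    mov rcx, message    ; buffer to write
    mov rdx, rdi        ; number of bytes to write
    int 0x80
    ret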

section .data
message: db "hello world",10,0

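If you assemble these and look at the object file output of main.asm (e.g., objdump -d main.o), you will notice that the operand of 'call do_message' is 00 00 00 00, which is only a placeholder. A displacement of zero would call the very next instruction, which is why objdump displays the target as offset 5.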

0000000000000000 <_start>:
   0:   e8 00 00 00 00          callq  5 <_start+0x5>
   5:   48 c7 c0 01 00 00 00    mov    $0x1,%rax
   c:   cd 80                   int    $0x80

However, a relocation record is made for those 4 address bytes:

$ objdump -r main.o
main.o:     file format elf64-x86-64
RELOCATION RECORDS FOR [.text]:
OFFSET           TYPE              VALUE 
0000000000000001 R_X86_64_PC32     do_message+0xfffffffffffffffc
000000000000000d R_X86_64_32       .data

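The offset is '1' (the byte just after the e8 opcode) and the type is 'R_X86_64_PC32', which tells the linker to resolve this reference and store the result at the specified offset as a 32-bit value relative to the program counter. The '+0xfffffffffffffffc' is an addend of -4, which compensates for the CPU measuring the displacement from the end of the 4-byte field rather than from its start.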

When you link the final program with 'ld -o program main.o message.o', the relocations are all resolved, and if nothing is unresolved, you are left with an executable.
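The "if nothing is unresolved" condition can be pictured with a small Python sketch. This is a simplified toy model, not how ld is implemented: the dictionary layout and the `unresolved_symbols` function are assumptions made for illustration, and only the global symbol names from the two source files above are taken from the example.

```python
def unresolved_symbols(objects):
    """Return names that are referenced by some object but defined by none."""
    defined = set()
    referenced = set()
    for obj in objects:
        defined |= obj["defines"]
        referenced |= obj["references"]
    return referenced - defined

# Global symbols of the two object files (simplified: locals are ignored).
main_o = {"defines": {"_start"}, "references": {"do_message"}}
message_o = {"defines": {"do_message"}, "references": set()}

print(unresolved_symbols([main_o, message_o]))  # set()
print(unresolved_symbols([main_o]))             # {'do_message'}
```

With both object files every reference finds a definition, so linking can proceed. With main.o alone, do_message has no definition, which is the situation a real linker reports as an undefined reference.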

When we 'objdump -d' the executable, we can see the resolved address :

00000000004000f0 <_start>:
  4000f0:   e8 0b 00 00 00          callq  400100 <do_message>
  4000f5:   48 c7 c0 01 00 00 00    mov    $0x1,%rax
  4000fc:   cd 80                   int    $0x80
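The patched bytes 0b 00 00 00 can be reproduced by hand. For R_X86_64_PC32 the x86-64 ABI defines the stored value as S + A - P: the symbol's address, plus the addend, minus the address of the field being patched. The Python sketch below plugs in the addresses from the two objdump listings; the function and variable names are made up for illustration.

```python
import struct

def resolve_pc32(symbol_addr, addend, field_addr):
    """R_X86_64_PC32: value = S + A - P, stored as a signed 32-bit little-endian field."""
    value = symbol_addr + addend - field_addr
    return struct.pack("<i", value)

start_addr = 0x4000F0          # _start in the linked executable
field_addr = start_addr + 1    # relocation offset 1: the byte after the e8 opcode
do_message_addr = 0x400100     # where the linker placed do_message
addend = -4                    # 0xfffffffffffffffc read as a signed 64-bit number

patched = resolve_pc32(do_message_addr, addend, field_addr)
print(patched.hex(" "))        # 0b 00 00 00
```

Checking the other way: the instruction after the call is at 0x4000f5, and 0x4000f5 + 0xb = 0x400100, which is the target objdump prints for the call.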

The same kind of relocation is used for variables as well as functions. The same process happens when you link your program against large libraries such as libc: you define a function called 'main', to which the C runtime startup code linked in alongside libc has an external reference. That startup code runs first when you launch the executable, and then calls your 'main' function.

182

answered Oct 17 '22 06:10

matja

Tags:

object

assembly

linker

machine-code

user673906
