What is Relocatable and Absolute Machine Code?

While studying assemblers, I came across these terms. The idea I got is this: relocatable machine code does not depend on a fixed RAM location. The assembler specifies the RAM needs of my program, and the memory can be placed wherever the linker finds room for it.

Is the idea correct? If so, how is it done by the assembler?

And, what are some examples of Absolute Machine Code?

Shakib Ahmed asked Apr 06 '14 at 03:04


1 Answer

Many (most) instruction sets have pc-relative addressing: take the value of the program counter, which is tied to the address of the instruction you are executing, add an offset to it, and use the result to access memory, branch, or the like. That is what you are calling relocatable. No matter where the instruction sits in the address space, the thing you want to reach is at a fixed distance from it. Move the whole block of code and data to some other address and everything is still the same distance apart, so the relative addressing still works. For example, "if equal, skip the next instruction" works wherever those three instructions are placed (the conditional skip, the instruction being skipped, and the one after it).

Absolute addressing uses absolute addresses: jump to this exact address, read from this exact address, for example "if equal, branch to 0x1000".
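The difference can be captured in a toy model. This is a sketch, not a real instruction encoding: `Ref`, `resolve` and `move_block` are made-up names, instructions are assumed to be 4 bytes, and the relative case simply adds the stored distance to the instruction's own address (ignoring the ARM "pc is two instructions ahead" detail discussed later). Moving a block carries a relative reference's target along with the code, while an absolute reference keeps pointing at the old address.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model, not a real instruction encoding. A reference is either
 * relative (stores a distance from the referencing instruction) or
 * absolute (stores the target address itself). */
typedef struct {
    int      is_relative;  /* 1 = pc-relative style, 0 = absolute */
    uint32_t insn_addr;    /* where the referencing instruction sits */
    uint32_t encoded;      /* distance (relative) or address (absolute) */
} Ref;

/* Address the reference points at when it executes. */
static uint32_t resolve(const Ref *r)
{
    assert((r->insn_addr & 3u) == 0);  /* assumed 4-byte instructions */
    /* unsigned wrap-around lets a "negative" distance work as well */
    return r->is_relative ? r->insn_addr + r->encoded : r->encoded;
}

/* Place the instruction `delta` bytes further along without re-encoding
 * it, as if the whole block of code and data had been moved. */
static Ref move_block(Ref r, uint32_t delta)
{
    r.insn_addr += delta;
    return r;
}
```

For instance, a relative reference at 0xD6000010 with distance 0x0C resolves to 0xD600001C; after moving the block so it starts at 0x1000 instead of 0xD6000000 it resolves to 0x101C, whereas an absolute reference holding 0xD6000020 still resolves to 0xD6000020.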

The assembler doesn't decide this; the compiler and/or the programmer does. Generally, compiled code eventually ends up with some absolute addressing, in particular if your program consists of separate objects that are linked together. At compile time the compiler can't know where an object will end up, nor where the external references are or how far away they will be, so it generally can't assume they will be close enough for pc-relative addressing (which usually has a limited range). So compilers often generate a placeholder for the linker to fill in with an absolute address. How this external-address problem is solved depends on the operation, the instruction set and some other factors, but eventually, depending on project size, the linker will end up with some absolute addressing.

So position-independent code is usually not the default; it is typically a command-line option, for example -fPIC in GCC. Both the compiler and the linker then have to do extra work to make those items position independent. An assembly language programmer has to do all of this themselves; the assembler generally doesn't get involved, it just creates the machine code for the instructions you tell it to generate.
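As a rough illustration of that placeholder, here is a sketch of a linker filling in one absolute-address slot. `AbsReloc` and `apply_abs32` are made-up names for a simplified model; real ELF objects carry relocation entries (on ARM, for example, `R_ARM_ABS32` for a 32-bit absolute word) with more fields than this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified relocation record: "store the absolute
 * address of a symbol in the 32-bit word at byte `offset`". */
typedef struct {
    size_t   offset;      /* byte offset of the placeholder word */
    uint32_t sym_offset;  /* symbol's byte offset from the image start */
} AbsReloc;

/* Fill in one placeholder once the link address (`base`) is known. */
static void apply_abs32(uint32_t *image, size_t n_words, uint32_t base,
                        const AbsReloc *r)
{
    assert(r->offset % 4 == 0 && r->offset / 4 < n_words);
    image[r->offset / 4] = base + r->sym_offset;
}
```

Using the numbers from the example that follows (a placeholder word at byte offset 0x1C, and notmain at offset 0x20 from the start of .text), linking at 0xD6000000 stores 0xD6000020, while linking at 0x1000 stores 0x1020. The instructions around it do not need this kind of patching when they are pc-relative.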

novectors.s:

.globl _start
_start:
    b   reset
reset:
    mov sp,#0xD8000000
    bl notmain
    ldr r0,=notmain
    blx r0
hang: b hang

.globl dummy
dummy:
    bx lr

hello.c:

extern void dummy ( unsigned int );
int notmain ( void )
{
    unsigned int ra;
    for(ra=0;ra<1000;ra++) dummy(ra);
    return(0);
}

memmap (the linker script):

MEMORY
{
    ram : ORIGIN = 0xD6000000, LENGTH = 0x4000
}
SECTIONS
{
    .text : { *(.text*) } > ram
}

Makefile:

ARMGNU = arm-none-eabi
COPS = -Wall -O2 -nostdlib -nostartfiles -ffreestanding 
all : hello_world.bin
clean :
    rm -f *.o
    rm -f *.bin
    rm -f *.elf
    rm -f *.list

novectors.o : novectors.s
    $(ARMGNU)-as novectors.s -o novectors.o

hello.o : hello.c
    $(ARMGNU)-gcc $(COPS) -c hello.c -o hello.o

hello_world.bin : memmap novectors.o hello.o 
    $(ARMGNU)-ld novectors.o hello.o -T memmap -o hello_world.elf
    $(ARMGNU)-objdump -D hello_world.elf > hello_world.list
    $(ARMGNU)-objcopy hello_world.elf -O binary hello_world.bin 

hello_world.list (the parts we care about)

Disassembly of section .text:

d6000000 <_start>:
d6000000:   eaffffff    b   d6000004 <reset>

d6000004 <reset>:
d6000004:   e3a0d336    mov sp, #-671088640 ; 0xd8000000
d6000008:   eb000004    bl  d6000020 <notmain>
d600000c:   e59f0008    ldr r0, [pc, #8]    ; d600001c <dummy+0x4>
d6000010:   e12fff30    blx r0

d6000014 <hang>:
d6000014:   eafffffe    b   d6000014 <hang>

d6000018 <dummy>:
d6000018:   e12fff1e    bx  lr
d600001c:   d6000020    strle   r0, [r0], -r0, lsr #32

d6000020 <notmain>:
d6000020:   e92d4010    push    {r4, lr}
d6000024:   e3a04000    mov r4, #0
d6000028:   e1a00004    mov r0, r4
d600002c:   e2844001    add r4, r4, #1
d6000030:   ebfffff8    bl  d6000018 <dummy>
d6000034:   e3540ffa    cmp r4, #1000   ; 0x3e8
d6000038:   1afffffa    bne d6000028 <notmain+0x8>
d600003c:   e3a00000    mov r0, #0
d6000040:   e8bd4010    pop {r4, lr}
d6000044:   e12fff1e    bx  lr

What I am showing here is a mixture of position independent instructions and position dependent instructions.

These two instructions, for example, use a shortcut that forces the assembler to add a .word-style memory location, which the linker then has to fill in for us.

ldr r0,=notmain
blx r0

0xD600001c is that location.

    d600000c:   e59f0008    ldr r0, [pc, #8]    ; d600001c <dummy+0x4>
    d6000010:   e12fff30    blx r0
...
    d600001c:   d6000020    strle   r0, [r0], -r0, lsr #32

It is filled in with the address 0xD6000020, which is an absolute address, so for this code to work the function notmain must be at address 0xD6000020; it is not relocatable. But this portion of the example also demonstrates some position-independent code as well: the

ldr r0, [pc, #8]

is the pc-relative addressing I was talking about. The way this instruction set works, at execution time the pc is two instructions ahead: if the instruction is at 0xD600000C in memory, the pc will be 0xD6000014 when it executes, and adding 8 as the instruction states gives 0xD600001C. Now suppose we move that exact same machine code instruction to address 0x1000 AND move all of the surrounding binary there too, including the thing it is reading (the 0xD6000020). Basically, do this:

    1000:   e59f0008    ldr r0, [pc, #8]    
    1004:   e12fff30    blx r0
...
    1010:   d6000020    

Those instructions, that exact machine code, will still work; they don't have to be re-assembled or re-linked. The code at 0xD6000020 still has to be at that fixed address, but the pc-relative ldr and the blx don't.
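The address arithmetic for that load can be written out as a small decoder. This is a minimal sketch: the name `ldr_literal_addr` is made up, and it handles only the ARM-state `LDR Rd, [pc, #+/-imm12]` form (immediate offset, pre-indexed, no writeback, word load). It checks the fixed encoding bits, takes the 12-bit immediate from the low bits, and adds it to or subtracts it from insn_addr + 8 according to the U bit (bit 23). Any other instruction returns 0 as a sentinel.

```c
#include <assert.h>
#include <stdint.h>

/* Address read by an ARM-state "LDR Rd, [pc, #+/-imm12]" at insn_addr.
 * Returns 0 if insn is not that exact form (sentinel, sketch only). */
static uint32_t ldr_literal_addr(uint32_t insn_addr, uint32_t insn)
{
    assert((insn_addr & 3u) == 0);            /* ARM instructions are word aligned */
    /* bits 27-26 = 01, I = 0, P = 1, B = 0, W = 0, L = 1, Rn = 15 (pc) */
    if ((insn & 0x0F7F0000u) != 0x051F0000u)
        return 0;
    uint32_t pc    = insn_addr + 8u;          /* pc reads two instructions ahead */
    uint32_t imm12 = insn & 0xFFFu;
    return (insn & (1u << 23)) ? pc + imm12   /* U = 1: add the offset */
                               : pc - imm12;  /* U = 0: subtract it */
}
```

For e59f0008 at 0xD600000C this gives 0xD600001C, and the very same word at 0x1000 gives 0x1010, which is where the moved copy of the literal sits. The instruction word itself never changes; only where it sits does.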

Although the disassembler shows these with 0xD6...-based addresses, the bl and bne are also pc-relative, which you can confirm by looking at the instruction set documentation:

d6000030:   ebfffff8    bl  d6000018 <dummy>
d6000034:   e3540ffa    cmp r4, #1000   ; 0x3e8
d6000038:   1afffffa    bne d6000028 <notmain+0x8>

When the instruction at 0xD6000030 executes, the pc is 0xD6000038, and 0xD6000038 - 0xD6000018 = 0x20, which is 8 instructions. Negative 8 in two's complement is 0xFFF...FFF8, and you can see that the low 24 bits of the machine code ebfffff8 are fffff8; that field is sign-extended, shifted left by 2 (multiplied by the 4-byte instruction size) and added to the program counter, basically saying "branch backward 8 instructions". The same goes for the fffffa in 1afffffa: it means "if not equal, branch backward 6 instructions". Remember that this instruction set (ARM) assumes the pc is two instructions ahead, so "back 6" means forward two and then back 6, or effectively back 4 from the bne itself.
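The branch arithmetic above can be checked with a small decoder. A minimal sketch, with `arm_branch_target` as a made-up name: it accepts only ARM-state B/BL (bits 27-25 = 101, condition not 1111, so BLX immediate is excluded), sign-extends the 24-bit field without relying on signed shifts, multiplies it by 4 and adds it to insn_addr + 8.

```c
#include <assert.h>
#include <stdint.h>

/* Target of an ARM-state B or BL instruction located at insn_addr. */
static uint32_t arm_branch_target(uint32_t insn_addr, uint32_t insn)
{
    assert((insn_addr & 3u) == 0);               /* word-aligned instruction */
    assert((insn & 0x0E000000u) == 0x0A000000u); /* bits 27-25 = 101: B/BL */
    assert((insn >> 28) != 0xFu);                /* cond 1111 would be BLX (imm) */
    uint32_t imm24 = insn & 0x00FFFFFFu;
    /* sign-extend the 24-bit word offset portably */
    int32_t words = (imm24 & 0x00800000u) ? (int32_t)imm24 - 0x01000000
                                          : (int32_t)imm24;
    /* pc is insn_addr + 8; the offset counts 4-byte instructions */
    return insn_addr + 8u + (uint32_t)(words * 4);
}
```

Fed the words from the listing, it gives 0xD6000018 for ebfffff8 at 0xD6000030 and 0xD6000028 for 1afffffa at 0xD6000038; the same words at 0x1030 and 0x1038 give 0x1018 and 0x1028, so the encoding does not care where the code was linked.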

If you remove the

d600000c:   e59f0008    ldr r0, [pc, #8]    ; d600001c <dummy+0x4>
d6000010:   e12fff30    blx r0

then this entire program ends up being position independent. It happens by accident, if you will (I knew it would), not because I told the tools to do it, but simply because I kept everything close together and didn't use any absolute addressing.

Lastly, regarding "wherever the linker finds room for them": notice that in my linker script I tell the linker to put everything starting at 0xD6000000. I didn't specify any file names or functions, so unless told otherwise this linker places the items in the order they are given on the command line. hello.o is second, so once the linker has placed the novectors.s code, "wherever the linker had room" is right after it, and the hello.c code starts at 0xD6000020.
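That placement can be mimicked with a greatly simplified model. The sketch below assumes each input .text section is laid out back to back in command-line order with 4-byte alignment; `Section` and `place_sections` are made-up names, and a real linker such as GNU ld does far more than this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One input .text section in this toy model. */
typedef struct {
    const char *name;  /* input object, e.g. "novectors.o" */
    uint32_t    size;  /* size of its .text in bytes */
    uint32_t    addr;  /* filled in by place_sections() */
} Section;

/* Lay sections out in the given order starting at `origin`, rounding each
 * start address up to `align` bytes. Returns the first free address. */
static uint32_t place_sections(Section *secs, size_t n,
                               uint32_t origin, uint32_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);  /* power of two */
    uint32_t next = origin;
    for (size_t i = 0; i < n; i++) {
        next = (next + align - 1) & ~(align - 1);       /* round up */
        secs[i].addr = next;
        next += secs[i].size;
    }
    return next;
}
```

With section sizes read off the listing (0x20 bytes for novectors.o, including its literal word, and 0x28 bytes for hello.o), an origin of 0xD6000000 puts hello.o at 0xD6000020, and an origin of 0x1000 puts it at 0x1020.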

An easy way to see what is position independent and what isn't, without having to research each instruction, is to change the linker script to put the code at some other address:

MEMORY
{
    ram : ORIGIN = 0x1000, LENGTH = 0x4000
}
SECTIONS
{
    .text : { *(.text*) } > ram
}

and see which machine code changes, if any, and which doesn't:

00001000 <_start>:
    1000:   eaffffff    b   1004 <reset>

00001004 <reset>:
    1004:   e3a0d336    mov sp, #-671088640 ; 0xd8000000
    1008:   eb000004    bl  1020 <notmain>
    100c:   e59f0008    ldr r0, [pc, #8]    ; 101c <dummy+0x4>
    1010:   e12fff30    blx r0

00001014 <hang>:
    1014:   eafffffe    b   1014 <hang>

00001018 <dummy>:
    1018:   e12fff1e    bx  lr
    101c:   00001020    andeq   r1, r0, r0, lsr #32

00001020 <notmain>:
    1020:   e92d4010    push    {r4, lr}
    1024:   e3a04000    mov r4, #0
    1028:   e1a00004    mov r0, r4
    102c:   e2844001    add r4, r4, #1
    1030:   ebfffff8    bl  1018 <dummy>
    1034:   e3540ffa    cmp r4, #1000   ; 0x3e8
    1038:   1afffffa    bne 1028 <notmain+0x8>
    103c:   e3a00000    mov r0, #0
    1040:   e8bd4010    pop {r4, lr}
    1044:   e12fff1e    bx  lr
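The comparison between the two listings can be automated. A sketch, with `diff_words` as a made-up helper; the two arrays hold the 18 .text words copied from the 0xD6000000 and 0x1000 listings. Any word that differs between the two links depends on the link address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* .text words from the two links (copied from the listings). */
static const uint32_t text_at_d6000000[18] = {
    0xeaffffff, 0xe3a0d336, 0xeb000004, 0xe59f0008, 0xe12fff30, 0xeafffffe,
    0xe12fff1e, 0xd6000020, 0xe92d4010, 0xe3a04000, 0xe1a00004, 0xe2844001,
    0xebfffff8, 0xe3540ffa, 0x1afffffa, 0xe3a00000, 0xe8bd4010, 0xe12fff1e
};
static const uint32_t text_at_1000[18] = {
    0xeaffffff, 0xe3a0d336, 0xeb000004, 0xe59f0008, 0xe12fff30, 0xeafffffe,
    0xe12fff1e, 0x00001020, 0xe92d4010, 0xe3a04000, 0xe1a00004, 0xe2844001,
    0xebfffff8, 0xe3540ffa, 0x1afffffa, 0xe3a00000, 0xe8bd4010, 0xe12fff1e
};

/* Store the byte offsets of differing words in `out` (at most max_out of
 * them) and return the total number of differing words. */
static size_t diff_words(const uint32_t *a, const uint32_t *b, size_t n,
                         size_t *out, size_t max_out)
{
    assert(out != NULL || max_out == 0);
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (a[i] != b[i]) {
            if (count < max_out)
                out[count] = i * 4;  /* word index -> byte offset */
            count++;
        }
    }
    return count;
}
```

Run on these two arrays, it reports exactly one differing word, at byte offset 0x1C: the literal holding the address of notmain. Every instruction, including the pc-relative ldr, bl and bne, is identical in both builds.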
old_timer answered Oct 30 '22 at 09:10