I am trying to do some bare-metal programming in ARM with GCC and testing on QEMU. Whenever I call into an ARM label from C, my program hangs. I have a simple example of code that shows the problem at https://gist.github.com/1654392 -- when I call activate() in that code, it hangs.
I have observed with objdump that when I do a bl from assembly to C code (as from _start) it is generating a small wrapper that switches to thumb instructions. It seems that the C code is all being generated in thumb instructions, but all my assembly is being generated in ARM (32-bit) instructions. I cannot figure out why this is or how to fix it.
In order to call an ARM mode function defined in assembly from a THUMB mode function defined in C, you need to define the symbol in assembly as a function; the tools (Linaro gcc) will then produce a blx instruction instead of bl.
Example:
@ Here, we suppose that this part of code is inside of .code 32
.type fn, %function
fn:
bx lr  @ return with bx so the caller's state (arm or thumb) is restored
See the qemu directory of http://github.com/dwelch67/yagbat.
Here are a couple of examples of calling arm or thumb functions from arm:
start_vector:
mov sp,#0x20000
;@ call an arm function from arm
bl notmain
;@ call a thumb function from arm, through a trampoline
ldr r0,=0xAABBAABB
bl hexstring_trampoline
;@ call a thumb function from arm, with mov lr,pc and bx
ldr r0,=0x12341234
ldr r1,hexstring_addr
mov lr,pc
bx r1
;@ call a thumb function from arm, through the trampoline again
ldr r0,=0x12312344
bl hexstring_trampoline
hang:
b hang
hexstring_trampoline:
ldr r1,hexstring_addr
bx r1
hexstring_addr: .word hexstring
If you look at the instruction set reference you will see that you need to use BX or BLX to switch between arm and thumb states. BLX is not as widely supported as BX: it was added in ARMv5T, while BX is available on every thumb-capable core from ARMv4T on.
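As a rough illustration of what BX does with the lsbit of its target, here is a small C model. It is not ARM code, and the type and function names are made up for this sketch; it only mimics the rule that bit 0 of the register selects the state and is not part of the fetch address.

```c
#include <stdint.h>

/* Hypothetical model of the state change done by bx (names are illustrative). */
typedef enum { STATE_ARM = 0, STATE_THUMB = 1 } cpu_state;

typedef struct {
    cpu_state state; /* state the core executes in after the branch */
    uint32_t  pc;    /* address the next instruction is fetched from */
} branch_result;

/* bx Rm: bit 0 of Rm selects the state and is cleared before the
   value is used as the branch address. */
static branch_result model_bx(uint32_t rm)
{
    branch_result r;
    r.state = (rm & 1u) ? STATE_THUMB : STATE_ARM;
    r.pc    = rm & ~1u;
    return r;
}
```

For example, model_bx(0x8001) gives thumb state at address 0x8000, while model_bx(0x8000) gives arm state at the same address. This is why the address of a thumb function has to carry a set lsbit when it is loaded into a register for bx.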
By definition the program counter, pc, reads as two instructions ahead of the instruction being executed: 4 bytes in thumb state, 8 bytes in arm state. Either way it is two instructions. bl cannot be used to change state, so to simulate one that does, you load the link register with the return address and use bx to branch to the function; bx changes state depending on the lsbit of the address. So in arm code:
mov lr,pc
bx r1
here:
The mov lr,pc above loads the address of here:, which is our return address, and bx r1 calls the function in a state-independent manner. The lsbit of the address in lr indicates the state to return to, so on a thumb-capable core you always need to use bx to return:
pre_thumb:
mov pc,lr
thumb_capable:
bx lr
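The pc arithmetic behind mov lr,pc can be checked with a small C model. This is a sketch, not ARM code, and the function names are hypothetical. It assumes the architectural rule that reading pc gives the instruction address plus 8 in arm state and plus 4 in thumb state, with the lsbit clear in both cases.

```c
#include <stdint.h>

/* Value copied into lr by "mov lr,pc" located at insn_addr (illustrative model).
   insn_addr is assumed to be word aligned (arm) or halfword aligned (thumb). */
static uint32_t mov_lr_pc_arm(uint32_t insn_addr)   { return insn_addr + 8u; }
static uint32_t mov_lr_pc_thumb(uint32_t insn_addr) { return insn_addr + 4u; }

/* Nonzero if "bx lr" with this lr value would resume in thumb state. */
static int returns_to_thumb(uint32_t lr) { return (int)(lr & 1u); }
```

With mov lr,pc at 0x100 in arm state and bx r1 at 0x104, lr becomes 0x108, the instruction after the bx, and its lsbit is clear, so bx lr returns in arm state as intended. In thumb state the same pair also leaves the lsbit clear, so a thumb caller has to set that bit (or make the call with bl, which sets it) before an arm callee can come back with bx lr.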
The compiler emits a bl instruction for each function call and the linker fills in the target later. If the target is too far to reach, the linker adds a trampoline function itself and points the bl at it. Likewise, if the call needs to change modes, the bl calls a trampoline that does the switch. I have modeled that in one of the examples above (hexstring_trampoline). You can see it is a bit wasteful, but the alternative is worse: because the compiler only allocates space for a single bl, planning for a mode change at every call would mean inserting nops in the majority of function calls, which never need one.
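To make "too far of a reach" concrete, the following C sketch checks whether a bl at one address can encode a direct branch to another. It assumes the classic encodings: a signed 24-bit word offset for arm bl (about +/-32 MB) and a signed 22-bit halfword offset for the two-halfword thumb bl of ARMv4T/ARMv5T (about +/-4 MB). Thumb-2 cores have a longer thumb bl range, which is not modelled here. The function name is made up for illustration.

```c
#include <stdint.h>

/* Returns 1 if a bl at address `from` can reach `to` directly (illustrative model).
   is_thumb selects the original two-halfword thumb bl; otherwise arm bl.
   Alignment of the addresses is assumed, not checked. */
static int bl_can_reach(uint32_t from, uint32_t to, int is_thumb)
{
    /* The offset is relative to pc, which reads as from+4 (thumb) or from+8 (arm). */
    int64_t offset = (int64_t)(to & ~1u) - ((int64_t)from + (is_thumb ? 4 : 8));
    int64_t limit  = is_thumb ? ((int64_t)1 << 22) : ((int64_t)1 << 25);
    int64_t step   = is_thumb ? 2 : 4; /* size of one offset unit in bytes */
    return offset >= -limit && offset <= limit - step;
}
```

When a check like this fails, or when the target is in the other instruction set, the linker is the one that routes the bl through a trampoline; the compiler has already committed to a single bl by then.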
The code also includes a call to arm from thumb in assembler:
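.thumb
.thumb_func
.globl XPUT32
XPUT32:
push {lr}
;@ call an arm function from thumb asm
ldr r2,=PUT32
;@ in thumb state mov lr,pc leaves the lsbit of lr clear, so the arm
;@ function would return in arm state; thumb bl sets the lsbit for us
bl bx_r2
pop {r2}
bx r2
.thumb_func
bx_r2:
bx r2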
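This is mostly the same, with two differences. Reading pc in thumb state gives a value with the lsbit clear, so a return address built with mov lr,pc would send an arm callee back in arm state; the lsbit has to be set first, or the call made with a thumb bl to a one-instruction bx stub, since thumb bl sets it. And you cannot pop to lr in thumb mode. You can pop to pc, but on ARMv4T that does not switch modes (it does from ARMv5T on), so to stay portable you again need a spare register. You of course need to know the calling convention to know which registers you can use, or you can wrap another set of pushes and pops to preserve all but lr: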
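push {r2,lr}
;@ call an arm function from thumb asm
ldr r2,=PUT32
bl bx_r2          ;@ bx_r2 is a thumb stub containing only bx r2
ldr r2,[sp,#4]    ;@ the saved lr sits above the saved r2
mov lr,r2
pop {r2}          ;@ restore the caller's r2
add sp,#4         ;@ discard the saved lr slot
bx lr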
For thumb to thumb or arm to arm, you just use a bl if you can reach, or ldr pc,address if you can't.