
Using BX in Thumb code to call a Thumb function, or to jump to a Thumb instruction in another function

Tags:

gcc

arm

thumb

I'm trying to learn skills useful in firmware modding (for which I don't have source code). These questions concern the use of BX from Thumb code to jump to or call other existing Thumb code.

1. How do I use BX to JUMP to existing firmware THUMB code from my THUMB code?
2. How do I use BX to CALL an existing THUMB function (must set LR first) from my THUMB code?

My understanding is that the CPU looks at the lsb (bit 0), and I have to make sure it is set to 1 in order to keep the CPU in "Thumb state". So I guess I have to ADD 1 to set the lsb to 1.

So, say I want to just JUMP to 0x24000 (in the middle of some existing THUMB code):

LDR R6, =0x24000
ADD R6, #1       @ (set lsb to 1)
BX R6

I think this is correct?

Now say I want to CALL an existing Thumb function using BX, and I want it to return to me, so I need to set LR to where I want it to return.

Let's say the function I want to call is at 0x24000. It was suggested to me to use:

ldr r2, =0x24000
mov lr, pc
bx r2

Here comes what I don't understand:

1. The address in R2 doesn't have the lsb set, so won't bx r2 switch mode to ARM mode?
2. The LR. The PC holds (address of the beginning of the current instruction + 4), I was told. In both Thumb and ARM, any instruction address has to be aligned (16 bit or 32 bit), so it won't have the lsb set to 1. Only odd numbers have the lsb set to 1.

So in the code above, I'm setting LR to (PC), an address that DOESN'T have the lsb set either. So when the function I called comes to its epilogue and does BX LR, how can that work to return to my THUMB code? I must be missing something.

Normally BL is used to call functions. The manual says the BL instruction sets the LR to the next line of code. So does this mean that a (normally used) BL THUMB instruction sets the LR to return addr + 1 automatically?


asked Feb 20 '12 21:02

vmanta

1 Answer

Wow, thanks for calling me out on this one. I know I tried the qemu code in http://github.com/dwelch67/yagbat and thought that XPUT32, which calls PUT32 in the way you describe, worked. But it does NOT appear to work. I created a number of experiments and am quite surprised; this is not what I was expecting. Now I see why the GNU linker does what it does. Sorry this is a long response, but I think it is very valuable. It is a confusing topic. I know I have had it wrong for years, thinking the pc drags the mode bit around, but it doesn't.

Before I start with the experiments below, if you are going to do this:

LDR R6, =0x24000
ADD R6, #1       @ (set lsb to 1)
BX R6

because you happen to know that 0x24000 is thumb code, just do this instead:

LDR R6, =0x24001
BX R6

And yes, that is how you branch to Thumb code from ARM or Thumb: if you happen to know that the hardcoded address 0x24000 is a Thumb instruction, you bx with a register containing the address plus one.
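To make the bit-0 rule concrete, here is a minimal sketch in Python that models what bx does with a register value on a core that has both ARM and Thumb states. The function name bx_target is made up for illustration; it is only a model of the rule, not something from any toolchain.

```python
def bx_target(value):
    """Model of BX Rm: bit 0 of the register selects the state,
    and execution continues at the value with bit 0 cleared."""
    state = "thumb" if value & 1 else "arm"
    return state, value & ~1


for reg in (0x24001, 0x24000):
    state, addr = bx_target(reg)
    print(f"bx with 0x{reg:X}: {state} state, executing from 0x{addr:X}")
```

Both values branch to 0x24000; only the state differs, which is why the plain 0x24000 is wrong when the code there is Thumb.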

If you don't know the address but know the name of the address:

ldr r6,=something
bx r6

The nice thing about that is that something can be an ARM or Thumb address and the above code just works. Well, it works if the linker properly knows whether that label is ARM or Thumb. If that gets messed up it won't work right, as you can see here:

.thumb
ping:
    ldr r0,=pong
    bx r0
.code 32
pong:
    ldr r0,=ping
    bx r0


d6008148 <ping>:
d6008148:   4803        ldr r0, [pc, #12]   ; (d6008158 <pong+0xc>)
d600814a:   4700        bx  r0

d600814c <pong>:
d600814c:   e59f0008    ldr r0, [pc, #8]    ; d600815c <pong+0x10>
d6008150:   e12fff10    bx  r0

d6008158:   d600814c    strle   r8, [r0], -ip, asr #2
d600815c:   d6008148    strle   r8, [r0], -r8, asr #2

That didn't work: pong wanted to pull a Thumb address (lsbit set) from 0xD600815C but got 0xD6008148, which bx treats as an ARM address.

This is all GNU assembler stuff, by the way; for other tools you may have to do something else. For gas you need to put .thumb_func before a label that you want declared as a Thumb label (the term func, implying function, is misleading; don't worry about what .thumb_func means, it is just an assembler/linker game).

.thumb
.thumb_func
ping:
    ldr r0,=pong
    bx r0
.code 32
pong:
    ldr r0,=ping
    bx r0

and now we get what we wanted:

d6008148 <ping>:
d6008148:   4803        ldr r0, [pc, #12]   ; (d6008158 <pong+0xc>)
d600814a:   4700        bx  r0

d600814c <pong>:
d600814c:   e59f0008    ldr r0, [pc, #8]    ; d600815c <pong+0x10>
d6008150:   e12fff10    bx  r0

d6008158:   d600814c    strle   r8, [r0], -ip, asr #2
d600815c:   d6008149    strle   r8, [r0], -r9, asr #2

0xD600815C has that lsbit set so that you don't have to do any work. The compiler takes care of all of this when you are making calls to C functions, for example. For assembler, though, you have to use .thumb_func (or some other directive if there is one) to let gas know this is a Thumb label and set the lsbit for you.

The experiment below was done on an MPCore, which is an ARM11, but I also tried testthumb functions 1 through 4 on an ARM7TDMI and on qemu with the same results.

.globl testarm
testarm:
    mov r0,pc
    bx lr

armbounce:
    mov r0,lr
    bx lr

.thumb
.thumb_func
.globl testthumb1
testthumb1:
    mov r0,pc
    bx lr
    nop
    nop
    nop
bounce:
    bx lr
.thumb_func
.globl testthumb2
testthumb2:
    mov r2,lr
    mov r0,pc
    bl bounce
    bx r2
    nop
    nop
    nop
.thumb_func
.globl testthumb3
testthumb3:
    mov r2,lr
    mov lr,pc
    mov r0,lr
    bx r2
    nop
    nop
    nop
.thumb_func
.globl testthumb4
testthumb4:
    push {lr}
    ldr r2,=armbounce
    mov r1,pc  ;@ -4
    add r1,#5  ;@ -2
    mov lr,r1  ;@ +0
    bx r2      ;@ +2
    pop {r2}   ;@ +4
    bx r2
.thumb_func
.globl testthumb5
testthumb5:
    push {lr}
    ldr r2,=armbounce
    mov lr,pc
    bx r2
    pop {r2}
    bx r2
.thumb_func
.globl testthumb6
testthumb6:
    push {lr}
    bl testthumb6a
.thumb_func
testthumb6a:
    mov r0,lr
    pop {r2}
    bx r2

.thumb_func
.globl testthumb7
testthumb7:
    push {lr}
    bl armbounce_thumb
    pop {r2}
    bx r2

.thumb_func
.globl testthumb8
testthumb8:
    push {lr}
    bl armbounce_thumb_two
    pop {r2}
    bx r2

.align 4
armbounce_thumb:
    ldr r1,[pc]
    bx r1
.word armbounce

nop
.align 4
armbounce_thumb_two:
    bx pc
    nop
.code 32
    b armbounce

Which becomes:

d60080b4 <testarm>:
d60080b4:   e1a0000f    mov r0, pc
d60080b8:   e12fff1e    bx  lr

d60080bc <armbounce>:
d60080bc:   e1a0000e    mov r0, lr
d60080c0:   e12fff1e    bx  lr

d60080c4 <testthumb1>:
d60080c4:   4678        mov r0, pc
d60080c6:   4770        bx  lr
d60080c8:   46c0        nop         ; (mov r8, r8)
d60080ca:   46c0        nop         ; (mov r8, r8)
d60080cc:   46c0        nop         ; (mov r8, r8)

d60080ce <bounce>:
d60080ce:   4770        bx  lr

d60080d0 <testthumb2>:
d60080d0:   4672        mov r2, lr
d60080d2:   4678        mov r0, pc
d60080d4:   f7ff fffb   bl  d60080ce <bounce>
d60080d8:   4710        bx  r2
d60080da:   46c0        nop         ; (mov r8, r8)
d60080dc:   46c0        nop         ; (mov r8, r8)
d60080de:   46c0        nop         ; (mov r8, r8)

d60080e0 <testthumb3>:
d60080e0:   4672        mov r2, lr
d60080e2:   46fe        mov lr, pc
d60080e4:   4670        mov r0, lr
d60080e6:   4710        bx  r2
d60080e8:   46c0        nop         ; (mov r8, r8)
d60080ea:   46c0        nop         ; (mov r8, r8)
d60080ec:   46c0        nop         ; (mov r8, r8)

d60080ee <testthumb4>:
d60080ee:   b500        push    {lr}
d60080f0:   4a15        ldr r2, [pc, #84]   ; (d6008148 <armbounce_thumb_two+0x8>)
d60080f2:   4679        mov r1, pc
d60080f4:   3105        adds    r1, #5
d60080f6:   468e        mov lr, r1
d60080f8:   4710        bx  r2
d60080fa:   bc04        pop {r2}
d60080fc:   4710        bx  r2

d60080fe <testthumb5>:
d60080fe:   b500        push    {lr}
d6008100:   4a11        ldr r2, [pc, #68]   ; (d6008148 <armbounce_thumb_two+0x8>)
d6008102:   46fe        mov lr, pc
d6008104:   4710        bx  r2
d6008106:   bc04        pop {r2}
d6008108:   4710        bx  r2

d600810a <testthumb6>:
d600810a:   b500        push    {lr}
d600810c:   f000 f800   bl  d6008110 <testthumb6a>

d6008110 <testthumb6a>:
d6008110:   4670        mov r0, lr
d6008112:   bc04        pop {r2}
d6008114:   4710        bx  r2

d6008116 <testthumb7>:
d6008116:   b500        push    {lr}
d6008118:   f000 f80a   bl  d6008130 <armbounce_thumb>
d600811c:   bc04        pop {r2}
d600811e:   4710        bx  r2

d6008120 <testthumb8>:
d6008120:   b500        push    {lr}
d6008122:   f000 f80d   bl  d6008140 <armbounce_thumb_two>
d6008126:   bc04        pop {r2}
d6008128:   4710        bx  r2
d600812a:   46c0        nop         ; (mov r8, r8)
d600812c:   46c0        nop         ; (mov r8, r8)
d600812e:   46c0        nop         ; (mov r8, r8)

d6008130 <armbounce_thumb>:
d6008130:   4900        ldr r1, [pc, #0]    ; (d6008134 <armbounce_thumb+0x4>)
d6008132:   4708        bx  r1
d6008134:   d60080bc            ; <UNDEFINED> instruction: 0xd60080bc
d6008138:   46c0        nop         ; (mov r8, r8)
d600813a:   46c0        nop         ; (mov r8, r8)
d600813c:   46c0        nop         ; (mov r8, r8)
d600813e:   46c0        nop         ; (mov r8, r8)

d6008140 <armbounce_thumb_two>:
d6008140:   4778        bx  pc
d6008142:   46c0        nop         ; (mov r8, r8)
d6008144:   eaffffdc    b   d60080bc <armbounce>
d6008148:   d60080bc            ; <UNDEFINED> instruction: 0xd60080bc
d600814c:   e1a00000    nop         ; (mov r0, r0)

And the results of calling and printing all of these functions:

D60080BC testarm
D60080C8 testthumb1
D60080D6 testthumb2
D60080E6 testthumb3
D60080FB testthumb4
         testthumb5 crashes
D6008111 testthumb6
D600811D testthumb7
D6008127 testthumb8

So what is all of this doing, and what does it have to do with your question? It has to do with mixed-mode calling from Thumb mode (and also from ARM, which is simpler).

I have been programming ARM and Thumb mode at this level for many years, and somehow have had this wrong all along. I thought the program counter always held the mode in that lsbit. I know, as you know, that you want to have it set or not set when you do a bx instruction.

This comes up very early in the CPU description of the ARM processor in the ARM Architectural Reference Manual (if you are writing assembler you should already have this; if not, getting it will probably answer most of your questions):

Program counter Register 15 is the Program Counter (PC). It can be used in most
      instructions as a pointer to the instruction which is two instructions after 
      the instruction being executed...

So let's check and see what that really means. Does it mean two instructions, 8 bytes ahead, in ARM mode? And in Thumb mode, two instructions ahead, or 4 bytes ahead?

testarm verifies that the program counter is 8 bytes ahead, which is also two instructions.

testthumb1 verifies that the program counter is 4 bytes ahead, which in this case is also two instructions.

testthumb2:

d60080d2:   4678        mov r0, pc
d60080d4:   f7ff fffb   bl  d60080ce <bounce>
d60080d8:   4710        bx  r2

If the program counter were two "instructions" ahead we would get 0xD60080D8 (the bl is a 4-byte instruction), but we instead get 0xD60080D6, which is four bytes ahead, and that makes a lot more sense. ARM mode 8 bytes ahead, Thumb mode 4 bytes ahead, no messing with decoding instructions (or data) that are ahead of the code being executed, just add 4 or 8.
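As a quick cross-check of the "just add 4 or 8" rule, this small Python sketch recomputes the values printed by the first four tests from the addresses in the disassembly. The helper name pc_read is hypothetical; the addresses and printed values are the ones from the listings and results above.

```python
def pc_read(insn_addr, state):
    """Value an instruction at insn_addr sees when it reads pc as an operand."""
    return insn_addr + (8 if state == "arm" else 4)


# (address of the instruction that reads pc, state, value the test printed)
observed = [
    (0xD60080B4, "arm",   0xD60080BC),  # testarm
    (0xD60080C4, "thumb", 0xD60080C8),  # testthumb1
    (0xD60080D2, "thumb", 0xD60080D6),  # testthumb2
    (0xD60080E2, "thumb", 0xD60080E6),  # testthumb3 (mov lr,pc)
]

for addr, state, printed in observed:
    assert pc_read(addr, state) == printed
    assert printed & 1 == 0  # the mode bit is not carried in the pc value
```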

testthumb3 was a hope that mov lr,pc was special. It isn't.

If you don't see the pattern yet: the lsbit of the program counter is NOT set, and I guess this makes sense for branch tables, for example. So mov lr,pc in Thumb mode does NOT set up the link register correctly for a return.

testthumb4, in a very painful way, takes the program counter wherever this code happens to end up and, based on carefully placed instructions, computes the return address. If you change the instruction sequence between mov r1,pc and bx r2 you have to retune the add. Now why couldn't we just do something like this:

add r1,pc,#1
bx r2

With Thumb instructions you can't; with Thumb2 you probably could. And there appear to be some processors (ARMv7) that support both ARM instructions and Thumb/Thumb2, so you might be in a situation where you would want to do that. But you wouldn't add #1, because a Thumb2 add instruction that allows upper registers and has three operands, if there is one, would be a 4-byte Thumb2 instruction (you would need to add #3).
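For the hand-built return address in testthumb4, the constant in the add can be worked out rather than guessed. A rough sketch in Python, with hypothetical helper names, using the testthumb4 addresses from the disassembly above:

```python
def thumb_pc_read(insn_addr):
    """In Thumb state, reading pc gives the instruction address plus 4."""
    return insn_addr + 4


def add_constant(mov_addr, return_addr):
    """Constant to add to the value read by `mov r1,pc` at mov_addr so that
    the result is return_addr with bit 0 set (a Thumb return address)."""
    return (return_addr | 1) - thumb_pc_read(mov_addr)


mov_addr = 0xD60080F2     # mov r1,pc
return_addr = 0xD60080FA  # pop {r2}, the instruction right after bx r2

assert add_constant(mov_addr, return_addr) == 5

# One extra 2-byte instruction between the mov and the bx moves the
# return point by 2, so the constant has to be retuned to 7.
assert add_constant(mov_addr, return_addr + 2) == 7
```

This is also why the sequence is fragile: the constant depends on the exact size of everything between the mov and the bx.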

testthumb5 is directly from the code I showed you that led to part of this question, and it crashes. This is not how it works. Sorry I misled folks; I will try to go back and patch up the SO questions I used this with.

testthumb6 is an experiment to make sure we are not all crazy. All is well: the link register does indeed get the lsbit set by bl, so that when you bx lr later it knows the mode from that bit.
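To line up testthumb5 and testthumb6 side by side, here is a small Python model of what ends up in lr in each case. The function names are invented for this sketch; the addresses and printed values come from the disassembly and results above.

```python
def thumb_bl_link(bl_addr):
    """lr after a Thumb BL at bl_addr: the address of the next instruction
    (BL occupies 4 bytes) with bit 0 set to record Thumb state."""
    return (bl_addr + 4) | 1


def thumb_mov_lr_pc(mov_addr):
    """lr after a Thumb `mov lr,pc`: just the pc read value, bit 0 clear."""
    return mov_addr + 4


# Address of the bl -> value of lr that the test printed
observed_bl = {
    0xD600810C: 0xD6008111,  # testthumb6
    0xD6008118: 0xD600811D,  # testthumb7
    0xD6008122: 0xD6008127,  # testthumb8
}
for bl_addr, printed in observed_bl.items():
    assert thumb_bl_link(bl_addr) == printed

# testthumb5: mov lr,pc at 0xD6008102 leaves bit 0 clear, so the callee's
# bx lr selects ARM state for what is really Thumb code.
assert thumb_mov_lr_pc(0xD6008102) & 1 == 0
```

So yes, a Thumb bl leaves "return address + 1" in lr automatically, and mov lr,pc does not.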

testthumb7 is derived from the ARM-side trampoline that you see the linker generate when going from ARM mode to Thumb mode; in this case, though, I am going from Thumb mode to ARM mode. Why can't the linker do it this way? Because in Thumb mode, at least, you have to use a low register, and at this point in the game, after the code is compiled, the linker has no way of knowing what register it can trash. In ARM mode, though, the ip register (r12) can get trashed; the ARM procedure call standard sets it aside as a scratch register for exactly this kind of veneer. I know in this case that r1 can get trashed, so I used it, and this works as desired. The armbounce code gets called and grabs the link register, which holds where to return to: a Thumb address (lsbit set) just after the bl armbounce_thumb in the testthumb7 function, exactly where we wanted it to be.

testthumb8 is how the GNU linker does it when it needs to get from Thumb mode to ARM mode. The bl instruction is set to go to a trampoline. Then they do something very, very tricky and crazy looking:

d6008140 <armbounce_thumb_two>:
d6008140:   4778        bx  pc
d6008142:   46c0        nop         ; (mov r8, r8)
d6008144:   eaffffdc    b   d60080bc <armbounce>

A bx pc. We know from the experiments above that the pc is four bytes ahead, and we also know that the lsbit is NOT SET. So what this says is: branch to the ARM CODE that is four bytes after this instruction. The nop is a two-byte spacer; then we have to generate an ARM instruction four bytes ahead AND ALIGNED ON A FOUR-BYTE BOUNDARY, and we make that an unconditional branch to wherever we were going. This could be a b something or a ldr pc,=something, depending on how far you need to go. Very tricky.

The original bl armbounce_thumb_two sets up the link register to return to the instruction after that bl. The trampoline does not modify the link register; it simply performs branches.

If you want to get to Thumb mode from ARM, then do what the linker does:
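The bx pc trick can be modelled the same way as the other experiments. In this Python sketch the function name thumb_bx_pc is hypothetical, and rejecting a misaligned bx pc is an assumption of the sketch, chosen because the ARM instruction it lands on has to sit on a 4-byte boundary.

```python
def thumb_bx_pc(bx_addr):
    """Model a Thumb `bx pc` located at bx_addr."""
    if bx_addr % 4 != 0:
        # Assumption in this sketch: treat a misaligned bx pc as an error,
        # because the ARM instruction 4 bytes ahead must be 4-byte aligned.
        raise ValueError("bx pc must sit on a 4-byte boundary")
    value = bx_addr + 4  # pc reads 4 bytes ahead, with bit 0 clear
    state = "thumb" if value & 1 else "arm"
    return state, value & ~1


# armbounce_thumb_two: bx pc at 0xD6008140, nop at 0xD6008142,
# ARM `b armbounce` at 0xD6008144.
assert thumb_bx_pc(0xD6008140) == ("arm", 0xD6008144)
```

That is why the trampoline in the listing is preceded by an alignment directive and padded with a nop.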

...
bl myfun_from_arm
...


myfun_from_arm:
  ldr ip,[pc]
  bx ip
.word myfun

Which looks like this when the linker does it (grabbed from a different binary, not at 0xD6008xxx but at 0x0001xxxx):

   101f8:   eb00003a    bl  102e8 <__testthumb1_from_arm>


000102e8 <__testthumb1_from_arm>:
   102e8:   e59fc000    ldr ip, [pc]    ; 102f0 <__testthumb1_from_arm+0x8>
   102ec:   e12fff1c    bx  ip
   102f0:   00010147    andeq   r0, r1, r7, asr #2

So ip (which is r12) is a register they don't mind trashing, and I assume you are welcome to trash it yourself.

answered Sep 21 '22 10:09

old_timer
