Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pushing LR before BL instruction in ARM assembly

I'm trying to get a better understanding of why you push LR before calling a BL instruction. I understand that the BL instruction will branch to another subroutine before restoring the PC to the instruction address following the BL call but why is the LR pushed before the BL is called? I've written the entire recursive code for factorial computation below to give context. a and b are both variables that are written in pseudo.

LDR   RO, a
PUSH  (LR)
BL    factorial
STR   R0, b
POP   (LR)

factorial: 
CMP   RO, #0
MOVEQ R0, #1
MOVEQ PC, LR
MOV   R3, R0
SUB   R0, R0, #1
PUSH  (R3, LR)
BL    factorial
MUL   R0, R3, R0
POP   (R3, LR)
MOV   PC, LR

I understand how this program is supposed to flow but I'm confused about what addresses are being stored in the stack. Clearly, you want the address of the "STR R0, b" instruction to be put on the stack after your first branch call but how is that being preserved on the stack if the LR is pushed prior to the BL call?

like image 456
Philip Kwak Avatar asked Dec 05 '25 23:12

Philip Kwak


2 Answers

but why is the LR pushed before the BL is called?

Here you are seeing the cost of recursion. Recursion looks simple from a higher level coding stand point. State is stored by a compiler in a stack frame. There is only one LR register which is fine for leaf functions. However, if you have an extended call chain, "A calls B calls C calls D", then the return address of "A, B and C" must be stored when executing in "D" with the LR the return to "C". For recursion, "A, B, C, and D" are all the same.

See: ARM Link register and frame pointer for more.

I believe that it is instructive to see these extra instructions. Often a loop can be formed instead of recursion and the linear flow will execute much faster and with the same amount of variables and less code. The stack frames and manipulation are hidden to programmers in higher level languages.

It is also common that the frame is not needed due to 'tail recursion'. Really only the first call to factorial needs to save a return address and instead of bl a simple b will do.

like image 89
artless noise Avatar answered Dec 08 '25 22:12

artless noise


The link register LR is used to hold the address to which a function should return when it finishes executing. The BL instruction is essentially a 'call'; it calculates the address of the next instruction and inserts it into LR before branching. The corresponding BX LR (branch to the address held in the link register) is the 'return'.

If one function calls another, however, then before issuing a BL instruction it must save the existing value of LR somewhere, otherwise it will be overwritten and lost forever. Pushing it to the stack is the simplest way to do this.

Bear in mind that (almost) no code is actually 'stand-alone'. It's likely that any code you write is part of a function, even if it's main(), and so the link register must be preserved.

The most common pattern you'll see in compiled code is that the link register is pushed to the stack right at the top of the function, and only popped again right at the bottom. In addition it's often just popped straight into the program counter, which causes a branch without needing an explicit BX LR. So something like

.function
    ; Push working registers and LR
    PUSH {r4-r8,lr}
    ; (rest of the function goes here)
    ; Pop working registers and PC for an implicit return
    POP {r4-r8, pc}

would be typical.

like image 23
cooperised Avatar answered Dec 08 '25 22:12

cooperised