ARM: Why do I need to push/pop two registers at function calls?

Tags: assembly, abi, arm, stack-memory, cpu-registers

I understand that I need to push the Link Register at the beginning of a function, and pop that value into the Program Counter before returning, so that execution can carry on from where it was before the function call.

What I don't understand is why most people do this by adding an extra register to the push/pop. For instance:

push {ip, lr}
...
pop {ip, pc}

Here's a Hello World in ARM assembly, provided by the official ARM blog, that does the same thing:

.syntax unified

    @ --------------------------------
    .global main
main:
    @ Stack the return address (lr) in addition to a dummy register (ip) to
    @ keep the stack 8-byte aligned.
    push    {ip, lr}

    @ Load the argument and perform the call. This is like 'printf("...")' in C.
    ldr     r0, =message
    bl      printf

    @ Exit from 'main'. This is like 'return 0' in C.
    mov     r0, #0      @ Return 0.
    @ Pop the dummy ip to reverse our alignment fix, and pop the original lr
    @ value directly into pc — the Program Counter — to return.
    pop     {ip, pc}

    @ --------------------------------
    @ Data for the printf calls. The GNU assembler's ".asciz" directive
    @ automatically adds a NULL character termination.
message:
    .asciz  "Hello, world.\n"

Question 1: what's the reason for the "dummy register" as they call it? Why not simply push{lr} and pop{pc}? They say it's to keep the stack 8-byte aligned, but ain't the stack 4-byte aligned?

Question 2: what register is "ip" (i.e., r7 or what?)

asked Apr 20 '13 12:04

Daniel Scocco

2 Answers

8-byte stack alignment is a requirement for interoperability between objects that conform to the AAPCS (the ARM procedure call standard).

ARM has an advisory note on this subject:

ABI for the ARM® Architecture Advisory Note – SP must be 8-byte aligned on entry to AAPCS-conforming functions

The note gives two reasons for 8-byte alignment:

1. Alignment fault or UNPREDICTABLE behavior (hardware/architecture reasons: LDRD/STRD could cause an alignment fault or show UNPREDICTABLE behavior on architectures other than ARMv7).
2. Application failure (the compiler and the runtime making different assumptions; the note gives va_start and va_arg as an example).

Of course, this is all about public interfaces. If you are building a static executable with no additional linking, you can keep the stack aligned at 4 bytes.
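
To make the alignment arithmetic concrete, here is a small C sketch. It is not from the advisory note: the helper names and the example address are made up for illustration. It assumes a full-descending stack where each pushed core register takes 4 bytes, and that sp is 8-byte aligned on entry to the function.

```c
#include <stdint.h>

/* Hypothetical helper: the value sp holds after pushing `nregs`
   32-bit core registers onto a full-descending stack (4 bytes each). */
static uint32_t sp_after_push(uint32_t sp, unsigned nregs)
{
    return sp - 4u * nregs;
}

/* Nonzero when sp sits on an 8-byte boundary. */
static int is_8_byte_aligned(uint32_t sp)
{
    return (sp % 8u) == 0u;
}
```

Starting from a made-up, 8-byte-aligned sp of 0x20001000, sp_after_push(sp, 1) models push {lr} and gives 0x20000FFC, which is only 4-byte aligned. sp_after_push(sp, 2) models push {ip, lr} and gives 0x20000FF8, which is 8-byte aligned again, so printf is called with the alignment it may rely on.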

answered Oct 24 '22 17:10

auselen

what's the reason for the "dummy register" as they call it? Why not simply push{lr} and pop{pc}? They say it's to keep the stack 8-byte aligned, but ain't the stack 4-byte aligned?

~~The stack only requires 4-byte alignment; but~~ if the data bus is 64 bits wide (as it is on many modern ARMs), it's more efficient to keep it at an 8-byte alignment. Then, for example, if you call a function that needs to stack two registers, that can be done in a single 64-bit write rather than two 32-bit writes.

UPDATE: Apparently it's not just for efficiency; it's a requirement of the official procedure call standard, as noted in the comments.

If you're targeting older 32-bit ARMs, then the extra stacked register might degrade performance slightly.

what register is "ip" (i.e., r7 or what?)

r12. The procedure call standard (AAPCS) defines the full set of register aliases.
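
As a quick reference, the four special-purpose names for r12 through r15 can be written as a lookup table. This is only an illustrative C sketch: the function name is hypothetical, and it covers just these four aliases rather than the whole set in the standard.

```c
#include <string.h>

/* Hypothetical helper: map an AAPCS special-purpose register name to
   its core register number, or return -1 if the name is not listed. */
static int aapcs_alias_to_regnum(const char *name)
{
    static const struct { const char *alias; int regnum; } table[] = {
        { "ip", 12 },  /* intra-procedure-call scratch register */
        { "sp", 13 },  /* stack pointer */
        { "lr", 14 },  /* link register */
        { "pc", 15 },  /* program counter */
    };

    for (size_t i = 0; i < sizeof table / sizeof table[0]; ++i) {
        if (strcmp(name, table[i].alias) == 0)
            return table[i].regnum;
    }
    return -1;
}
```

So push {ip, lr} is the same instruction as push {r12, lr}. Since ip is a scratch register that a callee may overwrite, nothing depends on the value that gets saved and restored, which is why it works as the dummy register.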

answered Oct 24 '22 18:10

Mike Seymour
