Here is how I understand the story:
LDR r0, [pc, 0x5678]
is equivalent to this "C code":
r0 = *(pc + 0x5678)
It's pointer dereferencing with a base offset.
And my question:
I found this code
LDR PC, [PC,-4]
It's commented as being used for monkey patching, etc.
This is how I understand this code:
pc = *(pc - 4)
In this case the "pc" register dereferences the address of the previous instruction, so it will contain the machine code of that instruction (not the address of an instruction). The program will then jump to that invalid address to continue execution, and we will probably get a segmentation fault. So what am I missing or not understanding?
What makes me wonder is the brackets around the second operand of the LDR instruction.
As far as I know, on the x86 architecture brackets already dereference the pointer, but I can't understand their meaning on the ARM architecture.
mov r1, 0x5678
add r1, pc
mov r0, [r1]
Is this code equivalent to the following?
LDR r0, [pc, 0x5678]
Usage. The LDR pseudo-instruction is used for two main purposes: to generate literal constants when an immediate value cannot be moved into a register because it is out of range of the MOV and MVN instructions, and to load a program-relative or external address into a register.
The Program Counter (or PC) is a register inside the microprocessor that stores the memory address of the next instruction to be executed. In ARM processors, the Program Counter is a 32-bit register which is also known as R15. The processor first fetches the instruction from the address stored in the PC.
Load Register (register) calculates an address from a base register value and an offset register value, loads a word from memory, and writes it to a register. The offset register value can optionally be shifted. For information about memory accesses, see Memory accesses.
LDR (PC-relative) in 32-bit Thumb: you can use the .W width specifier to force LDR to generate a 32-bit instruction in Thumb code. LDR.W always generates a 32-bit instruction, even if the target could be reached using a 16-bit LDR.
Quoting from section 4.9.4 of the ARM Instruction Set document (ARM DDI 0029E):
When using R15 as the base register you must remember it contains an address 8 bytes on from the address of the current instruction.
So that instruction will load the word located 4 bytes after the current instruction, which hopefully contains a valid address.
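To make the arithmetic concrete, here is a minimal Python sketch of the address calculation described above. It only models the rule quoted from the manual (in ARM state, R15 reads as the instruction's address plus 8). The function name and the instruction address 0x1000 are made up for illustration.

```python
PC_READ_AHEAD = 8  # ARM state: reading PC as a base yields instruction address + 8


def ldr_pc_relative_address(instr_addr, offset):
    """Address accessed by `LDR Rd, [PC, #offset]` in ARM (not Thumb) state.

    Hypothetical helper: it models only the address arithmetic, not the load.
    """
    return (instr_addr + PC_READ_AHEAD + offset) & 0xFFFFFFFF


instr_addr = 0x1000  # assumed address of the LDR instruction itself

# LDR PC, [PC, #-4]: the word right after the instruction
print(hex(ldr_pc_relative_address(instr_addr, -4)))      # 0x1004

# LDR r0, [pc, #0x5678] from the question
print(hex(ldr_pc_relative_address(instr_addr, 0x5678)))  # 0x6680
```

So an offset of -4 does not point at the previous instruction; it points at the word that immediately follows the LDR.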
Thanks to a quirk of the ARM architecture, LDR PC, [PC,-4] is a branch to an address stored in memory in the word immediately following the instruction (assuming we're talking ARM, not Thumb here).
I originally wrote that it is a branch to the following instruction, which under normal circumstances has no effect (other than performance), and that putting it at the start of a function makes it really simple for the code to patch itself at runtime by rewriting the bottom 12 bits of the LDR instruction to change the offset, thus redirecting that function somewhere else. Herp derp, I got ADR and LDR confused there - that would be true if it were ADR, but this case is even more straightforward.
Now that I've unconfused myself it's just a simple function call trampoline. The function address will be stored as a data word immediately following the LDR instruction (presumably set to some initial value by the linker) and can simply be rewritten as data at runtime to redirect the branch, without needing to resort to self-modifying code.
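As a rough illustration of the trampoline idea, the following Python sketch models memory as a dictionary of words and shows that rewriting only the data word changes where the branch goes. All addresses and names here are hypothetical; 0xE51FF004 is the ARM-state encoding of LDR PC, [PC, #-4], included only to show that the instruction word itself stays untouched.

```python
PC_READ_AHEAD = 8  # ARM state: reading PC as a base yields instruction address + 8


def run_trampoline(memory, instr_addr):
    """Model `LDR PC, [PC, #-4]` located at instr_addr; return the new PC."""
    literal_addr = instr_addr + PC_READ_AHEAD - 4  # the word right after the LDR
    return memory[literal_addr]


TRAMPOLINE = 0x2000        # assumed address of the LDR instruction
ORIGINAL_FUNC = 0x8000     # assumed initial target (e.g. set by the linker)
REPLACEMENT_FUNC = 0x9000  # assumed target patched in at runtime

# Toy word-addressed memory: the instruction, then the target address as data.
memory = {
    TRAMPOLINE: 0xE51FF004,         # LDR PC, [PC, #-4]
    TRAMPOLINE + 4: ORIGINAL_FUNC,  # data word holding the branch target
}

before = run_trampoline(memory, TRAMPOLINE)

# "Patch" the function by rewriting the data word only; no code is modified.
memory[TRAMPOLINE + 4] = REPLACEMENT_FUNC
after = run_trampoline(memory, TRAMPOLINE)

print(hex(before), hex(after))  # 0x8000 0x9000
```

The instruction word at the trampoline address is identical before and after the patch, which is why this approach avoids self-modifying code.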