Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Strange behaviour of ldr [pc, #value]

Tags:

c++

assembly

arm

I was debugging some c++ code (WinCE 6 on ARM platform), and i find some behavior strange:

    4277220C    mov         r3, #0x93, 30
    42772210    str         r3, [sp]
    42772214    ldr         r3, [pc, #0x69C]
    42772218    ldr         r2, [pc, #0x694]
    4277221C    mov         r1, #0
    42772220    ldr         r0, [pc, #0x688]

Line 42772214 ldr r3, [pc, #0x69C] is used to get some constant from .DATA section, at least I think so.

What is strange that according to the code r2 should be filled with memory from address pc=0x42772214 + 0x69C = 0x427728B0, but according to the memory contents it's loaded from 0x427728B8 (8bytes+), it happens for other ldr usages too.

Is it fault of the debugger or my understanding of ldr/pc? Another issue I don't get - why access to the .data section is relative to the executed code? I find it little bit strange.

And one more issue: i cannot find syntax of the 1st mov command (any one could point me a optype specification for the Thumb (1C2))

Sorry for the laic description, but I'm just familiarizing with the assemblies.

like image 585
XAder Avatar asked Jan 20 '10 16:01

XAder


1 Answers

This is correct. When pc is used for reading there is an 8-byte offset in ARM mode and 4-byte offset in Thumb mode.

From the ARM-ARM:

When an instruction reads the PC, the value read depends on which instruction set it comes from:

  • For an ARM instruction, the value read is the address of the instruction plus 8 bytes. Bits [1:0] of this value are always zero, because ARM instructions are always word-aligned.
  • For a Thumb instruction, the value read is the address of the instruction plus 4 bytes. Bit [0] of this value is always zero, because Thumb instructions are always halfword-aligned.

This way of reading the PC is primarily used for quick, position-independent addressing of nearby instructions and data, including position-independent branching within a program.

There are 2 reasons for pc-relative addressing.

  1. Position-independent code, which is in your case.
  2. Get some complicated constants nearby which cannot be written in 1 simple instruction, e.g. mov r3, #0x12345678 is impossible to complete in 1 instruction, so the compiler may put this constant in the end of the function and use e.g. ldr r3, [pc, #0x50] to load it instead.

I don't know what mov r3, #0x93, 30 means. Probably it is mov r3, #0x93, rol 30 (which gives 0xC0000024)?

like image 118
kennytm Avatar answered Nov 03 '22 15:11

kennytm