
Need help to understand these ARM instructions

I was looking at the assembler output of my code and need help with the instructions below.

    0x00000fe8:    e28fc000    ....    ADR      r12,{pc}+8 ; 0xff0

    0x00000fec:    e28cca08    ....    ADD      r12,r12,#8, 20 ; #0x8000

From my understanding, the 1st instruction loads r12 with {pc value} + 8, that is:
"{address of the instruction currently executing (0xfe8) plus 2 instructions ahead (8)} + 8"

So is r12 loaded with 0xff8 (0xfe8+8+8) after the 1st instruction executes?

Also, regarding the 2nd instruction:
How is the value that gets added to r12 calculated? (The comment says it's 0x8000, but I can't see how it got this.)

MS. asked Dec 17 '22 22:12


2 Answers

The first instruction (really a pseudo-instruction) loads a PC-relative address into R12. Since the instruction is at address 0xFE8, the expression {pc}+8 evaluates to 0xFF0. So the result of the first instruction is to load the value 0xFF0 into R12. The comment actually indicates this.

(Note that ADR isn't a real ARM instruction; the assembler replaces it with an instruction such as ADD. Also note that this expression's value is calculated at assembly time. During program execution, the PC reads ahead of the current instruction because of the processor's pipeline. How far ahead depends on the operating mode: 8 bytes in ARM state, 4 bytes in Thumb state. I'm risking giving "too much information" here about ADR and PC-relative expressions/addressing, but it's easy to get bitten if you don't understand what's going on behind the scenes.)
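To make the run-time side concrete, here is a minimal Python sketch of what the core does with the first machine word, 0xe28fc000, which is an ADD r12, pc, #0. The function name and the constant are made up for illustration, and the sketch deliberately handles only the unrotated-immediate case that appears in the question.

```python
# Sketch only: evaluates "ADD Rd, pc, #imm8" for an ARM-state instruction.
# ARM_PC_READ_OFFSET and add_pc_imm_result are illustrative names, not from any tool.

ARM_PC_READ_OFFSET = 8  # in ARM state, reading pc yields the instruction's own address + 8


def add_pc_imm_result(instr_addr, word):
    """Return the value written to Rd by ADD Rd, pc, #imm8 located at instr_addr."""
    rn = (word >> 16) & 0xF      # base register field
    rotate = (word >> 8) & 0xF   # rotation field of the immediate
    imm8 = word & 0xFF           # 8-bit immediate value
    if rn != 15 or rotate != 0:
        raise ValueError("sketch only handles ADD Rd, pc, #imm8 with no rotation")
    return (instr_addr + ARM_PC_READ_OFFSET + imm8) & 0xFFFFFFFF


r12 = add_pc_imm_result(0x00000FE8, 0xE28FC000)
print(hex(r12))  # 0xff0
```

The result matches the 0xff0 in the disassembler comment: the +8 is applied once, not twice.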

The second instruction (actually reading from right to left) effectively says "take the constant 0x8, rotate it right by 20 bits (which is the same as a left shift by 12 bits, 32-20 = 12), ADD it to R12 (which currently holds 0xFF0), and store it in R12." 0x8 << 12 = 0x8000, so the 2nd instruction results in R12 holding 0x8000 + 0xFF0 = 0x8FF0.
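The same arithmetic can be checked with a short Python sketch. It pulls the low 12 bits out of the machine word 0xe28cca08 (a 4-bit rotate field and an 8-bit value), rotates the value right by twice the rotate field, and adds the result to the 0xFF0 left in R12 by the first instruction. The helper names are illustrative.

```python
# Sketch only: decode the immediate of an ARM-state data-processing instruction.

def ror32(value, amount):
    """Rotate a 32-bit value right by `amount` bits."""
    amount %= 32
    value &= 0xFFFFFFFF
    return ((value >> amount) | (value << (32 - amount))) & 0xFFFFFFFF


def decode_imm12(imm12):
    """Expand the 12-bit immediate field: imm8 rotated right by 2 * rotate."""
    rotate = (imm12 >> 8) & 0xF   # 0xA in the question's instruction
    imm8 = imm12 & 0xFF           # 0x08 in the question's instruction
    return ror32(imm8, 2 * rotate)


word = 0xE28CCA08                  # ADD r12, r12, #8, 20
imm = decode_imm12(word & 0xFFF)
r12 = (0xFF0 + imm) & 0xFFFFFFFF   # R12 held 0xFF0 after the first instruction
print(hex(imm), hex(r12))          # 0x8000 0x8ff0
```

The "#8, 20" in the disassembly is just these two fields printed separately: the value 8 and the rotation 20 (the rotate field 0xA, doubled).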

Note: In your explanation you said "2 instructions ahead". Don't fall into that habit; think of it as 8 bytes, not 2 instructions. The instruction says add 8 bytes; it doesn't say anything about instructions. Instructions aren't necessarily 4 bytes long (for example, in Thumb they are 2 bytes; in Thumb-2 they are 2 or 4 bytes, depending on the instruction).

Dan answered Jan 29 '23 03:01


I respectfully disagree with Dan: it IS two instructions ahead; that is how the pipeline works. An instruction is either 2 bytes in Thumb or 4 bytes in ARM, so two instructions ahead works out to either 4 or 8 bytes. It is not an arbitrary X bytes ahead, it is two instruction fetches ahead.
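Whichever way you prefer to think about it, the numbers come out the same. As a rough sketch (the table and function names here are made up for illustration), the value an instruction sees when it reads the pc is its own address plus 8 in ARM state and plus 4 in Thumb state:

```python
# Sketch only: the value obtained when an instruction reads the pc.
# PC_READ_OFFSET and pc_as_read are illustrative names.

PC_READ_OFFSET = {
    "arm": 8,    # two 4-byte fetches ahead
    "thumb": 4,  # two 2-byte fetches ahead
}


def pc_as_read(instr_addr, state):
    """Return the pc value seen by the instruction at instr_addr."""
    return (instr_addr + PC_READ_OFFSET[state]) & 0xFFFFFFFF


print(hex(pc_as_read(0xFE8, "arm")))    # 0xff0
print(hex(pc_as_read(0xFE8, "thumb")))  # 0xfec
```

The ARM-state case is the 0xff0 from the question; the Thumb line only shows what the same address would give in the other state.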

Most folks will just use labels and never have to know how this works. For exception handlers, IF you use Thumb mode you will have to deal with it, and not all versions of the ARM ARM are clear on this. Some versions simply say that the return register holds address+8 when they mean address + two instructions (which means 4 or 8 depending on the mode, which is indicated by the lsbit of the address). Over time the ARM ARM improves, but older ones have lots of bugs. Most folks won't ever need to know or worry about this two-instructions-ahead thing.

The main answer to your questions lies in the ARM ARM (the ARM Architecture Reference Manual), in the instruction encoding. In order to have fixed-length instructions, meaning all ARM mode instructions are 32 bits, immediate values have to be quite limited. So for many instructions like the add you can only have, say, 8 "significant bits" and a few bits for shifting (for an ARM mode add like yours, an 8-bit value and a 4-bit field that rotates it right by an even amount). So the number 0x1001 wouldn't work: in binary this value is 0b0001000000000001, and the span from the first to the last non-zero bit (the significant bits) requires 13 bits of storage. But the 0x8000 in your example has only 1 significant bit, so it can easily be stored and shifted in a number of ways in the instruction.

For instruction sets that have variable-length instructions, x86 for example, you can have complete immediates. You can load or add the value 0x12345678 because that 0x12345678 is not encoded in the main opcode itself; it follows the opcode in memory and can be of varying sizes to meet the needs of the instruction set. There are pros and cons to fixed and variable length that are beyond this discussion.

The point, though, is that the ARM ARM not only includes bit field definitions; each instruction also has pseudo code explaining how the different bit fields are used, including things like the pc being two fetches ahead of the currently executing instruction.
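If you want to check whether a constant fits, a small Python sketch like the one below will do it. It assumes the ARM mode data-processing form of the immediate (an 8-bit value rotated right by an even amount from 0 to 30); the function names are made up for illustration. It lists every valid (value, rotation) pair for a constant.

```python
# Sketch only: enumerate the ARM-mode immediate encodings of a 32-bit constant.

def rol32(value, amount):
    """Rotate a 32-bit value left by `amount` bits."""
    amount %= 32
    value &= 0xFFFFFFFF
    return ((value << amount) | (value >> (32 - amount))) & 0xFFFFFFFF


def arm_immediate_encodings(value):
    """Return all (imm8, rotate_right) pairs that produce `value`, or [] if none."""
    encodings = []
    for rot in range(0, 32, 2):          # only even rotations are encodable
        imm8 = rol32(value, rot)         # undo a rotate-right by `rot`
        if imm8 <= 0xFF:                 # everything must fit in 8 bits
            encodings.append((imm8, rot))
    return encodings


print(arm_immediate_encodings(0x8000))  # [(2, 18), (8, 20), (32, 22), (128, 24)]
print(arm_immediate_encodings(0x1001))  # []
```

The (8, 20) pair is the "#8, 20" the disassembler printed for the second instruction. 0x1001 has no encoding at all, so the assembler would have to build it some other way, for example with more than one instruction or a load from memory.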

PC-relative addressing is not something you normally deal with, but the limited immediates you will deal with all the time, so it is good to know which instructions have what immediate lengths. It is harder in Thumb mode than in ARM mode to remember which operations allow what sized immediates.

old_timer answered Jan 29 '23 02:01