 

Why is GNU as syntax different between x86 and ARM?

I've just started learning ARM assembly and I don't understand why the GNU as syntax is not the same as for x86*.

Since the directives are the same, I expected everything to work as on x86* except the instructions themselves. Instead, I'm struggling with things like loading the address of a string. I'm starting from scratch by reading some PDFs online and man 2 syscall, and by disassembling basic examples, because I'm not sure how reliable the various Hello World programs I can find online are.

My issues:

  • registers do not need a % sigil
  • Integer constants can have either a # or a $ sigil. In fact, if I assemble mov r0, $1, objdump -D gives me back mov r0, #1.

Everything assembles down to the same mov r0, #1:

        mov %r0, $1
   10080:       e3a00001        mov     r0, #1
        mov r0, $1
   10084:       e3a00001        mov     r0, #1
        mov %r0, #1
   10088:       e3a00001        mov     r0, #1
        mov r0, #1
   1008c:       e3a00001        mov     r0, #1
  • I can't use a label's address directly to load the address of a string, so I have to go through a variable. Neither mov r1, $hello nor ldr r1, $hello works. In x86_64, I would have written mov $hello, %rsi. So I do what gcc does and create a word holding the address of the other label.

  • I can't put my constants in .rodata; if I do, I get Error: internal_relocation (type: OFFSET_IMM) not fixed up, whereas putting everything in .text works (this part is not related to syntax).


.section .text
hello:
        .asciz "Hello World\n"
        .set hello_len, .-hello

hello_addr:
        .word hello

.align 4
.global _start
_start:
        mov r0, $1
        ldr r1, hello_addr
        mov r2, $hello_len
        mov r7, $4
        swi $0

        mov r0, $0
        mov r7, $1
        swi $0
Asked Apr 23 '17 17:04 by Benoît


1 Answer

The GNU Assembler (GAS) uses AT&T syntax for x86 assembly for compatibility with AT&T's x86 assemblers. Instead of basing its syntax on Intel's official x86 assembly syntax, AT&T created a new one modeled on its earlier 68000 and PDP-11 assemblers. When x86 support was added to the GNU compiler (GCC), it generated AT&T syntax assembly because that was the assembler being used. When GAS was created some time after this, it had to use that syntax too.

However, there was no version of the AT&T assembler for ARM CPUs. When the GNU project started porting GCC and GAS to ARM targets, there was no reason to invent a new and incompatible syntax for ARM assembly, so they based it on ARM's official syntax. This means you can look up ARM instructions in ARM's official documentation and use the syntax and operand order you see there with the GNU assembler. When writing x86 assembly in AT&T syntax, by contrast, you just have to know the rules and exceptions, which aren't officially documented anywhere.

The reason you can't load an address directly into a register in ARM assembly isn't an issue of syntax. ARM CPUs simply don't have an instruction that can do that. All ARM-mode instructions are the same size, 32 bits, leaving no room to encode a 32-bit address as an immediate operand. However, ARM assemblers do provide a pseudo-instruction form of LDR that handles loading 32-bit addresses and constants automatically: ldr r1, =hello. This causes the assembler to store the 32-bit constant in a literal table (a "literal pool" in GNU as terminology) and use a PC-relative LDR instruction to load it into the register. If the constant happens to be encodable directly as a MOV or MVN immediate, that instruction is generated instead.
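To see what the pseudo-instruction turns into, a small test file like the one below can be assembled and inspected with objdump -d. This is a sketch assuming ARM (not Thumb) state and Linux EABI syscall numbers; the comments describe what I'd expect GNU as to emit for each line, which is worth confirming against your own toolchain's disassembly.

```asm
@ Sketch: what the LDR pseudo-instruction expands to (ARM state, GNU as).
@ Comments give the expected expansion; check with objdump -d.
        .text
        .global _start
_start:
        ldr r1, =hello          @ a symbol address: expected ldr r1, [pc, #offset]
                                @ reading a 32-bit word from the literal pool
        ldr r0, =1              @ fits a MOV immediate: expected mov r0, #1
        ldr r2, =0xffffffff     @ ~0xffffffff == 0 fits MVN: expected mvn r2, #0
        ldr r3, =0x12345678     @ not encodable as an immediate: literal pool load

        mov r0, #0              @ exit status 0, so the test program ends cleanly
        mov r7, #1              @ __NR_exit on ARM EABI Linux
        svc #0                  @ svc is the newer name for swi

        .ltorg                  @ emit the literal pool here, within LDR's +/-4 KB reach

hello:
        .ascii "Hello World\n"
```

The .ltorg directive is optional in a file this small, since GNU as dumps any pending literal pool at the end of the section anyway; it matters in longer code where the end of the section would be out of LDR's range.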

The reason you can't put the constant in .rodata is either that it's too far away to address with a PC-relative LDR instruction (the target must be within +/-4 KB, because that is the biggest displacement that fits into a single 32-bit ARM load instruction) or that the object format you're using doesn't support PC-relative addressing to a different section. (Your ldr r1, hello_addr instruction uses PC-relative addressing, since there's no way to encode a 32-bit address in an ARM instruction.)
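A minimal sketch of the program from the question, rewritten with the string in .rodata, assuming ARM Linux EABI (write is syscall 4, exit is syscall 1) and a build with as and ld on an ARM machine. It should work because ldr r1, =hello only does a PC-relative load from the literal pool, which GNU as places in .text next to the code; the pool entry itself is an absolute 32-bit address filled in by an ordinary absolute relocation, so hello can live in any section. It also uses .ascii instead of .asciz, so that hello_len doesn't count (and write doesn't output) the trailing NUL byte.

```asm
@ Sketch: Hello World for ARM Linux (EABI), with the string kept in .rodata.
        .section .rodata
hello:
        .ascii "Hello World\n"          @ .ascii: no trailing NUL byte
        .set hello_len, . - hello       @ 12 bytes

        .text
        .global _start
_start:
        mov r0, #1                      @ fd 1 = stdout
        ldr r1, =hello                  @ address loaded from a literal pool in .text
        mov r2, #hello_len              @ number of bytes to write
        mov r7, #4                      @ __NR_write on ARM EABI Linux
        svc #0                          @ svc is the newer name for swi

        mov r0, #0                      @ exit status 0
        mov r7, #1                      @ __NR_exit on ARM EABI Linux
        svc #0
        @ GNU as emits the pending literal pool at the end of .text
```

Something like as hello.s -o hello.o && ld hello.o -o hello should then produce a binary that prints the string; the hello_addr word from the question becomes unnecessary, because the literal pool plays exactly that role.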

Answered Oct 05 '22 23:10 by Ross Ridge