
GNU Arm assembler changes ORR into MOVW

Tags: assembly, gnu, arm

I'm assembling the following piece of assembly code:

.syntax unified
.cpu cortex-m4
.thumb

.section  .text

orr r1, #12800
orr r1, #12801

Essentially, just two OR instructions. If I look at the results with objdump, I get:

bla.o:     file format elf32-littlearm


Disassembly of section .text:

00000000 <.text>:
   0:   f441 5148   orr.w   r1, r1, #12800  ; 0x3200
   4:   f243 2101   movw    r1, #12801  ; 0x3201

The second OR is silently changed into a MOVW! I ran the assembler as arm-none-eabi-gcc -g -Wall -c bla.s, and it didn't show any warnings.

The version of as is GNU assembler version 2.29.51 (arm-none-eabi) using BFD version (GNU Tools for Arm Embedded Processors 7-2017-q4-major) 2.29.51.20171128, running on OSX.

Any idea why the second OR is changed into a MOV?

Jeroen asked Feb 01 '18 at 11:02


2 Answers

.syntax unified
.cpu cortex-m4
.thumb

.section  .text

orr r1, #12800
orr r1, #12801

arm-none-eabi-as --version
GNU assembler (GNU Binutils) 2.29.1
Copyright (C) 2017 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License version 3 or later.
This program has absolutely no warranty.
This assembler was configured for a target of `arm-none-eabi'.

build

arm-none-eabi-as so.s -o so.o
arm-none-eabi-objdump -D so.o

so.o:     file format elf32-littlearm


Disassembly of section .text:

00000000 <.text>:
   0:   f441 5148   orr.w   r1, r1, #12800  ; 0x3200
   4:   f243 2101   movw    r1, #12801  ; 0x3201

Jester has the answer in a comment; you should upvote that.

Binutils 2.30 was just released a couple of days ago, and it produces the same result.

Working backward, the problem first appears between 2.27.1 and 2.28. The tc-arm.c changes for that release were related to the addition of ARMv8-M (Cortex-M23 and Cortex-M33).

Here is the bug in gas (tc-arm.c):

  /* MOV accepts both Thumb2 modified immediate (T2 encoding) and
     UINT16 (T3 encoding), MOVW only accepts UINT16.  When
     disassembling, MOV is preferred when there is no encoding
     overlap.
     NOTE: MOV is using ORR opcode under Thumb 2 mode.  */
  if (((newval >> T2_DATA_OP_SHIFT) & 0xf) == T2_OPCODE_ORR
      && ARM_CPU_HAS_FEATURE (cpu_variant, arm_ext_v6t2_v8m)
      && !((newval >> T2_SBIT_SHIFT) & 0x1)
      && value >= 0 && value <= 0xffff)
    {
      /* Toggle bit[25] to change encoding from T2 to T3.  */
      newval ^= 1 << 25;
      /* Clear bits[19:16].  */
      newval &= 0xfff0ffff;
      /* Encoding high 4bits imm.  Code below will encode the
         remaining low 12bits.  */
      newval |= (value & 0x0000f000) << 4;
      newimm = value & 0x00000fff;
    }
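A minimal Python model of that branch makes the effect concrete. It is a sketch, not gas code: the function names are hypothetical, the 32-bit instruction is assumed to be held as (first halfword << 16) | second halfword with the immediate fields still zero (which is what the bit numbers in the comments imply), the CPU feature check is assumed to pass for a Cortex-M4, and the final step that spreads the low 12 bits over i:imm3:imm8 is written from the ARMv7-M encoding diagrams rather than copied from gas. Judging by the objdump output, gas only reaches this branch when the value has no modified-immediate encoding, which would explain why #12800 stays an orr.w.

```python
# Sketch of the quoted gas branch; names here are hypothetical, not gas API.
T2_OPCODE_ORR = 0x2  # ORR opcode, bits [24:21] of a Thumb-2 data-processing insn


def orr_imm_template(rd: int, rn: int) -> int:
    """ORR Rd, Rn, #imm (encoding T1, S = 0) with all immediate fields zero."""
    return 0xF0400000 | (rn << 16) | (rd << 8)


def buggy_fixup(newval: int, value: int) -> int:
    """Apply the branch, then place the low 12 bits as i:imm3:imm8."""
    # Assumption: ARM_CPU_HAS_FEATURE(..., arm_ext_v6t2_v8m) is true (Cortex-M4).
    if not (((newval >> 21) & 0xF) == T2_OPCODE_ORR
            and not ((newval >> 20) & 1)      # S bit clear
            and 0 <= value <= 0xFFFF):
        raise ValueError("branch not taken in this model")
    # Nothing checks Rn == 0b1111 (the MOV alias), so a real ORR is rewritten too.
    newval ^= 1 << 25                         # T2 -> T3 (MOVW) layout
    newval &= 0xFFF0FFFF                      # clear Rn, bits [19:16]
    newval |= (value & 0xF000) << 4           # imm4 lands where Rn was
    newimm = value & 0x0FFF
    newval |= (newimm & 0x800) << 15          # i    -> bit 26
    newval |= (newimm & 0x700) << 4           # imm3 -> bits [14:12]
    newval |= newimm & 0x0FF                  # imm8 -> bits [7:0]
    return newval


def halfwords(insn: int) -> str:
    """Format as the two halfwords objdump prints."""
    return f"{insn >> 16:04x} {insn & 0xFFFF:04x}"


print(halfwords(buggy_fixup(orr_imm_template(rd=1, rn=1), 0x3201)))
```

Fed orr r1, r1, #0x3201, the model prints f243 2101, the same bytes as the movw in the listing above.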

The ARM documentation for these encodings is over 10 years old now, and nobody has indicated that it is buggy with respect to these instructions.

Yes, there is an otherwise unused ORR encoding (Rn = 0b1111) that is used as the MOV encoding; this is typical, not uncommon, in instruction set design. In no way, shape, or form does this mean a MOV is an ORR. Further, once the mistake of equating MOV with ORR was made, the code switched to the other MOV encoding (MOVW), which throws away the old register value. I am speechless.
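The distinction is visible in the bits. Below is a minimal Python sketch, with a hypothetical classify() function, that tells the three encodings apart using the fixed bits from the ARMv7-M encoding diagrams. ORR (immediate, T1) and MOV (immediate, T2) share one layout and differ only in the Rn field, where Rn = 0b1111 means MOV; MOVW (T3) is a separate layout. Only the ORR form reads a source register.

```python
# Sketch: classify 32-bit Thumb-2 encodings by their fixed bits.
# Bit patterns follow the ARMv7-M encoding diagrams; classify() is hypothetical.

def classify(hw1: int, hw2: int) -> str:
    insn = (hw1 << 16) | hw2
    rn = (insn >> 16) & 0xF
    # Modified-immediate data processing: 11110 i 0 op(4) S Rn | 0 imm3 Rd imm8
    if (insn & 0xFA008000) == 0xF0000000 and ((insn >> 21) & 0xF) == 0b0010:
        if rn == 0xF:
            return "mov.w: Rd = imm"            # MOV (immediate), T2
        return f"orr.w: Rd = r{rn} | imm"       # ORR (immediate), T1
    # MOVW: 11110 i 10 0100 imm4 | 0 imm3 Rd imm8
    if (insn & 0xFBF08000) == 0xF2400000:
        return "movw: Rd = imm16"               # MOV (immediate), T3
    return "other"


print(classify(0xF441, 0x5148))  # orr.w r1, r1, #12800 from the listing
print(classify(0xF44F, 0x5148))  # the same bits with Rn = 0b1111
print(classify(0xF243, 0x2101))  # movw r1, #12801 from the listing
```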

Even worse, this has been present in released versions of gas for almost a year. How is that possible?

Part of how it is possible is that GCC knows better: it encodes this as two separate instructions.

orr r1,#0x3200
orr r1,#0x0001

So, apart from the obvious lack of peer review in the GNU world, the only way for this to be found was for a human to write such an instruction by hand. The ARM immediate encoding rules are easier to remember than the Thumb rules. Folks always struggle with immediates; it is the nature of the beast for RISC instruction sets. Someone should have hit this by now, and now someone has.
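To see why 12800 fits and 12801 does not, here is a minimal Python sketch of the Thumb-2 modified-immediate rule (ThumbExpandImm in the ARMv7-M Architecture Reference Manual). The function names are hypothetical, and the encoder is a brute-force search over all 4096 possible 12-bit fields, chosen for obviousness rather than speed. 0x3200 is 0xC8 rotated, so it fits; 0x3201 spans 14 significant bits, so it cannot. Splitting it into two ORRs works because (r1 | 0x3200) | 0x0001 equals r1 | 0x3201.

```python
# Sketch: which 32-bit constants fit a Thumb-2 "modified immediate"?
# Decoding follows ThumbExpandImm (ARMv7-M ARM); function names are hypothetical.
from typing import Optional


def thumb_expand_imm(imm12: int) -> Optional[int]:
    """Expand a 12-bit i:imm3:imm8 field; None for UNPREDICTABLE forms."""
    imm8 = imm12 & 0xFF
    if imm12 >> 10 == 0:
        mode = (imm12 >> 8) & 0x3
        if mode == 0:
            return imm8                          # 0x000000XY
        if imm8 == 0:
            return None                          # UNPREDICTABLE
        if mode == 1:
            return (imm8 << 16) | imm8           # 0x00XY00XY
        if mode == 2:
            return (imm8 << 24) | (imm8 << 8)    # 0xXY00XY00
        return imm8 * 0x01010101                 # 0xXYXYXYXY
    unrotated = 0x80 | (imm12 & 0x7F)            # 8-bit value with top bit set
    rot = imm12 >> 7                             # always 8..31 here
    return ((unrotated >> rot) | (unrotated << (32 - rot))) & 0xFFFFFFFF


def encode_modified_imm(value: int) -> Optional[int]:
    """Return an imm12 that expands to value, or None if there is none."""
    for imm12 in range(0x1000):
        if thumb_expand_imm(imm12) == value:
            return imm12
    return None


for v in (0x3200, 0x3201, 0x0001):
    enc = encode_modified_imm(v)
    print(hex(v), "->", "not encodable" if enc is None else hex(enc))
```

It reports 0x3200 -> 0xd48 (the i:imm3:imm8 bits inside f441 5148), 0x3201 as not encodable, and 0x1 -> 0x1, matching the two-instruction sequence GCC emits.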

Trying it on hardware (a Cortex-M7):

test.s

.cpu cortex-m7
.syntax unified
.thumb

.thumb_func
.globl test1
test1:
    orr r0,#0x3200
    bx lr

.thumb_func
.globl test2
test2:
    orr r0,#0x3201
    bx lr

Run it and print out the results:

hexstring(test1(0x0000));
hexstring(test2(0x0000));
hexstring(test1(0x00FE));
hexstring(test2(0x00FE));

gas

arm-none-eabi-as --version
GNU assembler (GNU Binutils) 2.30

result

0800005c <test1>:
 800005c:   f440 5048   orr.w   r0, r0, #12800  ; 0x3200
 8000060:   4770        bx  lr

08000062 <test2>:
 8000062:   f243 2001   movw    r0, #12801  ; 0x3201
 8000066:   4770        bx  lr

output

00003200 
00003201 
000032FE 
00003201

A MOV is a MOV not an ORR.

You have found a very nasty bug in the GNU assembler; I recommend that you file it. Obvious as this bug is, I am very curious to see what happens. I have filed other bugs in the past, and they made excuses rather than fixes and left the bugs in place. Please post the link to the ticket as a comment if you choose to file it, so we can all see what they do about it.

The relevant commit is bada43421274615d0d5f629a61a60b7daa71bc15, and the code is at tc-arm.c:23596.

old_timer answered Sep 23 '22 at 20:09


The gas team has confirmed that this is a bug, and has checked in a patch. The Bugzilla entry can be found at https://sourceware.org/bugzilla/show_bug.cgi?id=22773

Jeroen answered Sep 19 '22 at 20:09