GCC generates different code depending on array index value

Question

This code (arm):

void blinkRed(void)
{
    for(;;)
    {
        bb[0x0008646B] ^= 1;
        sys.Delay_ms(14);
    }
}

...is compiled to folowing asm-code:

08000470:   ldr r4, [pc, #20]       ; (0x8000488 <blinkRed()+24>) // r4 = 0x422191ac
08000472:   ldr r6, [pc, #24]       ; (0x800048c <blinkRed()+28>)
08000474:   movs r5, #14
08000476:   ldr r3, [r4, #0]
08000478:   eor.w r3, r3, #1
0800047c:   str r3, [r4, #0]
0800047e:   mov r0, r6
08000480:   mov r1, r5
08000482:   bl 0x80001ac <CSTM32F100C6::Delay_ms(unsigned int)>
08000486:   b.n 0x8000476 <blinkRed()+6>

It is ok.

But, if I just change array index (-0x400)....

void blinkRed(void)
{
    for(;;)
    {
        bb[0x0008606B] ^= 1;
        sys.Delay_ms(14);
    }
}

...I've got not so optimized code:

08000470:   ldr r4, [pc, #24]       ; (0x800048c <blinkRed()+28>) // r4 = 0x42218000
08000472:   ldr r6, [pc, #28]       ; (0x8000490 <blinkRed()+32>)
08000474:   movs r5, #14
08000476:   ldr.w r3, [r4, #428]    ; 0x1ac
0800047a:   eor.w r3, r3, #1
0800047e:   str.w r3, [r4, #428]    ; 0x1ac
08000482:   mov r0, r6
08000484:   mov r1, r5
08000486:   bl 0x80001ac <CSTM32F100C6::Delay_ms(unsigned int)>
0800048a:   b.n 0x8000476 <blinkRed()+6>

The difference is that in the first case r4 is loaded with target address immediately (0x422191ac) and then access to memory is performed with 2-byte instructions, but in the second case r4 is loaded with some intermediate address (0x42218000) and then access to memory is performed with 4-bytes instruction with offset (+0x1ac) to target address (0x422181ac).

Why compiler does so?

I use: arm-none-eabi-g++ -mcpu=cortex-m3 -mthumb -g2 -Wall -O1 -std=gnu++14 -fno-exceptions -fno-use-cxa-atexit -fstrict-volatile-bitfields -c -DSTM32F100C6T6B -DSTM32F10X_LD_VL

bb is:

__attribute__ ((section(".bitband"))) volatile u32 bb[0x00800000];

In .ld it is defined as: in MEMORY section:

BITBAND(rwx): ORIGIN = 0x42000000, LENGTH = 0x02000000

in SECTIONS section:

.bitband (NOLOAD) :
SUBALIGN(0x02000000)
{
    KEEP(*(.bitband))
} > BITBAND

FPK · Accepted Answer

I would consider it an artefact/missing optimization opportunity of -O1.

It can be understood in more detail if we look at the code generated with -O- to load bb[...]:

First case:

movw    r2, #:lower16:bb
movt    r2, #:upper16:bb
movw    r3, #37292
movt    r3, 33
adds    r3, r2, r3
ldr r3, [r3, #0]

Second case:

movw    r3, #:lower16:bb
movt    r3, #:upper16:bb
add r3, r3, #2195456       ; 0x218000    = 4*0x86000
add r3, r3, #428
ldr r3, [r3, #0]

The code in the second case is better and it can be done this way because the constant can be added with two add instructions (which is not the case if the index is 0x0008646B).

-O1 does only optimizations which are not time consuming. So apparently it merges early the add and the ldr so it misses later the opportunity to load the whole address with one pc relative ldr.

Compile with -O2 (or -fgcse) and the code looks like expected.

GCC generates different code depending on array index value

Tags:

c++

optimization

gcc

assembly

arm

Woodoo

1 Answers

FPK

Recent Activity

Donate For Us

GCC generates different code depending on array index value

Tags:

c++

optimization

gcc

assembly

arm

Woodoo

1 Answers

FPK

Related questions

Recent Activity

Donate For Us