Why does the compiler generate such code when initializing a volatile array?

Question

I have the following program that enables the alignment check (AC) bit in the x86 processor flags register in order to trap unaligned memory accesses. Then the program declares two volatile variables:

#include <assert.h>

int main(void)
{
    #ifndef NOASM
    __asm__(
        "pushf\n"
        "orl $(1<<18),(%esp)\n"
        "popf\n"
    );
    #endif

    volatile unsigned char foo[] = { 1, 2, 3, 4, 5, 6 };
    volatile unsigned int bar = 0xaa;
    return 0;
}

If I compile this, the code generated initially does the obvious things like setting up the stack and creating the array of chars by moving the values 1, 2, 3, 4, 5, 6 onto the stack:

/tmp ➤ gcc test3.c -m32
/tmp ➤ gdb ./a.out
(gdb) disassemble main
   0x0804843d <+0>: push   %ebp
   0x0804843e <+1>: mov    %esp,%ebp
   0x08048440 <+3>: and    $0xfffffff0,%esp
   0x08048443 <+6>: sub    $0x20,%esp
   0x08048446 <+9>: mov    %gs:0x14,%eax
   0x0804844c <+15>:    mov    %eax,0x1c(%esp)
   0x08048450 <+19>:    xor    %eax,%eax
   0x08048452 <+21>:    pushf
   0x08048453 <+22>:    orl    $0x40000,(%esp)
   0x0804845a <+29>:    popf
   0x0804845b <+30>:    movb   $0x1,0x16(%esp)
   0x08048460 <+35>:    movb   $0x2,0x17(%esp)
   0x08048465 <+40>:    movb   $0x3,0x18(%esp)
   0x0804846a <+45>:    movb   $0x4,0x19(%esp)
   0x0804846f <+50>:    movb   $0x5,0x1a(%esp)
   0x08048474 <+55>:    movb   $0x6,0x1b(%esp)
   0x08048479 <+60>:    mov    0x16(%esp),%eax
   0x0804847d <+64>:    mov    %eax,0x10(%esp)
   0x08048481 <+68>:    movzwl 0x1a(%esp),%eax
   0x08048486 <+73>:    mov    %ax,0x14(%esp)
   0x0804848b <+78>:    movl   $0xaa,0xc(%esp)
   0x08048493 <+86>:    mov    $0x0,%eax
   0x08048498 <+91>:    mov    0x1c(%esp),%edx
   0x0804849c <+95>:    xor    %gs:0x14,%edx
   0x080484a3 <+102>:   je     0x80484aa <main+109>
   0x080484a5 <+104>:   call   0x8048310 <__stack_chk_fail@plt>
   0x080484aa <+109>:   leave
   0x080484ab <+110>:   ret

However, at main+60 it does something strange: it copies the 6-byte array to another part of the stack, moving the data through a register, first as a 4-byte word and then as a 2-byte halfword. But the source bytes start at offset 0x16, which is not 4-byte aligned, so with the AC flag set the program crashes when it attempts the mov at main+60.
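To make the alignment argument concrete, here is a minimal sketch that checks the stack offsets from the listing. The helper name is_naturally_aligned is hypothetical, and the sketch assumes %esp is 16-byte aligned at this point (the and $0xfffffff0,%esp followed by sub $0x20,%esp in the prologue suggests it is), so an offset's alignment equals the address's alignment.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true when an access of `size` bytes at `offset`
 * is naturally aligned. `size` must be a power of two. */
static bool is_naturally_aligned(size_t offset, size_t size)
{
    return (offset & (size - 1)) == 0;
}
```

Under that assumption, is_naturally_aligned(0x16, 4) is false for the 4-byte load at main+60, while the 4-byte store to 0x10 and the 2-byte accesses at 0x1a and 0x14 are all naturally aligned. So the load at main+60 is the only access in the copy sequence that should trip the alignment check.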

So I've two questions:

1. Why is the compiler emitting code to copy the array to another part of the stack? I assumed volatile would skip every optimization and always perform memory accesses. Maybe volatile variables are required to always be accessed as whole words, and so the compiler always uses temporary registers to read/write whole words?
2. Why does the compiler not put the char array at an aligned address if it later intends to perform these mov instructions? I understand that x86 is normally safe with unaligned accesses, and on modern processors they do not even carry a performance penalty; however, in all other instances I see the compiler trying to avoid generating unaligned accesses, since they are considered, AFAIK, undefined behavior in C. My guess is that, since it later provides a properly aligned pointer for the copied array on the stack, it just does not care about the alignment of data used only for initialization, in a way that is invisible to the C program?

If my hypotheses above are right, it means that I cannot expect an x86 compiler to always generate aligned accesses, even if the compiled code never attempts to perform unaligned accesses itself, and so setting the AC flag is not a practical way to detect parts of the code where unaligned accesses are performed.

EDIT: After further research I can answer most of this myself. In an attempt to make progress, I added an option in Redis to set the AC flag and otherwise run normally. I found that this approach is not viable: the process immediately crashes inside libc: __mempcpy_sse2 () at ../sysdeps/x86_64/memcpy.S:83. I assume that the whole x86 software stack simply does not really care about misalignment since it is handled very well by this architecture. Thus it is not practical to run with the AC flag set.

So the answer to question 2 above is that, like the rest of the software stack, the compiler is free to do as it pleases, and relocate things on the stack without caring about alignment, so long as the behavior is correct from the perspective of the C program.
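One way to see why the compiler is within its rights here, as a small sketch requiring C11: the language only demands element alignment for an array, and for unsigned char that is 1. The function name foo_required_alignment is hypothetical and exists only for illustration.

```c
#include <stddef.h>

/* Alignment that C requires for an array shaped like `foo` in the question.
 * For an array type, _Alignof yields the alignment of the element type. */
static size_t foo_required_alignment(void)
{
    return _Alignof(unsigned char[6]);
}
```

This returns 1, so placing the bytes at offset 0x16 satisfies every guarantee the C program can observe. Any wider access the compiler chooses to emit is its own business, as far as the language is concerned.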

The only question left is why, with volatile, a copy is made in a different part of the stack. My best guess is that the compiler is attempting to access whole words in variables declared volatile even during initialization (imagine if this address were mapped to an I/O port), but I'm not sure.

Chris Dodd · Accepted Answer

You're compiling without optimization, so the compiler is generating straightforward code without worrying about how inefficient it is. So it first creates the initializer { 1, 2, 3, 4, 5, 6 } in temp space on the stack, and it then copies that into the space it allocated for foo.
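As a rough C-level picture of that copy step, the sketch below moves six bytes as one 4-byte chunk plus one 2-byte chunk, which is what the mov / movzwl pair at main+60 through main+73 appears to do. This is an illustration, not what GCC does internally: the function name is hypothetical, the destination is non-volatile to keep the example simple, and memcpy is used so the C code itself stays alignment-safe.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy 6 bytes through two temporaries, mirroring
 * the 4-byte + 2-byte register moves in the disassembly. */
static void copy_six_bytes_in_two_chunks(unsigned char dst[6],
                                         const unsigned char src[6])
{
    uint32_t head;
    uint16_t tail;

    memcpy(&head, src, sizeof head);      /* like: mov    0x16(%esp),%eax */
    memcpy(dst, &head, sizeof head);      /* like: mov    %eax,0x10(%esp) */
    memcpy(&tail, src + 4, sizeof tail);  /* like: movzwl 0x1a(%esp),%eax */
    memcpy(dst + 4, &tail, sizeof tail);  /* like: mov    %ax,0x14(%esp)  */
}
```

Calling it with a source of { 1, 2, 3, 4, 5, 6 } leaves the same six bytes in the destination; only the access widths differ from a byte-by-byte copy.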

John Wu · Answer

The compiler populates the array in a working storage area, one byte at a time, which is not atomic. It then moves the entire array to its final resting place using an atomic MOVZ instruction (the atomicity is implicit when the target address is naturally aligned).

The write has to be atomic because the compiler must assume (due to the volatile keyword) that the array can be accessed at any time by anyone else.

Tags: c, assembly, compilation

antirez
