Consider the following program:
#include <stdio.h>
void some_func(char*, int*, char*);
void stack_alignment(void) {
    char a = '-';
    int i = 1337;
    char b = '+';
    some_func(&a, &i, &b); // to prevent the compiler from removing the local variables
    printf("%c|%i|%c", a, i, b);
}
It generates the following assembly (comments added by myself, I'm a complete newbie to assembly):
$ vim stack-alignment.c
$ gcc -c -S -O3 stack-alignment.c
$ cat stack-alignment.s
        .file   "stack-alignment.c"
        .section .rdata,"dr"
LC0:
        .ascii "%c|%i|%c\0"
        .text
        .p2align 2,,3
        .globl  _stack_alignment
        .def    _stack_alignment;       .scl    2;      .type   32;     .endef
_stack_alignment:
LFB7:
        .cfi_startproc
        subl    $44, %esp
        .cfi_def_cfa_offset 48
        movb    $45, 26(%esp)    // local variable 'a'
        movl    $1337, 28(%esp)  // local variable 'i'
        movb    $43, 27(%esp)    // local variable 'b'
        leal    27(%esp), %eax
        movl    %eax, 8(%esp)
        leal    28(%esp), %eax
        movl    %eax, 4(%esp)
        leal    26(%esp), %eax
        movl    %eax, (%esp)
        call    _some_func
        movsbl  27(%esp), %eax
        movl    %eax, 12(%esp)
        movl    28(%esp), %eax
        movl    %eax, 8(%esp)
        movsbl  26(%esp), %eax
        movl    %eax, 4(%esp)
        movl    $LC0, (%esp)
        call    _printf
        addl    $44, %esp
        .cfi_def_cfa_offset 4
        ret
        .cfi_endproc
LFE7:
        .def    _some_func;     .scl    2;      .type   32;     .endef
        .def    _printf;        .scl    2;      .type   32;     .endef
As you can see, there are 3 local variables (a, i and b) with sizes of 1 byte, 4 bytes and 1 byte. Laid out in declaration order with padding, they would take 12 bytes (assuming the compiler aligns to 4 bytes).
Wouldn't it be more memory-efficient if the compiler changed the order of the variables to (a, b, i)? Then only 8 bytes would be necessary.
Here is a "graphic" representation:
    3 bytes unused                  3 bytes unused
     vvvvvvvvvvv                     vvvvvvvvvvv
+---+---+---+---+---+---+---+---+---+---+---+---+
| a |   |   |   | i             | b |   |   |   |
+---+---+---+---+---+---+---+---+---+---+---+---+
                |
                v
+---+---+---+---+---+---+---+---+
| a | b |   |   | i             |
+---+---+---+---+---+---+---+---+
         ^^^^^^^
      2 bytes unused
Is the compiler allowed to do this optimization (by the C standard etc.)?
Local variables have automatic storage, and compilers typically keep them in the function's stack frame at the top of the stack. They are addressed relative to a frame pointer or, as in the listing above, relative to the stack pointer (%esp).
When a function is entered, stack memory is reserved for its local variables. On x86 this makes the stack grow downwards (the subl $44, %esp above). After the function returns, its stack memory is released, which means all of its local variables become invalid.
Because this allocation and release happen automatically on call and return, the stack suits variables that are not used outside the function. No explicit cleanup is needed.
The usual C calling convention on 32-bit x86 (cdecl) passes the arguments on the stack in reverse order: the last argument is pushed first and the first argument is pushed last. In the called function, the first argument is therefore the one closest to the top of the stack. The listing above produces the same layout by storing to (%esp), 4(%esp) and 8(%esp) instead of pushing.
The compiler is free to lay out the local variables however it wants. It need not even use the stack.
If it does use the stack, it can store the local variables in an order unrelated to their order of declaration.
Is the compiler allowed to do this optimization (by the C standard etc.)?
- If yes, why doesn't that happen above?
 
Well, is it an optimisation at all?
That's not clear. It uses a couple of bytes less, but that rarely matters. On some architectures, though, it may be faster to read a char that is stored word-aligned. Putting the chars next to each other would then force at least one of them to be non-word-aligned and make reading it slower.
Is the compiler allowed to do this optimization (by the C standard etc.)?
Yes.
If yes, why doesn't that happen above?
It did happen.
Read the assembler output carefully.
    movb    $45, 26(%esp)    // local variable 'a'
    movl    $1337, 28(%esp)  // local variable 'i'
    movb    $43, 27(%esp)    // local variable 'b'
Variable a is at offset 26.
Variable b is at offset 27.
Variable i is at offset 28.
Using the diagram you made, the layout is now:
+---+---+---+---+---+---+---+---+
|   |   | a | b | i             |
+---+---+---+---+---+---+---+---+
 ^^^^^^^
 2 bytes unused