
Inline and stack frame control

Tags: c, gcc, embedded

The following are artificial examples. Clearly, compiler optimizations will dramatically change the final outcome. However, and I cannot stress this enough: by temporarily disabling optimizations, I intend to get an upper bound on stack usage. I expect that enabling optimizations will likely improve on it.

The discussion is centered on GCC only. I would like to have fine control over when automatic variables get released from the stack. Scoping with blocks does not ensure that memory will be released when automatic variables go out of scope. Functions, as far as I know, do ensure that.

However, when inlining, what is the case? For example:

#include <stdint.h>

static inline __attribute__((always_inline)) void foo(void)
{
    uint8_t buffer1[100];
    // Stack Size Measurement A
    // Do something
}

void bar(void)
{
    foo();
    uint8_t buffer2[100];
    // Stack Size Measurement B
    // Do something else
}

Can I always expect that at measurement point B, the stack will only contain buffer2 and buffer1 has been released?

Apart from function calls (which result in additional stack usage) is there any way I can have fine control over stack deallocations?

Juan Leni asked May 22 '18 07:05


2 Answers

I would like to have fine control over how automatic variables get released from the stack.

Lots of confusion here. The optimizing compiler could store some automatic variables only in registers, without using any slot in the call frame. The C language specification (n1570) does not require any call stack.

And a given register, or slot in the call frame, can be reused for different purposes (e.g. different automatic variables in different parts of the function). Register allocation is a significant part of a compiler's job.
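To make the slot-reuse point concrete, here is a minimal sketch (the function name two_scopes is made up for illustration). The two buffers have disjoint lifetimes, so the compiler is allowed, but not required, to place them in the same 100 bytes of the frame. It may also remove both buffers entirely when optimizing.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical example: two block-scoped buffers with disjoint lifetimes. */
unsigned two_scopes(void)
{
    unsigned total = 0;
    {
        uint8_t buffer1[100];
        memset(buffer1, 1, sizeof buffer1);
        total += buffer1[99];   /* last use of buffer1 */
    }
    {
        /* May or may not occupy the same frame slot as buffer1. */
        uint8_t buffer2[100];
        memset(buffer2, 2, sizeof buffer2);
        total += buffer2[99];
    }
    return total;
}
```

Whether the frame ends up near 100 bytes, near 200 bytes, or smaller depends on the GCC version and options, so it has to be checked on the actual build rather than assumed.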

Can I always expect that at measurement point B, the stack will only contain buffer2 and buffer1 has been released?

Certainly not. The compiler could prove that, past some point in your code, the space for buffer1 is no longer needed and reuse that space for other purposes, but nothing requires it to do so.

is there any way I can have fine control over stack deallocations?

No, there is not. The call stack is an implementation detail; the compiler and the generated code might not use it at all, or might use it in ways you would consider "abuse".

As a silly example, if buffer1 is not used in foo, the compiler might not allocate space for it at all. And a clever compiler might allocate only 8 bytes for it, if it can prove that only the first 8 bytes of buffer1 are ever used.

More seriously, in some cases, GCC is able to do tail-call optimizations.

You may be interested in invoking GCC with -fstack-reuse=all, -Os, -Wstack-usage=256, -fstack-usage, and other options.

Of course, the concrete stack usage depends on the optimization level. You might also inspect the generated assembler code, e.g. with -S -O2 -fverbose-asm.
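As a rough sketch of how the stack-usage options could be tried out, consider a small translation unit like the one below. The file name su_demo.c and the function sum_fill are invented for this illustration; only the GCC options come from the list above.

```c
/* su_demo.c (hypothetical file name, used only for this sketch)
 *
 * Possible invocations:
 *   gcc -O0 -fstack-usage -c su_demo.c
 *   gcc -Os -fstack-usage -Wstack-usage=256 -c su_demo.c
 */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fills a 512-byte automatic buffer and sums it. */
unsigned sum_fill(uint8_t seed)
{
    uint8_t buffer[512];
    unsigned total = 0;

    memset(buffer, seed, sizeof buffer);
    for (size_t i = 0; i < sizeof buffer; i++)
        total += buffer[i];
    return total;
}
```

With -fstack-usage, GCC should write a .su file next to the object file, with one line per function giving its name, a size in bytes, and a qualifier (static, dynamic, or bounded). With -Wstack-usage=256, a warning for sum_fill is likely at -O0, while at higher optimization levels it may or may not appear, depending on what the compiler does with the buffer.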

For example, the following code e.c:

int f(int x, int y) {
    int t[100];
    t[0] = x;
    t[1] = y;
    return t[0]+t[1];
}

when compiled with GCC 8.1 on Linux/Debian/x86-64 using gcc -S -fverbose-asm -O2 e.c, gives in e.s:

        .text
        .p2align 4,,15
        .globl  f
        .type   f, @function
f:
.LFB0:
        .cfi_startproc
# e.c:5:      return t[0]+t[1];
        leal    (%rdi,%rsi), %eax       #, tmp90
# e.c:6: }
        ret     
        .cfi_endproc
.LFE0:
        .size   f, .-f

and you can see that the stack frame is not grown by 100*4 bytes. This is still the case with:

int f(int x, int y, int n) {
    int t[n];
    t[0] = x;
    t[1] = y;
    return t[0]+t[1];
}

which actually generates the same machine code as above. And if, instead of the + above, I call some inline int add(int u, int v) { return u+v; }, the generated code does not change.

Be aware of the as-if rule, and of the tricky notion of undefined behavior (if n were 1 above, accessing t[1] would be UB).
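One consequence of the as-if rule, sketched here under the assumption that you only want the array to show up in the generated code for measurement purposes: accesses to volatile objects count as observable behavior, so the compiler has to perform the stores and loads below and cannot fold the function into a single addition. The name f_volatile is made up for this example, and this is a measurement aid, not a way to control deallocation.

```c
/* Hypothetical variant of f from e.c: the array is volatile, so every
 * store and load below must actually be performed on memory. */
int f_volatile(int x, int y)
{
    volatile int t[100];

    t[0] = x;
    t[1] = y;
    return t[0] + t[1];
}
```

Comparing the assembler output of f and f_volatile with -S -O2 -fverbose-asm should show the difference. How much frame space is actually reserved still depends on the compiler and target.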

Basile Starynkevitch answered Oct 15 '22 14:10


Can I always expect that at measurement point B, the stack will only contain buffer2 and buffer1 has been released?

No. It depends on the GCC version, the target, the optimization level, and other options.
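Since the result depends on the toolchain, one option is to measure it on the actual build. The following is only a sketch under several assumptions: the helper names (stack_depth, run_measurement, depth_a, depth_b) are invented here, the buffers are made volatile so that they are not optimized away, and the depth is estimated from the addresses of local variables, which is a common trick but not something the C standard defines.

```c
#include <stddef.h>
#include <stdint.h>

static uintptr_t stack_base;   /* reference address set by run_measurement */
size_t depth_a, depth_b;       /* results of measurements A and B */

/* Hypothetical probe: distance between a local of this function and the
 * reference address. noinline gives the probe its own frame. */
__attribute__((noinline)) static size_t stack_depth(void)
{
    volatile char marker = 0;
    uintptr_t here = (uintptr_t)&marker;

    return here < stack_base ? stack_base - here : here - stack_base;
}

static inline __attribute__((always_inline)) void foo(void)
{
    volatile uint8_t buffer1[100];  /* volatile: assumption for this sketch */

    buffer1[0] = 1;
    depth_a = stack_depth();        /* Stack Size Measurement A */
    buffer1[99] = 2;                /* keeps buffer1 in use across the probe */
}

void bar(void)
{
    foo();
    volatile uint8_t buffer2[100];

    buffer2[0] = 1;
    depth_b = stack_depth();        /* Stack Size Measurement B */
    buffer2[99] = 2;
}

/* Records a reference address in this frame, runs bar, returns depth_b. */
size_t run_measurement(void)
{
    volatile char marker = 0;

    stack_base = (uintptr_t)&marker;
    bar();
    return depth_b;
}
```

After calling run_measurement(), comparing depth_a and depth_b gives a hint: if depth_b exceeds depth_a by roughly 100 bytes, buffer1 was probably still occupying its own slot at point B; if the two are about equal, the slot was probably reused. The numbers are only meaningful for the exact compiler, target, and options used.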

Apart from function calls (which result in additional stack usage) is there any way I can have fine control over stack deallocations?

Your requirement is so specific that I guess you will have to write the code yourself in assembler.

Yann Droneaud answered Oct 15 '22 14:10