
c & gcc : Stack growth and alignment - for a 64 bit machine

Tags: c, linux, gcc

I have the following program. Why does it output -4 on the following 64-bit machine? Which of my assumptions is wrong?

[Linux ubuntu 3.2.0-23-generic #36-Ubuntu SMP Tue Apr 10 20:39:51 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux]

  1. On the above machine with gcc, I assumed that by default b is pushed first and a second. The stack grows downwards, so b should have the higher address and a the lower one, and the result should be positive. But I got -4. Can anybody explain this?

  2. The arguments are two chars, occupying 2 bytes in the stack frame. But the difference I see is 4, whereas I expected 1. If the answer is alignment, then why is a structure with 2 chars not aligned to 4 bytes?

#include <stdint.h>   /* intptr_t */
#include <stdio.h>

void CompareAddress(char a, char b)
{
    /* Cast the result to long so it matches the %ld conversion. */
    printf("Differs=%ld\n", (long)((intptr_t)&b - (intptr_t)&a));
}

int main(void)
{
    CompareAddress('a', 'b');
    return 0;
}

/* Output: Differs=-4 */
Lunar Mushrooms asked Jun 16 '12 02:06


2 Answers

Here's my guess:

On Linux in x64, the calling convention states that the first few parameters are passed by register.

So in your case, both a and b are passed in registers rather than on the stack. However, since you take their addresses, the compiler has to store them somewhere on the stack once the function has been entered, and not necessarily in downward order.

It's also possible that the function is just outright inlined.

In either case, the compiler makes temporary stack space to store the variables. Those can be in any order and subject to optimizations. So they may not be in any particular order that you might expect.

Mysticial answered Oct 30 '22 01:10


The best way to answer this sort of question (about the behaviour of a specific compiler on a specific platform) is to look at the generated assembly. You can get gcc to dump its assembly output by passing the -S flag (and the -fverbose-asm flag is nice too). Running

gcc -S -fverbose-asm file.c

gives a file.s that looks a little like this (I've removed all the irrelevant bits, and the bits in parentheses are my notes):

CompareAddress:
        # ("allocate" memory on the stack for local variables)
        subq    $16, %rsp       
        # (put a and b onto the stack)
        movl    %edi, %edx      # a, tmp62
        movl    %esi, %eax      # b, tmp63
        movb    %dl, -4(%rbp)   # tmp62, a
        movb    %al, -8(%rbp)   # tmp63, b 
        # (get their addresses)
        leaq    -8(%rbp), %rdx  #, b.0
        leaq    -4(%rbp), %rax  #, a.1
        subq    %rax, %rdx      # a.1, D.4597 (&b - &a)
        # (set up the parameters for the printf call)
        movl    $.LC0, %eax     #, D.4598
        movq    %rdx, %rsi      # D.4597,
        movq    %rax, %rdi      # D.4598,
        movl    $0, %eax        #,
        call    printf  #

main:
        # (put 'a' and 'b' into the registers for the function call)
        movl    $98, %esi       #,
        movl    $97, %edi       #,
        call    CompareAddress

(This question explains nicely what [re]bp and [re]sp are.)

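The reason the difference is negative is that b ends up at a lower address than a: the compiler stored a at -4(%rbp) and b at -8(%rbp). Note that nothing is actually pushed by the caller here. Both arguments arrive in registers (%edi and %esi) and are copied into slots the compiler picked inside the function's own frame. The stack grows downward, which is why those slots sit at negative offsets from %rbp.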

The reason it is -4 rather than -1 is that the compiler has decided that giving each argument its own 4-byte-aligned slot is "better", probably because a 32-bit/64-bit CPU deals with 4 bytes at a time better than it handles single bytes.

(Also, looking at the assembler shows the effect that -mpreferred-stack-boundary has: it essentially means that memory on the stack is allocated in different sized chunks.)

huon answered Oct 30 '22 01:10