
What decides the memory address of global variables: the compiler or the operating system?

Tags: c, assembly, x86-16

Consider the program below.

int a = 0x45;
int main()
{
   int i = a;
   return 0;
}

;; asm code
call   0x401780 <__main>
mov    0x402000,%eax   // why does it allocate 0x402000 only for global 'a'?
mov    %eax,0xc(%esp)
mov    $0x0,%eax
leave

This is the equivalent assembly code generated in Code::Blocks on Windows XP. I understand that 0x402000 is an address in the data segment. But is that memory location hardcoded by the compiler?

I think it is not hardcoded, because that memory location may or may not be in use by other applications.

As we know, the operating system allocates a stack frame for local variables and returns the base address of the stack frame, and local variables are accessed through the %esp and %ebp registers with an offset.

Does the operating system do the same for global variables? If it does, why is the address hardcoded?

a: dw 0x40      ; this directive allocates memory in the data segment
mov ax, [a]     ; copies the value of a to the accumulator

But how does the compiler know that 'a' has the memory address 0x402000? If the compiler has hardcoded the address as 0x402000, it should first make sure that the address is not used by another application, right?

If the operating system allocates memory in the data segment, the memory address should vary depending on the applications and resources. Could anyone explain what really happens when I define global variables?

user3205479 asked Oct 19 '14 12:10


1 Answer

As Prof Falken mentioned, this depends on the compiler and system, but for Linux, Windows, and Mac with the popular/primary toolchains:

The compiler takes the high-level source and makes assembly out of it; the assembler turns that into an object. The object resolves what relative addresses it can, but leaves clues (symbol references and relocation entries) for the linker.

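The linker...links. It takes the objects and their binary blobs, arranges them into the binary address space it is told about, and picks the addresses for things like globals and functions. Basically it places .text, .data, and .bss. That is where a constant such as 0x402000 in your disassembly comes from: it was chosen at link time, not handed out by the operating system while the program runs.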
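
To see the result of that placement from inside a program, here is a minimal sketch in C. The names address_of_a, address_of_counter and print_layout are made up for illustration, and the section names in the comments are what mainstream toolchains typically do, not something the C language promises.

```c
#include <stdint.h>
#include <stdio.h>

int a = 0x45;   /* initialized global: typically placed in .data */
int counter;    /* zero-initialized global: typically placed in .bss */

/* Address of the initialized global within this program's address space. */
uintptr_t address_of_a(void)
{
    return (uintptr_t)&a;
}

/* Address of the zero-initialized global. */
uintptr_t address_of_counter(void)
{
    return (uintptr_t)&counter;
}

/* Prints where a function, the two globals, and a local variable live. */
void print_layout(void)
{
    int local = a;  /* lives in the current stack frame, not in .data/.bss */

    /* Function pointer goes through uintptr_t to avoid a direct cast to void *. */
    printf("function : %p\n", (void *)(uintptr_t)print_layout);
    printf("a        : %p\n", (void *)address_of_a());
    printf("counter  : %p\n", (void *)address_of_counter());
    printf("local    : %p\n", (void *)&local);
}
```

Calling print_layout() from main and running the program a few times should show the function and the two globals at addresses decided by the link step. On systems that build position-independent executables and randomize the load address, those values can shift between runs, but the distances between them stay the same because the linker fixed the layout.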

Then there is the MMU in hardware, which has made life much simpler. You can, for example, compile every program for, say, address 0x8000 as an entry point, and have many programs all running at address 0x8000 at the same time. They all think they are at that address because, in the virtual address space on the virtual side of the MMU, they are. On the physical side they all actually live at different addresses, but generally only the operating system needs to care about that.
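
A rough way to watch this happen, assuming a POSIX system (fork and waitpid are not available on Windows), is to let two processes use the same virtual address for a global and check that they do not see each other's writes. The function name same_address_private_copy and the variable value are hypothetical, chosen only for this sketch.

```c
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int value = 1;  /* one global, one virtual address in every process below */

/*
 * Forks a child that overwrites the global. Returns 1 if the child saw the
 * global at the same virtual address as the parent while the parent's copy
 * stayed untouched, 0 if not, and -1 if fork or waitpid failed.
 */
int same_address_private_copy(void)
{
    uintptr_t parent_addr = (uintptr_t)&value;

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: same virtual address, but the write goes to its own page. */
        value = 2;
        _exit((uintptr_t)&value == parent_addr ? 0 : 1);
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;

    int child_same_addr = WIFEXITED(status) && WEXITSTATUS(status) == 0;

    /* Parent: still reads 1, so the two processes use different physical memory. */
    return child_same_addr && value == 1;
}
```

This is not two separately linked programs, only a parent and its child, but the mechanism is the one described above: the address in the instruction stream is identical, and the MMU together with the operating system decides which physical memory it refers to in each process.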

So compilers these days typically place functions in the object in the order we wrote them in the source code; the .data and .bss items they sometimes rearrange on us. The linkers generally operate as they are told, and who tells them? Ultimately us, the programmers, but the toolchain provided to you has defaults (like automatically assembling the compiled code into an object and automatically linking), including the bootstrap code and a default linker script. That default linker script for that compiler and that target operating system is set up per the rules of that operating system.

The above is what you typically see with gcc and other primary compilers for the leading operating systems: Windows, Mac, and *nix. That doesn't mean there aren't toolchains out there that do something different, such as compilers that go straight to a final binary, or assemblers that go straight to a final binary with no object. Certainly, historically it wasn't always this way either. Until you get into those corner cases, I assume you are going to have the above experience as you dig into the tools.

old_timer answered Sep 21 '22 18:09