
How do compilers assign memory addresses to variables?

I teach a course where students get to ask questions about programming (!), and I got this question:

Why does the machine choose where variables go in memory? Can we tell it where to store a variable?

I don't really know what to say. Here's my first attempt:

The compiler (not the machine) chooses where to store the variables in the process address space automatically. Using C, we cannot tell the machine where to store variables.

But that "automatically" is somewhat anticlimactic and begs the question... and I've realized I don't even know if it's the compiler or the runtime or the OS or who does the assignment. Maybe someone can answer the student's question better than me.

Dervin Thunk asked Aug 26 '13 14:08



2 Answers

The answer to this question is quite complex since there are various approaches to memory allocation depending on variable scope, size and programming environment.

Stack allocated variables

Typically, local variables are put on the "stack". The compiler assigns each local variable a fixed offset relative to the "stack pointer", whose value can differ from one invocation of the function to the next. In other words, the compiler assumes that memory locations like Stack-Pointer+4, Stack-Pointer+8, etc. are accessible and usable by the program. After returning from the function, these memory locations are not guaranteed to retain their values.

This is mapped into assembly instructions similar to the following. esp is the stack pointer, esp + N refers to a memory location relative to esp:

    mov eax, DWORD PTR SS:[esp]
    mov eax, DWORD PTR SS:[esp + 4]
    mov eax, DWORD PTR SS:[esp + 8]

Heap

Then there are variables that are heap-allocated. This means that the program makes a call at runtime to request memory (malloc in C, new in C++). This memory stays reserved until it is explicitly released or the program ends. malloc and new return pointers to memory in a region called the heap. The allocating functions have to find a range that is not already in use, which can make heap allocation slow at times. Also, if you don't want to run out of memory, you should free (or delete) memory that is no longer used. Heap allocation is quite complicated internally, since the standard library has to keep track of used and unused ranges of memory as well as freed ranges. Freeing a heap-allocated variable can therefore be even more time-consuming than allocating it. For more information see How is malloc() implemented internally?

Understanding the difference between stack and heap is quite fundamental to learning how to program in C and C++.

Arbitrary Pointers

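Naively, one might assume that setting a pointer to an arbitrary address, e.g. int *a = (int *)0x123, makes it possible to address arbitrary locations in the computer's memory. This does not exactly hold true, since (depending on the CPU and system) programs are heavily restricted when addressing memory. On most desktop and server operating systems, a process can only access addresses that the OS has mapped into its own virtual address space.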

Getting a feel for memory

In a guided classroom experience, it might be beneficial to explore some simple C code by compiling the source code to assembler (gcc can do this with the -S option, for example). A simple function such as int foo(int a, int b) { return a+b; } should suffice (compiled without optimizations). Then look at something like int bar(int *a, int *b) { return (*a) + (*b); }

When invoking bar, allocate the arguments once on the stack and once with malloc.

Conclusion

The compiler does perform some variable placement and alignment relative to base addresses, which are obtained by the program/standard library at runtime.

For a deep understanding of memory related questions see Ulrich Drepper's "What every programmer should know about memory" http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.91.957

Sidenote: apart from C-ish country

Then there is also garbage collection, which is popular among many scripting languages (Python, Perl, JavaScript, Lisp) and device-independent environments (Java, C#). It is related to heap allocation but slightly more complicated.

Some language implementations are entirely heap-based (Stackless Python) or entirely stack-based (Forth).

wirrbel answered Sep 27 '22 23:09


I think the answer to this question starts with an understanding of the layout of a program in memory. Underneath the operating system, a computer's main memory is just a giant array. When you run a program, the operating system will take a chunk of this memory and break it up into logical sections for the following purposes:

  • stack: this area of memory stores information about all functions that are currently active, including the currently running function and all of its callers. Information stored includes local variables and the address to return to when the function is done.

  • heap: this area of memory is used when you want to dynamically allocate some storage. Generally your local variable would then contain an address (i.e., it would be a pointer) in the heap where your data is stored, and you could publish this address to other parts of your program without worrying that your data will be overwritten when the current function returns.

  • data, bss, text segments: these are more or less outside the scope of this particular question, but they store things such as global data and the program itself.

Hope that helps. There are lots of good resources online as well. I just googled "layout of a program in memory" and found this one: http://duartes.org/gustavo/blog/post/anatomy-of-a-program-in-memory

danben answered Sep 27 '22 22:09