Why are global variables stored in .data rather than on the heap of a process image?

I was studying for a test of OS Fundamentals and this came to my mind. When I declare a global (or static) variable in a C program like:

char* msg = "Hello World!\n";    

an array of bytes is reserved in .data and the "Hello World!\n" string is saved in .text; then, when the program is loaded into memory and starts to execute, the msg variable is initialized with the string saved in .text. Is this what happens? If so, what is the difference between reserving the bytes in .data rather than on the heap? I know that in .data they have a static size, but they could be reserved on the heap too, right? Why are those things separated? Wouldn't it be more efficient to have just the heap, the stack and the code part in the process image rather than more fragments? It can't be because of physical memory being mapped to multiple virtual addresses (multiple instances of Notepad, for example), because these variables are editable.

Thank you in advance

André Rosa asked Oct 29 '22 17:10



2 Answers

What this compiler does is make this (constant) literal a read/write variable.

The compiler collects all string literals in .text. When the same literal is used more than once in the program, only one copy of it is kept in .text.

At startup, the literal is copied to the space reserved in .data. This is funny:

char msg[] = "Hello World!\n";   /* case 1: an array initialized from the literal */
char *msg  = "Hello World!\n";   /* case 2: a pointer to the literal */

That the compiler copies the first literal from .text to .data is OK; it is initializing the variable as per the user's instructions.

That the compiler copies the second literal to .data is not correct: it should have initialized msg with a pointer to the literal in .text, and the .text segment should be read-only (enforced by the memory-management hardware, which raises an exception on any attempt to write to it).
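To make the difference between the two declarations concrete, here is a minimal C sketch. The names msg_array, msg_ptr and the helper functions are hypothetical, chosen only for illustration. It assumes a typical hosted toolchain; the exact section each object ends up in is implementation-defined, although common compilers place the literal in a read-only section.

```c
#include <stddef.h>
#include <string.h>

/* Array: the object itself is writable storage, initialized from the
   literal. Its size is 13 characters plus the terminating NUL. */
char msg_array[] = "Hello World!\n";

/* Pointer: the object is only a pointer; it holds the address of the
   literal. Declared const char * because writing through it would be
   undefined behavior. */
const char *msg_ptr = "Hello World!\n";

/* Size of the array object in bytes. */
size_t array_size(void) { return sizeof msg_array; }

/* Size of the pointer object, independent of the string length. */
size_t pointer_size(void) { return sizeof msg_ptr; }

/* Overwrite the first byte of the array. This is well-defined because
   the array is ordinary writable data. */
void patch_array(char c) { msg_array[0] = c; }
```

Calling patch_array('J') changes only the array; the literal that msg_ptr points to is a separate object and stays untouched.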

Paul Ogilvie answered Nov 15 '22 01:11



"Global" generally means "accessible from everywhere".

Generally one does that in assembler by placing the global data in a fixed location; then any code needing access simply references it directly by using its address. That is accomplished by placing global variables in the .data segment; the linker will assign them fixed addresses.

You can consider placing "global data" in the heap. If you do that, how does code access it? It can't, without knowing where the data is in the heap. The only way for such code to know this is either to be passed a pointer to the "global data" as an argument (that means every subroutine has to accept this pointer and pass it to all callees; that's really inconvenient), or to know where there is a pointer into the heap that it can access directly (that pointer would have to have a fixed address to be found, so the pointer itself is global data). Having such a pointer means that access to global data now always requires an indirection, which slows the code down. So, if you do this, you end up with an awkward and slow scheme for accessing "global" data. (Most people wouldn't call data allocated in the heap "global data".)
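The two access schemes can be sketched side by side in C. This is only an illustration under stated assumptions: counter, heap_counter and the helper functions are hypothetical names, and the comments describe what compilers typically generate rather than a guarantee.

```c
#include <stdlib.h>

/* Ordinary global: the linker assigns it a fixed address in .data
   (a zero-initialized global would typically go to .bss instead). */
int counter = 10;

/* "Global on the heap": the data lives on the heap, but the pointer to
   it must itself be a global with a fixed address so code can find it. */
int *heap_counter = NULL;

/* Allocate the heap-held counter once; returns 0 on success, -1 on failure. */
int init_heap_counter(void)
{
    heap_counter = malloc(sizeof *heap_counter);
    if (heap_counter == NULL)
        return -1;
    *heap_counter = 0;
    return 0;
}

/* Direct access: the data is read and written at a known address. */
int bump_direct(void) { return ++counter; }

/* Indirect access: the pointer is loaded first, then the data it points to. */
int bump_indirect(void) { return ++*heap_counter; }

/* Release the heap-held counter and clear the global pointer. */
void free_heap_counter(void)
{
    free(heap_counter);
    heap_counter = NULL;
}
```

Note that bump_indirect still depends on a global (heap_counter), and it only works after init_heap_counter has succeeded, whereas counter is usable from the first instruction of the program.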

So... global data is placed where it is easy to access: in the data segment.

If you have global data which is constant and will not change, you can put it in the "text" (code) segment. Putting such data in a text segment ensures, with most modern OSes, that such data is write protected, enforcing the "won't change" assumption. That helps find bugs in programs.
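As a small sketch of constant global data in C, consider a lookup table declared const. The names days_in_month and days_in are hypothetical. Whether the table ends up in the text segment or in a separate read-only section depends on the toolchain; the point is only that const lets the compiler and linker place it in write-protected memory.

```c
/* const at file scope: toolchains commonly place this table in read-only
   memory, so an accidental write would fault instead of silently
   corrupting the data. */
static const int days_in_month[12] = {
    31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31
};

/* Return the number of days in a month (1..12) of a non-leap year,
   or -1 if the month is out of range. */
int days_in(int month)
{
    if (month < 1 || month > 12)
        return -1;
    return days_in_month[month - 1];
}
```

Code can read the table directly by name, just like any other global, but the compiler rejects assignments such as days_in_month[1] = 29 at compile time.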

Ira Baxter answered Nov 15 '22 01:11
