
Why use .data instead of reserving space in .bss and initializing at runtime, for variables in assembly/C?

First of all: I know that there are a lot of web pages (including discussions on Stack Overflow) where the differences between .bss and .data for data declarations are discussed, but I have a specific question and unfortunately did not find the answer on those pages, so I am asking it here :-).

I am a complete beginner in assembly, so I apologize if the question is stupid :-).

I am learning assembly on an x86 64-bit Linux OS (but I think that my question is more general and probably not specific to the OS or the architecture).

I find the definitions of the .bss and .data sections a bit strange. I can always declare a variable in .bss and then move a value into this variable in my code (.text section), right? So why should I declare a variable in the .data section if I know that variables declared in this section will increase the size of my executable file?

I could ask this question in the context of C programming as well: why should I initialize my variable when I declare it, if it is more efficient to declare it uninitialized and then assign a value to it at the beginning of my code?

I suppose that my approach to memory management is naive and not correct, but I do not understand why.

Louis asked Mar 04 '23 11:03



2 Answers

.bss is where you put zero-initialized static data, like C int x; (at global scope). That's the same as int x = 0; for static / global variables (static storage class; see footnote 1).

.data is where you put non-zero-initialized static data, like int x = 2;. If you put that in BSS, you'd need a runtime static "constructor" to initialize the BSS location, like what a C++ compiler would do for static const int prog_starttime = __rdtsc();. (Even though it's const, the initializer isn't a compile-time constant, so it can't go in .rodata.)


.bss with a runtime initializer would make sense for big arrays that are mostly zero or filled with the same value (memset / rep stosd), but in practice writing char buf[1024000] = {1}; will put 1MB of almost all zeros into .data, with current compilers.
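As a rough sketch of the alternative described above, the mostly-zero array can be left without an initializer (so it is a candidate for .bss) and its single non-zero element set at startup. The names init_buf and get_buf are hypothetical helpers introduced only for this illustration, and where the array actually ends up depends on the compiler and its options.

```c
#define BUF_SIZE 1024000

/* No initializer: zero-initialized static storage, so compilers
   normally place this in .bss and it adds no bytes to the file. */
static char buf[BUF_SIZE];

/* Hypothetical startup hook: sets the one non-zero element that
   `char buf[BUF_SIZE] = {1};` would otherwise have stored in .data. */
void init_buf(void)
{
    buf[0] = 1;
}

/* Hypothetical accessor so other code can inspect the buffer. */
const char *get_buf(void)
{
    return buf;
}
```

The trade-off is that init_buf has to be called before anything reads buf, which is exactly the runtime "constructor" step that a .data initializer avoids.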

Otherwise it is not more efficient. A mov dword [myvar], imm32 instruction is at least 8 bytes long, costing about twice as many bytes in your executable as if it were statically initialized in .data. Also, the initializer has to be executed.


By contrast, section .rodata (or .rdata on Windows) is where compilers put string literals, FP constants, and static const int x = 123; (Actually, x would normally get inlined as an immediate everywhere it's used in the compilation unit, letting the compiler optimize away any static storage. But if you took its address and passed &x to a function, the compiler would need it to exist in memory somewhere, and that would be in .rodata)
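To make the three sections concrete, here is a small sketch of file-scope definitions with the section each one typically lands in on a mainstream ELF toolchain. The variable and function names are made up for this example, and the placement comments describe common compiler behavior rather than anything the C standard requires. One way to check, assuming GNU binutils are available, is to compile with gcc -c and run nm on the object file.

```c
int zeroed;                      /* no initializer: typically .bss        */
int also_zeroed = 0;             /* explicit zero: typically .bss as well */
int nonzero = 2;                 /* non-zero initializer: .data           */
static const int limit = 123;    /* read-only: .rodata if it needs memory */
const char *greeting = "hello";  /* pointer in .data, string in .rodata   */

/* Taking the address of `limit` forces it to exist in memory
   instead of being folded into an immediate operand. */
const int *address_of_limit(void)
{
    return &limit;
}

/* Uses every integer variable so the example has observable behavior. */
int sum_globals(void)
{
    return zeroed + also_zeroed + nonzero + limit;
}
```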


Footnote 1: Inside a function, int x; would be on the stack if the compiler didn't optimize it away or into registers, when compiling for a normal register machine with a stack like x86.


I could ask this question in the context of C programming as well

In C, an optimizing compiler will treat int x; x=5; pretty much identically to int x=5; inside a function. No static storage is involved. Looking at actual compiler output is often instructive: see How to remove "noise" from GCC/clang assembly output?.

Outside a function, at global scope, you can't write statements like x=5;. You could do that at the top of main, but then you would just be tricking the compiler into making worse code.

Inside a function with static int x = 5;, the initialization happens once (at compile time). If you wrote static int x; x=5; instead, the static storage would be re-initialized every time the function was entered, and you might as well not have used static unless you have other reasons for needing static storage class (e.g. returning a pointer to x that's still valid after the function returns).
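The difference between the two forms is easy to observe. In this minimal sketch (the function names are invented for the illustration), the first function keeps its value across calls, while the second resets it on every entry.

```c
/* Initialized once, before the program runs: the value persists
   from one call to the next. */
int next_id(void)
{
    static int counter = 5;
    return counter++;
}

/* Re-assigned on every call: the increment is overwritten next
   time, so `static` buys nothing here. */
int next_id_reset(void)
{
    static int counter;
    counter = 5;
    return counter++;
}
```

Calling next_id twice returns 5 and then 6, while next_id_reset returns 5 both times.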

Peter Cordes answered Apr 27 '23 01:04



The size of an instruction that writes an immediate operand (i.e., a compile-time constant) into a memory location is necessarily larger than the size of the constant itself. If all of the constants have different values, then you need a separate instruction for each value, and the total size of these instructions would be larger than the total size of the values. In addition, there is a run-time performance overhead to execute these instructions. If the constants are all the same, then a loop can be used to initialize all the corresponding variables. The loop itself would indeed be much smaller than the total size of the constants. In this case, instead of allocating many static variables to hold the same constant, you can use something like malloc followed by a loop to initialize the allocated region. This can significantly reduce the size of an object file and improve performance.
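The malloc-plus-loop idea can be sketched as follows. The helper name make_filled is hypothetical, and this is only one plausible way to write it; the point is that the loop stays a few instructions long no matter how many elements it fills.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocates `count` ints and sets each one to
   `value`. Returns NULL if the size would overflow or the allocation
   fails. The caller is responsible for calling free() on the result. */
int *make_filled(size_t count, int value)
{
    if (count > SIZE_MAX / sizeof(int))
        return NULL;

    int *p = malloc(count * sizeof *p);
    if (p == NULL)
        return NULL;

    for (size_t i = 0; i < count; i++)
        p[i] = value;
    return p;
}
```

Compared with a statically initialized array of the same size in .data, this stores none of the values in the executable, at the cost of running the loop at startup.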

Consider an OS that keeps a number of pages initialized to some constant, or where different pages might be initialized to different constants. These pages can be prepared by the OS in a background thread. When a program requests a page that is initialized to a particular constant, the OS can simply map one of the pages that it has already initialized into the program's page table, thereby avoiding the need to execute a loop at run-time. In fact, the Windows OS always initializes all reclaimed pages to a constant value of all-bits-zero. This is both a security feature and a performance enhancement.

Static variables are typically either not initialized at compile-time or initialized to zero. Some languages, such as C and C++, require the runtime to initialize uninitialized static variables to zero. What is the most efficient way to initialize pages to zero? The C runtime could, for example, emit a sequence of instructions or a loop in the entry point of an object file to initialize all uninitialized static variables to the specified compile-time constants. But then every object file would require these instructions. It is more efficient space-wise to delegate this initialization to the OS, which can do it on-demand (on Linux) or proactively (on Windows).

The ELF executable format defines the bss section as the portion of the object file that contains uninitialized variables. Therefore, the bss section only needs to specify the total size of all variables, in contrast to the data section, which also needs to specify the value of each variable. There is no requirement that the OS should initialize (or not) the bss section to zero or any other value, but typically this is indeed the case. In addition, although C/C++ requires the runtime to initialize all static variables that are not explicitly initialized to zero/null, the language standard does not define a particular bit pattern for zero/null. Only when the language implementation and the bss implementation match can uninitialized static variables be allocated in the bss section.

When Linux loads an ELF binary, it maps the bss section to a dedicated zero page marked as copy-on-write (see: How exactly does copy on write work). So there is no overhead to initialize that page to zero. In some cases, bss may occupy a fraction of a page (see for example Gnu assembler .data section value corrupted after syscall). In this case, that fraction is explicitly initialized to all-bits-zero using a movb/incq/decl/jnz loop.

A hypothetical OS could, for example, initialize each byte of the bss section to 0000_0001b. Also, in a hypothetical implementation of C, the NULL-pointer bit pattern might be (multiple bytes of) 0000_0010b. In this case, default-initialized static pointer variables and arrays can be allocated in the bss section without any init loop inside the C program. But any other values, such as integer arrays, will need an init loop unless they happen to be explicitly initialized in the C source to a value that matches the bit-pattern.

(C allows an implementation-defined non-zero object representation for NULL pointers, but integers are more constrained. C rules require static storage-class variables to be implicitly initialized to 0 if not explicitly initialized. And unsigned char is required to be base 2 with no padding. 0 as an initializer for a pointer in the source maps to the NULL bit pattern, unlike using memcpy of unsigned char zeros into the object representation.)
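The distinction between "initialized to a null pointer" and "filled with zero bytes" can be shown with a short sketch. The function names are invented for this example. The first check holds on every conforming implementation; the second holds on mainstream platforms such as x86-64 Linux but is not guaranteed by the C standard.

```c
#include <string.h>

/* Static storage with no initializer: implicitly a null pointer. */
static int *implicit_ptr;

/* Returns 1 if the implicitly initialized static pointer is null.
   The C standard guarantees this. */
int implicit_is_null(void)
{
    return implicit_ptr == NULL;
}

/* Returns 1 if a pointer whose object representation is all zero
   bytes compares equal to NULL. This is an assumption about the
   platform, not a language guarantee. */
int zero_bytes_is_null(void)
{
    int *p;
    memset(&p, 0, sizeof p);
    return p == NULL;
}
```

On a platform where both functions return 1, the implementation's null-pointer representation matches zero-filled memory, which is what lets such pointers live in bss with no init code.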

Hadi Brais answered Apr 27 '23 01:04
