 

Do .bss section zero initialized variables occupy space in elf file?

If I understand correctly, the .bss section in ELF files is used to allocate space for zero-initialized variables. Our toolchain produces ELF files, hence my question: does the .bss section actually have to contain all those zeroes? It seems like an awful waste of space if, say, allocating a global ten-megabyte array results in ten megabytes of zeroes in the ELF file. What am I missing here?

asked Mar 04 '09 at 14:03 by Wouter Lievens





1 Answer

It has been some time since I worked with ELF, but I think I still remember this stuff. No, the file does not physically contain those zeros. If you look at an ELF file's program headers (readelf -l ./a.out), you will see that each header carries two sizes: the size of the segment in the file (FileSiz) and the size it occupies once it is loaded into virtual memory (MemSiz):

    Program Headers:
      Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
      PHDR           0x000034 0x08048034 0x08048034 0x000e0 0x000e0 R E 0x4
      INTERP         0x000114 0x08048114 0x08048114 0x00013 0x00013 R   0x1
          [Requesting program interpreter: /lib/ld-linux.so.2]
      LOAD           0x000000 0x08048000 0x08048000 0x00454 0x00454 R E 0x1000
      LOAD           0x000454 0x08049454 0x08049454 0x00104 0x61bac RW  0x1000
      DYNAMIC        0x000468 0x08049468 0x08049468 0x000d0 0x000d0 RW  0x4
      NOTE           0x000128 0x08048128 0x08048128 0x00020 0x00020 R   0x4
      GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0x4

Headers of type LOAD describe the segments that are copied into virtual memory when the file is loaded for execution. The other headers carry other information, such as the program interpreter and the dynamic-linking data that lists the required shared libraries. As you can see, FileSiz and MemSiz differ significantly for the segment that contains the .bss section (the second LOAD header):

    0x00104 (FileSiz)    0x61bac (MemSiz)

For this example code:

    int a[100000];
    int main() { }

The ELF specification says that when a segment's memory size is greater than its file size, the extra bytes are filled with zeros in virtual memory. The segment-to-section mapping for the second LOAD header (segment 03 in the readelf -l output) looks like this:

    03     .ctors .dtors .jcr .dynamic .got .got.plt .data .bss

So there are some other sections in there too: .ctors and .dtors hold the C++ static constructors and destructors, and .jcr serves a similar purpose for Java (GCJ). Then comes the .dynamic section (the DYNAMIC header above points into these same bytes) together with .got and .got.plt, all of which are used for dynamic linking; .dynamic is where, among other things, the needed shared libraries are listed. After that comes the .data section, which contains initialized globals and static local variables. At the very end is the .bss section, which is filled with zeros at load time because the file size does not cover it.

By the way, you can see which output section a particular symbol is placed in by using the linker's -M option, which prints a link map. With gcc, use -Wl,-M to pass the option through to the linker. For the example above, the map shows that a is allocated within .bss, so it can help you verify that your uninitialized objects really end up in .bss and not somewhere else:

    .bss            0x08049560    0x61aa0
     [many input .o files...]
     *(COMMON)
     *fill*         0x08049568       0x18 00
     COMMON         0x08049580    0x61a80 /tmp/cc2GT6nS.o
                    0x08049580                a
                    0x080ab000                . = ALIGN ((. != 0x0)?0x4:0x1)
                    0x080ab000                . = ALIGN (0x4)
                    0x080ab000                . = ALIGN (0x4)
                    0x080ab000                _end = .

GCC has traditionally placed uninitialized globals in COMMON rather than directly in .bss, for compatibility with old compilers that allowed a global to be defined in more than one translation unit without multiple-definition errors. Use -fno-common to make GCC put them in the object file's .bss section instead (GCC 10 and later do this by default). It makes no difference to the final linked executable: as the map shows, COMMON symbols end up in the .bss output section anyway. That placement is controlled by the linker script, which you can display with ld --verbose. So this shouldn't worry you; it's just an internal detail. See the gcc man page for more.

answered Oct 03 '22 at 00:10 by Johannes Schaub - litb
