 

Using symbol '_end' in g++ leads to a segmentation fault

Tags:

gcc

linker

clang

Consider the following C++ source code:

int _end[1050];

int main() {
    for (int i = 0; i < 1050; i++)
        _end[i] = 0;
    return 0;
}

Compilation line: g++ main.cpp -o main -O0

Running this code results in a segmentation fault with gcc-4.8.4 and clang-3.6.0 under Ubuntu 14.04. The strange behaviour is that the symbol _end points at the end of the statically allocated array _end, not at its beginning. If we replace _end with end_, everything works fine.

Moreover, if we ask gcc to output assembly code by passing the -S command-line argument, there is no significant difference between the version with "_end" and the version with any other array name:

$ g++ main.cpp -o main.s -O0 -S
$ g++ main2.cpp -o main2.s -O0 -S
$ diff main.s main2.s
1,2c1,2
<   .file   "main.cpp"
<   .globl  _end
---
>   .file   "main2.cpp"
>   .globl  end_
5,7c5,7
<   .type   _end, @object
<   .size   _end, 4200
< _end:
---
>   .type   end_, @object
>   .size   end_, 4200
> end_:
25c25
<   movl    $0, _end(,%rax,4)
---
>   movl    $0, end_(,%rax,4)

But if we use objdump to disassemble the executables and run diff on the output, we see that in the _end version the address used is 4200 = 4 * 1050 bytes further than expected:

$ g++ main.cpp -o main -O0
$ g++ main2.cpp -o main2 -O0
$ objdump -d main > main.dump
$ objdump -d main2 > main2.dump
$ diff main.dump main2.dump
2c2
< main:     file format elf64-x86-64
---
> main2:     file format elf64-x86-64
123c123
<   4004ff: c7 04 85 c8 20 60 00    movl   $0x0,0x6020c8(,%rax,4)
---
>   4004ff: c7 04 85 60 10 60 00    movl   $0x0,0x601060(,%rax,4)
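The numbers in this diff can be checked with a few compile-time assertions. This is a sketch that reuses the addresses printed above and assumes 4 KiB pages (the usual size on x86-64 Linux); the constant names and `page_of` are hypothetical. It confirms that the store base moved by exactly the array size, and that writing 4200 bytes from 0x6020c8 reaches 0x60312f, past the 0x603000 page boundary. If 0x6020c8 is the end of .bss, as the answer below suggests, the page after it is presumably not mapped, which would be a plausible reason for the fault.

```cpp
#include <cassert>
#include <cstdint>

// Store targets copied from the objdump diff (x86-64).
constexpr std::uintptr_t kStoreBaseMain  = 0x6020c8;  // ./main,  array named _end
constexpr std::uintptr_t kStoreBaseMain2 = 0x601060;  // ./main2, array named end_

// Array size, from ".size _end, 4200" in the -S output (1050 four-byte ints).
constexpr std::uintptr_t kArrayBytes = 4200;

// Assumption: 4 KiB pages.
constexpr std::uintptr_t kPageSize = 0x1000;

// Start address of the page containing addr.
constexpr std::uintptr_t page_of(std::uintptr_t addr) {
    return addr & ~(kPageSize - 1);
}

// The shift between the two binaries is exactly the size of the array.
static_assert(kStoreBaseMain - kStoreBaseMain2 == kArrayBytes, "shift is 4200 bytes");

// Last byte written by the loop in ./main (elements 0 .. 1049).
constexpr std::uintptr_t kLastByteMain = kStoreBaseMain + kArrayBytes - 1;  // 0x60312f

// The writes begin on one page and spill onto the next one.
static_assert(page_of(kStoreBaseMain) == 0x602000, "first page written");
static_assert(page_of(kLastByteMain) == 0x603000, "last page written");
```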

As far as I know, the compiler may treat identifiers starting with an underscore however it wants, i.e. it is bad practice to use such names in your code. But my question is: what really happens here? Why is _end replaced with the address of the end of the allocated array? Why is there no difference when we use the "-S" command-line argument, while there is a difference in the resulting binaries? Note that gcc and clang behave identically in this case, which is also strange to me.

asked Nov 17 '15 15:11 by Maxim Akhmedov



1 Answer

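Identifiers that begin with an underscore are reserved in the global namespace (and those that begin with an underscore followed by an uppercase letter, or that contain a double underscore, are reserved everywhere), so you shouldn't use them. It seems that _end is an external symbol defined for programs compiled on Linux; it represents the first address past the end of the uninitialized data segment (also known as the BSS segment). The symbol is resolved by the linker, which is why the assembly produced with -S still refers to _end by name and the difference only shows up in the linked binary.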

Note: On some systems the names of these symbols are preceded by underscores, thus: _etext, _edata, and _end.

Source: http://man7.org/linux/man-pages/man3/end.3.html
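A minimal sketch, modelled on the example program in end(3), that prints these linker-provided symbols next to a zero-initialized array. It assumes Linux with a linker that provides `etext`, `edata` and `end` as documented there; `bss_array`, `lies_in_bss`, `bss_array_lies_in_bss` and `print_segment_ends` are hypothetical names. It deliberately uses the unprefixed `end` rather than `_end`. If the symbols mark the segment boundaries as documented, the array should lie between `&edata` and `&end`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Linker-provided symbols documented in end(3). Only their addresses are
// meaningful; the declared type (char, as in the man page) is arbitrary.
extern "C" char etext, edata, end;

// Zero-initialized static array; GCC and Clang normally place it in .bss.
static int bss_array[1050];

// Convert a pointer to an integer so unrelated addresses can be compared.
static std::uintptr_t addr(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p);
}

// True if [p, p + n) lies between the end of initialized data (edata)
// and the end of uninitialized data (end).
bool lies_in_bss(const void* p, std::size_t n) {
    return addr(&edata) <= addr(p) && addr(p) + n <= addr(&end);
}

bool bss_array_lies_in_bss() {
    return lies_in_bss(bss_array, sizeof bss_array);
}

// Print the segment boundaries alongside the array's extent.
void print_segment_ends() {
    std::printf("etext     %p\n", static_cast<void*>(&etext));
    std::printf("edata     %p\n", static_cast<void*>(&edata));
    std::printf("bss_array %p .. %p\n", static_cast<void*>(bss_array),
                static_cast<void*>(bss_array + 1050));
    std::printf("end       %p\n", static_cast<void*>(&end));
}
```

Calling `print_segment_ends()` should show `end` at or just past the end of the array. In the question's program, the global `int _end[1050]` shares its name with the linker's `_end`, and the references end up resolved to that boundary, so the loop starts writing where the BSS segment ends instead of at the array.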

answered Sep 18 '22 18:09 by vsoftco