Using symbol '_end' in g++ leads to a segmentation fault

Tags: gcc, linker, clang

Consider the following C++ source code:

int _end[1050];

int main() {
    for (int i = 0; i < 1050; i++)
        _end[i] = 0;
    return 0;
}

Compilation line: g++ main.cpp -o main -O0

Running this program causes a segmentation fault with both gcc-4.8.4 and clang-3.6.0 under Ubuntu 14.04. The strange part is that the symbol _end resolves to the end of the statically allocated array _end rather than to its beginning. If we rename _end to end_, everything works fine.
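Below is a minimal sketch of the workaround the question already mentions, assuming a C++11 compiler. The array keeps its size and loop but uses the non-reserved name `end_`. The function `zero_array` and the constant `kSize` are hypothetical additions so the loop can be called and checked from `main`.

```cpp
#include <cassert>
#include <cstddef>

// Same program as above, but the array no longer uses the reserved name `_end`.
// `end_` is the spelling the question reports as working.
const std::size_t kSize = 1050;  // hypothetical name for the element count
int end_[kSize];

// Zeroes every element of end_ and returns how many elements were written.
std::size_t zero_array() {
    for (std::size_t i = 0; i < kSize; ++i)
        end_[i] = 0;
    assert(end_[kSize - 1] == 0);  // sanity check on the last element
    return kSize;
}
```

Calling `zero_array()` from `main` replaces the original loop. Because no program-defined symbol is named `_end`, the linker has nothing to rebind.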

Moreover, if we ask gcc to output assembly code by passing the -S command-line argument, there is no significant difference between the version that uses "_end" and a version that uses any other array name (main2.cpp below is the same program with the array renamed to end_):

$ g++ main.cpp -o main.s -O0 -S
$ g++ main2.cpp -o main2.s -O0 -S
$ diff main.s main2.s
1,2c1,2
<   .file   "main.cpp"
<   .globl  _end
---
>   .file   "main2.cpp"
>   .globl  end_
5,7c5,7
<   .type   _end, @object
<   .size   _end, 4200
< _end:
---
>   .type   end_, @object
>   .size   end_, 4200
> end_:
25c25
<   movl    $0, _end(,%rax,4)
---
>   movl    $0, end_(,%rax,4)

But if we disassemble the executables with objdump and diff the results, we see that in the _end version the address used is 4200 = 4 * 1050 bytes further than it should be:

$ g++ main.cpp -o main -O0
$ g++ main2.cpp -o main2 -O0
$ objdump -d main > main.dump
$ objdump -d main2 > main2.dump
$ diff main.dump main2.dump
2c2
< main:     file format elf64-x86-64
---
> main2:     file format elf64-x86-64
123c123
<   4004ff: c7 04 85 c8 20 60 00    movl   $0x0,0x6020c8(,%rax,4)
---
>   4004ff: c7 04 85 60 10 60 00    movl   $0x0,0x601060(,%rax,4)
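The size of the shift can be checked directly. This compile-time sketch copies the two addresses from the listings above and assumes a 4-byte `int`, as on x86-64 Linux. The constant names are hypothetical.

```cpp
#include <cstdint>

// Addresses taken from the two objdump lines above.
const std::uint64_t kAddrInMain  = 0x6020c8;  // main:  movl $0x0,0x6020c8(,%rax,4)
const std::uint64_t kAddrInMain2 = 0x601060;  // main2: movl $0x0,0x601060(,%rax,4)

// Size of `int _end[1050]`, assuming sizeof(int) == 4.
const std::uint64_t kArrayBytes = 4 * 1050;

// 0x6020c8 - 0x601060 = 0x1068 = 4200 bytes: exactly one array length.
static_assert(kAddrInMain - kAddrInMain2 == kArrayBytes,
              "the _end reference is one whole array past the array start");
static_assert(kArrayBytes == 0x1068, "4200 bytes is 0x1068");
```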

As far as I know, identifiers starting with an underscore are reserved, so the compiler may treat them however it wants and using such names in your own code is bad practice. But my question is: what really happens here? Why is _end replaced with the address of the end of the allocated array? Why is there no difference when we use the "-S" command-line argument, while there is a difference in the generated binaries? Note that gcc and clang behave identically in this case, which is also strange to me.


asked Nov 17 '15 15:11

Maxim Akhmedov

1 Answer

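Identifiers that begin with an underscore are reserved for the implementation at global scope, so you shouldn't use them. _end is an external symbol defined for programs compiled on Linux. It represents the first address past the end of the uninitialized data segment (also known as the BSS segment). The symbol is supplied by the linker, not the compiler, which is why the -S output looks the same for any array name: only when the executable is linked does the reference to _end get bound to the linker's value. Since the array sits at the end of BSS, that value is one array length past the array's start. The loop therefore writes up to 4200 bytes past the end of BSS, into memory that is not necessarily mapped, hence the segmentation fault.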

Note: On some systems the names of these symbols are preceded by underscores, thus: _etext, _edata, and _end.

Source: http://man7.org/linux/man-pages/man3/end.3.html
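Following end(3), a program that wants these boundaries declares the symbols itself and only uses their addresses. Below is a minimal sketch, assuming a Linux toolchain whose linker provides `etext`, `edata` and `end` (GNU ld does). `bss_probe`, `addr` and `print_segment_ends` are hypothetical names. The ordering checks reflect the usual text, data, BSS layout rather than a guarantee. The sketch uses the unprefixed names, so the program does not define any reserved identifier.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Symbols provided by the linker (see end(3)). No header declares them, so the
// program must. Only their addresses are meaningful, never their values.
extern "C" char etext, edata, end;

// Hypothetical zero-initialized array; it should be placed in the BSS segment.
static int bss_probe[1050];

// Converts a pointer to an integer so unrelated addresses can be compared.
std::uintptr_t addr(const void* p) { return reinterpret_cast<std::uintptr_t>(p); }

// Prints the three boundaries and checks the layout assumed above:
// text, then initialized data, then BSS, with the probe array inside BSS.
void print_segment_ends() {
    std::printf("etext = %p\n", static_cast<void*>(&etext));
    std::printf("edata = %p\n", static_cast<void*>(&edata));
    std::printf("end   = %p\n", static_cast<void*>(&end));
    assert(addr(&etext) < addr(&edata));
    assert(addr(&edata) <= addr(bss_probe));
    assert(addr(bss_probe + 1050) <= addr(&end));
}
```

In the question's program, the reference to `_end` was bound to the underscore-prefixed version of this linker symbol rather than to the array. That is why the loop started 4200 bytes past the array.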

answered Sep 18 '22 18:09

vsoftco

