Consider the following C++ source code:
int _end[1050];

int main() {
    for (int i = 0; i < 1050; i++)
        _end[i] = 0;
    return 0;
}
Compilation line: g++ main.cpp -o main -O0
Running this code leads to a segmentation fault with gcc-4.8.4 and clang-3.6.0 under Ubuntu 14.04. The strange behaviour is that the symbol _end points at the end of the statically allocated array _end, not at its beginning. If we replace _end with end_, everything works fine.
Moreover, if we ask gcc to output assembly code by passing the -S command-line argument, there is no significant difference between the version with "_end" and the version with any other array name:
$ g++ main.cpp -o main.s -O0 -S
$ g++ main2.cpp -o main2.s -O0 -S
$ diff main.s main2.s
1,2c1,2
< .file "main.cpp"
< .globl _end
---
> .file "main2.cpp"
> .globl end_
5,7c5,7
< .type _end, @object
< .size _end, 4200
< _end:
---
> .type end_, @object
> .size end_, 4200
> end_:
25c25
< movl $0, _end(,%rax,4)
---
> movl $0, end_(,%rax,4)
But if we use objdump to disassemble the executables and run diff on the results, we see that in the _end version the address used is 4200 = 4 * 1050 bytes further than needed:
$ g++ main.cpp -o main -O0
$ g++ main2.cpp -o main2 -O0
$ objdump -d main >main.dump
$ objdump -d main2 > main2.dump
$ diff main.dump main2.dump
2c2
< main: file format elf64-x86-64
---
> main2: file format elf64-x86-64
123c123
< 4004ff: c7 04 85 c8 20 60 00 movl $0x0,0x6020c8(,%rax,4)
---
> 4004ff: c7 04 85 60 10 60 00 movl $0x0,0x601060(,%rax,4)
As far as I know, the compiler may treat identifiers starting with an underscore however it wants, i.e. it is bad practice to use such names in your code. But my question is: what really happens here? Why is _end replaced with the address of the end of the allocated array? Why is there no difference when we use the "-S" command-line argument, while there is a difference in the resulting binaries? Note that gcc and clang behave identically in this case, which is also strange to me.
A segfault occurs when a reference to a variable falls outside the segment where that variable resides, or when a write is attempted to a location that is in a read-only segment.
In general, it can be avoided by giving recursive functions a base case to return from and by making sure a pointer points to valid memory before accessing it.
Identifiers that begin with an underscore are reserved for the implementation in the global namespace, and you shouldn't use them. It seems that _end is an external symbol defined for programs compiled on Linux, and represents the first address past the end of the uninitialized data segment (also known as the BSS segment).
Note: On some systems the names of these symbols are preceded by underscores, thus: _etext, _edata, and _end.
Source: http://man7.org/linux/man-pages/man3/end.3.html