I've hit a crash when returning a large std::bitset as an rvalue (via std::move). Is this a compiler bug, or have I mistakenly done something that causes undefined behaviour?
The code below crashes on GCC 4.6.3 with the -std=c++0x flag set.
#include <bitset>
#include <utility>  // std::move

// typedef std::bitset<0xffff> uut;
typedef std::bitset<0xffffff> uut;

struct foo {
    foo(uut b)
        : b_(std::move(b))
    {
    }
    uut b_;
};

uut make_bits(int)
{
    uut bits;
    // Only works for 0xffff:
    return std::move(bits);
    // Works for both 0xffff and 0xffffff:
    //return bits;
}

int main()
{
    foo(make_bits(0));
}
Weirdly, if I remove the int parameter it's OK. Maybe that causes the function to be inlined?
As @unwind suggested, here's the output when run under valgrind ./a.out:
==24780== Memcheck, a memory error detector
==24780== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
==24780== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
==24780== Command: ./a.out
==24780==
==24780== Warning: client switching stacks? SP change: 0x7ff000068 --> 0x7fea00058
==24780== to suppress, use: --max-stackframe=6291472 or greater
==24780== Invalid write of size 8
==24780== at 0x4005E5: main (in /home/sam/scratch/a.out)
==24780== Address 0x7fea00058 is on thread 1's stack
==24780==
==24780== Warning: client switching stacks? SP change: 0x7fea00050 --> 0x7fe800040
==24780== to suppress, use: --max-stackframe=2097168 or greater
==24780== Invalid write of size 8
==24780== at 0x40056F: make_bits(int) (in /home/sam/scratch/a.out)
==24780== by 0x4005E9: main (in /home/sam/scratch/a.out)
==24780== Address 0x7fe800048 is on thread 1's stack
==24780==
==24780==
==24780== Process terminating with default action of signal 11 (SIGSEGV)
==24780== Access not within mapped region at address 0x7FE800048
==24780== at 0x40056F: make_bits(int) (in /home/sam/scratch/a.out)
==24780== If you believe this happened as a result of a stack
==24780== overflow in your program's main thread (unlikely but
==24780== possible), you can try to increase the size of the
==24780== main thread stack using the --main-stacksize= flag.
==24780== The main thread stack size used in this run was 8388608.
==24780==
==24780== Process terminating with default action of signal 11 (SIGSEGV)
==24780== Access not within mapped region at address 0x7FE800039
==24780== at 0x4A255A0: _vgnU_freeres (in /usr/lib/valgrind/vgpreload_core-amd64-linux.so)
==24780== If you believe this happened as a result of a stack
==24780== overflow in your program's main thread (unlikely but
==24780== possible), you can try to increase the size of the
==24780== main thread stack using the --main-stacksize= flag.
==24780== The main thread stack size used in this run was 8388608.
==24780==
==24780== HEAP SUMMARY:
==24780== in use at exit: 0 bytes in 0 blocks
==24780== total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==24780==
==24780== All heap blocks were freed -- no leaks are possible
==24780==
==24780== For counts of detected and suppressed errors, rerun with: -v
==24780== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 2 from 2)
And with valgrind --max-stacksize=99999999 ./a.out, as valgrind prompted me to:
==24790== Memcheck, a memory error detector
==24790== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
==24790== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
==24790== Command: ./a.out
==24790==
==24790== Warning: client switching stacks? SP change: 0x7ff000068 --> 0x7fea00058
==24790== to suppress, use: --max-stackframe=6291472 or greater
==24790== Invalid write of size 8
==24790== at 0x4005E5: main (in /home/sam/scratch/a.out)
==24790== Address 0x7fea00058 is on thread 1's stack
==24790==
==24790== Warning: client switching stacks? SP change: 0x7fea00050 --> 0x7fe800040
==24790== to suppress, use: --max-stackframe=2097168 or greater
==24790== Invalid write of size 8
==24790== at 0x40056F: make_bits(int) (in /home/sam/scratch/a.out)
==24790== by 0x4005E9: main (in /home/sam/scratch/a.out)
==24790== Address 0x7fe800048 is on thread 1's stack
==24790==
==24790== Invalid write of size 4
==24790== at 0x400576: make_bits(int) (in /home/sam/scratch/a.out)
==24790== by 0x4005E9: main (in /home/sam/scratch/a.out)
==24790== Address 0x7fe800044 is on thread 1's stack
==24790==
==24790== Invalid write of size 8
==24790== at 0x400590: make_bits(int) (in /home/sam/scratch/a.out)
==24790== by 0x4005E9: main (in /home/sam/scratch/a.out)
==24790== Address 0x7fe800038 is on thread 1's stack
==24790==
==24790== Invalid write of size 4
==24790== at 0x4C2E0E0: memset (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==24790== by 0x400594: make_bits(int) (in /home/sam/scratch/a.out)
==24790== by 0x4005E9: main (in /home/sam/scratch/a.out)
==24790== Address 0x7fe800050 is on thread 1's stack
==24790==
==24790== Invalid write of size 4
==24790== at 0x4C2E0EB: memset (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==24790== by 0x400594: make_bits(int) (in /home/sam/scratch/a.out)
==24790== by 0x4005E9: main (in /home/sam/scratch/a.out)
==24790== Address 0x7fe800058 is on thread 1's stack
==24790==
==24790== Invalid read of size 8
==24790== at 0x4C2E10E: memset (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==24790== by 0x4005E9: main (in /home/sam/scratch/a.out)
==24790== Address 0x7fe800038 is on thread 1's stack
==24790==
==24790== Invalid read of size 8
==24790== at 0x4005A7: make_bits(int) (in /home/sam/scratch/a.out)
==24790== by 0x4005E9: main (in /home/sam/scratch/a.out)
==24790== Address 0x7fe800048 is on thread 1's stack
==24790==
==24790== Invalid write of size 8
==24790== at 0x4C2D10D: memcpy@@GLIBC_2.14 (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==24790== by 0x4005C0: make_bits(int) (in /home/sam/scratch/a.out)
==24790== by 0x4005E9: main (in /home/sam/scratch/a.out)
==24790== Address 0x7fee00058 is on thread 1's stack
==24790==
==24790== Invalid read of size 8
==24790== at 0x4C2D11A: memcpy@@GLIBC_2.14 (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==24790== by 0x4005C0: make_bits(int) (in /home/sam/scratch/a.out)
==24790== by 0x4005E9: main (in /home/sam/scratch/a.out)
==24790== Address 0x7fe9fffc8 is on thread 1's stack
==24790==
==24790== Invalid read of size 8
==24790== at 0x4C2D108: memcpy@@GLIBC_2.14 (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==24790== by 0x4005C0: make_bits(int) (in /home/sam/scratch/a.out)
==24790== by 0x4005E9: main (in /home/sam/scratch/a.out)
==24790== Address 0x7fe9fffc0 is on thread 1's stack
==24790==
==24790== Invalid read of size 8
==24790== at 0x4005C1: make_bits(int) (in /home/sam/scratch/a.out)
==24790== by 0x4005E9: main (in /home/sam/scratch/a.out)
==24790== Address 0x7fe800048 is on thread 1's stack
==24790==
==24790== Warning: client switching stacks? SP change: 0x7fe800040 --> 0x7fea00050
==24790== to suppress, use: --max-stackframe=2097168 or greater
==24790== further instances of this message will not be shown.
==24790== Invalid read of size 8
==24790== at 0x4005C9: make_bits(int) (in /home/sam/scratch/a.out)
==24790== by 0x4E5376C: (below main) (libc-start.c:226)
==24790== Address 0x7fea00058 is on thread 1's stack
==24790==
==24790== Invalid read of size 8
==24790== at 0x4C2D000: memcpy@@GLIBC_2.14 (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==24790== by 0x40060A: main (in /home/sam/scratch/a.out)
==24790== Address 0x7fec00060 is on thread 1's stack
==24790==
==24790== Invalid write of size 8
==24790== at 0x4C2D004: memcpy@@GLIBC_2.14 (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==24790== by 0x40060A: main (in /home/sam/scratch/a.out)
==24790== Address 0x7fea00060 is on thread 1's stack
==24790==
==24790== Invalid read of size 8
==24790== at 0x4C2D00F: memcpy@@GLIBC_2.14 (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==24790== by 0x40060A: main (in /home/sam/scratch/a.out)
==24790== Address 0x7fec00070 is on thread 1's stack
==24790==
==24790== Invalid read of size 8
==24790== at 0x4C2D108: memcpy@@GLIBC_2.14 (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==24790== by 0x400650: foo::foo(std::bitset<16777215ul>) (in /home/sam/scratch/a.out)
==24790== by 0x400612: main (in /home/sam/scratch/a.out)
==24790== Address 0x7fec00058 is on thread 1's stack
==24790==
==24790== Invalid read of size 8
==24790== at 0x4C2D11A: memcpy@@GLIBC_2.14 (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==24790== by 0x400650: foo::foo(std::bitset<16777215ul>) (in /home/sam/scratch/a.out)
==24790== by 0x400612: main (in /home/sam/scratch/a.out)
==24790== Address 0x7fec00048 is on thread 1's stack
==24790==
==24790== Invalid write of size 8
==24790== at 0x4C2D10D: memcpy@@GLIBC_2.14 (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==24790== by 0x400650: foo::foo(std::bitset<16777215ul>) (in /home/sam/scratch/a.out)
==24790== by 0x400612: main (in /home/sam/scratch/a.out)
==24790== Address 0x7feffffe0 is on thread 1's stack
==24790==
==24790==
==24790== HEAP SUMMARY:
==24790== in use at exit: 0 bytes in 0 blocks
==24790== total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==24790==
==24790== All heap blocks were freed -- no leaks are possible
==24790==
==24790== For counts of detected and suppressed errors, rerun with: -v
==24790== ERROR SUMMARY: 2097097 errors from 19 contexts (suppressed: 2 from 2)
We can see exactly what GCC is doing under the hood by compiling both cases with -S:
g++-4.6 -std=c++0x test.cc -S -fverbose-asm
And then using diff to compare the outputs:
diff -rNu move.s ret.s |c++filt
--- move.s 2015-05-21 14:00:49.097524035 +0100
+++ ret.s 2015-05-21 14:00:40.021510019 +0100
@@ -79,23 +79,13 @@
.cfi_offset 5, -8
movl %esp, %ebp #,
.cfi_def_cfa_register 5
- subl $2097176, %esp #,
- leal -2097160(%ebp), %eax #, tmp60
+ subl $24, %esp #,
+ movl 8(%ebp), %eax # .result_ptr, tmp59
movl $2097152, %edx #, tmp61
movl %edx, 8(%esp) # tmp61,
movl $0, 4(%esp) #,
movl %eax, (%esp) # tmp60,
call memset #
- leal -2097160(%ebp), %eax #, tmp64
- movl %eax, (%esp) # tmp64,
- call std::remove_reference<std::bitset<16777215u>&>::type&& std::move<std::bitset<16777215u>&>(std::bitset<16777215u>&) #
- movl %eax, %edx #, D.21547
- movl 8(%ebp), %eax # .result_ptr, tmp65
- movl $2097152, %ecx #, tmp68
- movl %ecx, 8(%esp) # tmp68,
- movl %edx, 4(%esp) # tmp67,
- movl %eax, (%esp) # tmp66,
- call memcpy #
movl 8(%ebp), %eax # .result_ptr,
leave
.cfi_restore 5
(The lines marked + only exist in the return by value case; the lines marked - only exist in the move case.)
The move case does far more stack pointer manipulation, and with very big numbers: it reserves 2097176 bytes (about 2 MiB) with subl where the return by value case reserves 24. Crucially, it then ends with a memcpy call that copies the result back onto the stack.
My analysis is that the return by value case benefits from a further optimisation: the unused temporary inside main is omitted entirely there, but not in the move case.
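The difference between the two return statements can be reproduced on a small scale. The sketch below is not from the question: Tracked is a hypothetical type that counts its move constructions. It relies on the fact that return std::move(t) always constructs the result from the local object, whereas a plain return t allows (but in C++11 does not require) the compiler to build the object directly in the caller's storage. With a 2 MiB bitset, that extra local plus copy would match the large subl and the memcpy seen in move.s.

```cpp
#include <utility>

// Hypothetical helper type (an assumption for illustration only):
// it counts how many times it is move-constructed.
struct Tracked {
    static int moves;
    Tracked() {}
    Tracked(const Tracked&) {}
    Tracked(Tracked&&) { ++moves; }
};
int Tracked::moves = 0;

Tracked by_name()
{
    Tracked t;
    return t;             // plain name: eligible for copy elision
}

Tracked by_move()
{
    Tracked t;
    return std::move(t);  // not a plain name: the result is move-constructed from t
}

// Number of move constructions triggered by "return t;".
int moves_for_return_by_name()
{
    Tracked::moves = 0;
    Tracked r = by_name();
    (void)r;
    return Tracked::moves;
}

// Number of move constructions triggered by "return std::move(t);".
int moves_for_return_by_move()
{
    Tracked::moves = 0;
    Tracked r = by_move();
    (void)r;
    return Tracked::moves;
}
```

Calling both helpers from main and printing the results should show at least one move for the std::move version. The count for the plain return depends on whether the compiler elides the copy, so no fixed output is given here.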
We can confirm that by performing the same analysis on the return by value case with -O0, which disables all optimisations, and seeing what happens:
diff -Nru noopt.s ret.s
--- noopt.s 2015-05-21 14:06:14.798028762 +0100
+++ ret.s 2015-05-21 14:00:40.021510019 +0100
@@ -3,7 +3,7 @@
# compiled by GNU C version 4.6.4, GMP version 5.1.3, MPFR version 3.1.2-p3, MPC version 1.0.1
# GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
# options passed: -imultilib . -imultiarch i386-linux-gnu -D_GNU_SOURCE
-# test.cc -mtune=generic -march=i686 -O0 -std=c++0x -fverbose-asm
+# test.cc -mtune=generic -march=i686 -std=c++0x -fverbose-asm
# -fstack-protector
# options enabled: -fasynchronous-unwind-tables -fauto-inc-dec
# -fbranch-count-reg -fcommon -fdelete-null-pointer-checks -fdwarf2-cfi-asm
@@ -79,23 +79,13 @@
.cfi_offset 5, -8
movl %esp, %ebp #,
.cfi_def_cfa_register 5
- subl $2097176, %esp #,
- leal -2097160(%ebp), %eax #, tmp60
+ subl $24, %esp #,
+ movl 8(%ebp), %eax # .result_ptr, tmp59
movl $2097152, %edx #, tmp61
movl %edx, 8(%esp) # tmp61,
movl $0, 4(%esp) #,
movl %eax, (%esp) # tmp60,
call memset #
- leal -2097160(%ebp), %eax #, tmp64
- movl %eax, (%esp) # tmp64,
- call _ZSt4moveIRSt6bitsetILj16777215EEEONSt16remove_referenceIT_E4typeEOS4_ #
- movl %eax, %edx #, D.21547
- movl 8(%ebp), %eax # .result_ptr, tmp65
- movl $2097152, %ecx #, tmp68
- movl %ecx, 8(%esp) # tmp68,
- movl %edx, 4(%esp) # tmp67,
- movl %eax, (%esp) # tmp66,
- call memcpy #
movl 8(%ebp), %eax # .result_ptr,
leave
.cfi_restore 5
Again, the same stack pointer manipulation and copy happen in the return by value case once optimisations are disabled. So it looks like you've got a stack overflow in both cases, but in the return by value case your test case isn't sufficient to actually observe it because of other optimisations.
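The numbers in the valgrind output from the question support the stack overflow reading. As a rough sketch (the constant and function names below are mine, and the figures are copied from that output): main moves the stack pointer by 6291472 bytes, make_bits by a further 2097168, and the main thread stack is reported as 8388608 bytes.

```cpp
#include <bitset>
#include <cstddef>

typedef std::bitset<0xffffff> uut;  // same typedef as in the question

// sizeof is evaluated at compile time, so no bitset is created here.
// 0xffffff bits need at least 2097152 bytes (2 MiB) of storage.
const std::size_t bytes_per_bitset = sizeof(uut);

// Figures taken from the valgrind output in the question.
const std::size_t main_frame_bytes      = 6291472;  // SP change in main
const std::size_t make_bits_frame_bytes = 2097168;  // SP change in make_bits
const std::size_t main_stack_bytes      = 8388608;  // reported stack size

// True when the two frames alone need more than the whole stack.
bool frames_exceed_stack()
{
    return main_frame_bytes + make_bits_frame_bytes > main_stack_bytes;
}
```

The two frames add up to 8388640 bytes, 32 bytes more than the whole stack, before counting anything else already on it. The frame in main is roughly three bitsets' worth and the one in make_bits roughly one.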
Solution: allocate on the heap, or get a bigger stack using pthread_attr_setstacksize or clone on Linux.
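A minimal sketch of the heap option, assuming you are free to change foo and make_bits to pass a std::unique_ptr instead of the bitset itself (that change, and the demo function, are my assumptions, not part of the question). The bitset then lives on the heap, and only a pointer is moved around.

```cpp
#include <bitset>
#include <cstddef>
#include <memory>
#include <utility>

typedef std::bitset<0xffffff> uut;  // same typedef as in the question

// Variant of foo that owns the bitset through a pointer.
struct foo {
    explicit foo(std::unique_ptr<uut> b)
        : b_(std::move(b))  // moves the pointer, not the 2 MiB of bits
    {
    }
    std::unique_ptr<uut> b_;
};

std::unique_ptr<uut> make_bits(int)
{
    // The bitset is constructed in heap storage; C++11 has no
    // std::make_unique, so the pointer is wrapped by hand.
    return std::unique_ptr<uut>(new uut());
}

// Hypothetical usage: build a foo, set one bit, count the set bits.
std::size_t demo()
{
    foo f(make_bits(0));
    f.b_->set(42);
    return f.b_->count();
}
```

Here std::move is appropriate in the constructor because it transfers ownership of a pointer-sized object. The return statement in make_bits returns a temporary, so it needs no std::move at all.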