Compiling with -O2 (or -O3 for that matter) and running this program yields interesting results on my machine.
#include <iostream>
using namespace std;
int main()
{
    // Pointer to an int in the heap with a value of 5
    int *p = new int(5);
    // Deallocate the memory, but keep a dangling pointer
    delete p;
    // Write 123 to deallocated space
    *p = 123;
    // Allocate a long int in the heap
    long *x = new long(456);
    // Print values and pointers
    cout << "*p: " << *p << endl;
    cout << "*x: " << *x << endl;
    cout << "p: " << p << endl;
    cout << "x: " << x << endl;
    cout << endl << "Changing nothing" << endl << endl;
    // Print again without changing anything
    cout << "*p: " << *p << endl;
    cout << "*x: " << *x << endl;
    cout << "p: " << p << endl;
    cout << "x: " << x << endl;
    return 0;
}
g++ -O2 code.cc; ./a.out
*p: 123
*x: 456
p: 0x112f010
x: 0x112f010
Changing nothing
*p: 456
*x: 456
p: 0x112f010
x: 0x112f010
What I am doing is writing to a deallocated int on the heap pointed to by p, and then allocating a long whose address is stored in x. My compiler consistently places the long at the same address as the freed int, so x == p.
Now when I dereference p and print it, it still shows 123, even though that memory has since been overwritten with the long 456. *x is then printed as 456. What is even weirder is that later, without changing anything, printing the same values yields the expected results. I thought this was an optimization that delays the initialization of *x until after *p has been printed, which would explain it. However, objdump says otherwise. Here is a truncated and commented objdump -d a.out:
00000000004008a0 <main>:
4008a0: 41 54 push %r12
4008a2: 55 push %rbp
Most likely the int allocation, where 0x4 is the size (4 bytes)
4008a3: bf 04 00 00 00 mov $0x4,%edi
4008a8: 53 push %rbx
4008a9: e8 e2 ff ff ff callq 400890 <_Znwm@plt>
I have no idea what is going on here, but the pointer p is in 2 registers. Let's call the other one q.
q = p;
4008ae: 48 89 c3 mov %rax,%rbx
4008b1: 48 89 c7 mov %rax,%rdi
*p = 5;
4008b4: c7 00 05 00 00 00 movl $0x5,(%rax)
delete p;
4008ba: e8 51 ff ff ff callq 400810 <_ZdlPv@plt>
*q = 123;
4008bf: c7 03 7b 00 00 00 movl $0x7b,(%rbx)
The long allocation and some other stuff (?). (8 bytes)
4008c5: bf 08 00 00 00 mov $0x8,%edi
4008ca: e8 c1 ff ff ff callq 400890 <_Znwm@plt>
4008cf: 44 8b 23 mov (%rbx),%r12d
4008d2: be e4 0b 40 00 mov $0x400be4,%esi
4008d7: bf c0 12 60 00 mov $0x6012c0,%edi
Initialization of the long before the printing
*x = 456;
4008dc: 48 c7 00 c8 01 00 00 movq $0x1c8,(%rax)
4008e3: 48 89 c5 mov %rax,%rbp
The printing
4008e6: e8 85 ff ff ff callq 400870 <_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc@plt>
........
Now, although *p has been overwritten by the long initialization (4008dc), it is still printed as 123.
I hope this makes sense, and thank you for any help.
To make myself clear: I am trying to figure out what is going on behind the scenes, what the compiler does, and why the resulting compiled code does not seem to correspond to the output. I KNOW THIS IS UNDEFINED BEHAVIOUR AND THAT ANYTHING CAN HAPPEN. But that means the compiler can produce any code, not that the CPU is going to make up instructions. Any ideas are welcome.
PS: Don't worry, I am not planning to use this anywhere ;)
EDIT: On my friend's machine (OS X) it yields the expected results even with optimization.
You stopped looking at your disassembly output too soon (or at least you didn't post the next few lines, which are relevant to your question). They probably look something like:
movl %r12d, %esi
movq %rax, %rdi
call _ZNSolsEi
movq %rax, %rdi
call _ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_
rbx and r12 are callee-saved registers: in the x86-64 System V ABI used by GCC on Linux, their values must be preserved across function calls. After the allocation of the long, you see this instruction:
mov (%rbx),%r12d
The uses of rbx earlier in the instruction stream include:
mov %rax,%rbx ; store the `p` pointer in `rbx`
...
movl $0x7b,(%rbx) ; store 123 where `p` pointed (even though it has been freed before)
...
mov (%rbx),%r12d ; read that value - 123 - back and into `r12`
Then, in the snippet I posted above (the part of the disassembly that didn't make it into your question, corresponding to part of the cout << "*p: " << *p << endl statement), you see:
movl %r12d, %esi ; put 123 into `esi`, which is used to pass an argument to a function call
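And 123 gets printed: the value was read into r12 at 4008cf, before the long was initialized at 4008dc, and r12 keeps that value across the intervening function calls.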