C++ pointer weird undefined behaviour

Compiling with -O2 (or -O3 for that matter) and running this program yields interesting results on my machine.

#include <iostream>

using namespace std;

int main()
{
    // Pointer to an int in the heap with a value of 5
    int *p = new int(5);
    // Deallocate the memory, but keep a dangling pointer
    delete p;
    // Write 123 to deallocated space
    *p = 123;
    // Allocate a long int in the heap
    long *x = new long(456);

    // Print values and pointers
    cout << "*p: " << *p << endl;
    cout << "*x: " << *x << endl;
    cout << "p:  " << p << endl;
    cout << "x:  " << x << endl;

    cout << endl << "Changing nothing" << endl << endl;

    // Print again without changing anything
    cout << "*p: " << *p << endl;
    cout << "*x: " << *x << endl;
    cout << "p:  " << p << endl;
    cout << "x:  " << x << endl;

    return 0;
}

g++ -O2 code.cc; ./a.out

*p: 123
*x: 456
p:  0x112f010
x:  0x112f010

Changing nothing

*p: 456
*x: 456
p:  0x112f010
x:  0x112f010

What I am doing is writing to a deallocated int on the heap pointed to by p, and then allocating a long whose address is x. My compiler consistently places the long at the same address as p, so x == p. Now when I dereference p and print it, it still shows 123, even though that memory has been overwritten with the long 456. *x is then printed as 456. What is even weirder is that later, without changing anything, printing the same values yields the expected results.

I thought this was an optimization that only initializes *x when it is first used, after *p has been printed, which would explain it. However, objdump says otherwise. Here is a truncated and commented objdump -d a.out:

00000000004008a0 <main>:
  4008a0:   41 54                   push   %r12
  4008a2:   55                      push   %rbp

Most likely the int allocation, where 0x4 is the size (4 bytes)
  4008a3:   bf 04 00 00 00          mov    $0x4,%edi
  4008a8:   53                      push   %rbx
  4008a9:   e8 e2 ff ff ff          callq  400890 <_Znwm@plt>

I have no idea what is going on here, but the pointer p is in 2 registers. Let's call the other one q.
q = p;
  4008ae:   48 89 c3                mov    %rax,%rbx

  4008b1:   48 89 c7                mov    %rax,%rdi

*p = 5;
  4008b4:   c7 00 05 00 00 00       movl   $0x5,(%rax)

delete p;
  4008ba:   e8 51 ff ff ff          callq  400810 <_ZdlPv@plt>

*q = 123;
  4008bf:   c7 03 7b 00 00 00       movl   $0x7b,(%rbx)

The long allocation and some other stuff (?). (8 bytes)
  4008c5:   bf 08 00 00 00          mov    $0x8,%edi
  4008ca:   e8 c1 ff ff ff          callq  400890 <_Znwm@plt>
  4008cf:   44 8b 23                mov    (%rbx),%r12d
  4008d2:   be e4 0b 40 00          mov    $0x400be4,%esi
  4008d7:   bf c0 12 60 00          mov    $0x6012c0,%edi

Initialization of the long before the printing
*x = 456; (x == p, so this also overwrites *p)
  4008dc:   48 c7 00 c8 01 00 00    movq   $0x1c8,(%rax)

  4008e3:   48 89 c5                mov    %rax,%rbp

The printing
  4008e6:   e8 85 ff ff ff          callq  400870 <_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc@plt>
........

Now, although *p has been overwritten by the long initialization (4008dc), it is still printed as 123.

I hope I made some sense here, and thank you for any help.

To make myself clear: I am trying to figure out what is going on behind the scenes, what the compiler does, and why the compiled code does not seem to match the output. I KNOW THIS IS UNDEFINED BEHAVIOUR AND THAT ANYTHING CAN HAPPEN. But that means the compiler can produce any code, not that the CPU is going to make up instructions. Any ideas are welcome.

PS: Don't worry, I am not planning to use this anywhere ;)

EDIT: On my friend's machine (OS X) it yields the expected results even with optimization.

sammko asked Apr 15 '26 09:04

1 Answer

You stopped looking at your disassembly output too soon (or at least you didn't post the next few lines, which are relevant to your question). They probably look something like:

movl    %r12d, %esi
movq    %rax, %rdi
call    _ZNSolsEi
movq    %rax, %rdi
call    _ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_

rbx and r12 are callee-saved registers: they must be preserved across function calls in the x86-64 System V ABI used by GCC on Linux. After the allocation of the long, you see this instruction:

mov    (%rbx),%r12d

The uses of rbx earlier in the instruction stream include:

mov    %rax,%rbx       ; store the `p` pointer in `rbx`

...

movl   $0x7b,(%rbx)    ; store 123 where `p` pointed (even though it has been freed before)

... 

mov    (%rbx),%r12d    ; read that value - 123 - back and into `r12`

Then, in the snippet I posted above (the disassembly that didn't make it into your question, corresponding to part of the cout << "*p: " << *p << endl statement), you see:

movl    %r12d, %esi    ; put 123 into `esi`, which is used to pass an argument to a function call

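And 123 gets printed. Note that the load of *p into r12d (at 4008cf) comes before the store of 456 (at 4008dc), so the first print uses the value that was read before the long was initialized.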

Michael Burr answered Apr 16 '26 23:04
