How does returning values from a function work?

Tags: c++, stack

I recently had a serious bug: I forgot to return a value from a function. Even though nothing was returned, the code worked fine under Linux and Windows and only crashed on Mac. I discovered the bug when I turned on all compiler warnings.

So here is a simple example:

#include <iostream>

class A{
public:
    A(int p1, int p2, int p3): v1(p1), v2(p2), v3(p3)
    {
    }

    int v1;
    int v2;
    int v3;
};

A* getA(){
    A* p = new A(1,2,3);
//  return p;
}

int main(){

    A* a = getA();

    std::cerr << "A: v1=" << a->v1 << " v2=" << a->v2 << " v3=" << a->v3 << std::endl;  

    return 0;
}

My question is: how can this work under Linux/Windows without crashing? How are values returned at a lower level?

D-rk asked Mar 11 '12 09:03


5 Answers

On Intel architectures, simple values (integers and pointers) are usually returned in the eax register. This register (among others) is also used as temporary storage when moving values in memory and as an operand during calculations. So whatever value is left in that register is treated as the return value, and in your case it turned out to be exactly what you wanted to return.

hamstergene answered Oct 12 '22 20:10


Probably by luck: the pointer was left in a register that happens to be used for returning single pointer results, or something like that.

Calling conventions and the way function results are returned are architecture-dependent, so it's not surprising that your code works on Windows/Linux but not on a Mac.

Martin James answered Oct 12 '22 21:10


There are two major ways for a compiler to return a value:

  1. Put a value in a register before returning, and
  2. Have the caller pass a block of stack memory for the return value, and write the value into that block

Option #1 is usually used for anything that fits into a register; #2 is for everything else (large structs, arrays, et cetera).

In your case, the compiler uses #1 both for the return of new and for the return of your function. On Linux and Windows, the compiler did not perform any operations that altered the return register between storing its value in the pointer variable and returning from your function; on Mac, it did. Hence the difference in the results that you see: in the first case, the leftover value in the return register happened to coincide with the value that you wanted to return anyway.
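As a rough illustration of the two mechanisms, the sketch below contrasts a small result with a large one. The names `smallResult`, `Big`, `largeResult`, `largeResultExplicit` and `sameContents` are made up for this example, and which mechanism a compiler actually picks depends on the platform's calling convention; the comments describe the typical case only.

```cpp
#include <cassert>
#include <cstddef>

// Small result: one int. Most calling conventions hand this back in a
// register (eax/rax on x86), though the details are ABI-specific.
int smallResult() {
    return 42;
}

// Large result: 256 ints do not fit in a register. Compilers typically have
// the caller reserve the space and pass its address as a hidden argument.
struct Big {
    int data[256];
};

Big largeResult() {
    Big b{};            // value-initialized: every element is zero
    b.data[0] = 1;
    b.data[255] = 2;
    return b;
}

// Hand-written version of the hidden-pointer scheme:
// the caller owns the storage, the callee fills it in.
void largeResultExplicit(Big* out) {
    assert(out != nullptr);
    *out = Big{};
    out->data[0] = 1;
    out->data[255] = 2;
}

// Hypothetical helper: true when both versions produce the same contents.
bool sameContents() {
    Big viaReturn = largeResult();
    Big viaPointer;
    largeResultExplicit(&viaPointer);
    for (std::size_t i = 0; i < 256; ++i) {
        if (viaReturn.data[i] != viaPointer.data[i]) return false;
    }
    return true;
}
```

Only the first mechanism can "accidentally work" the way the question describes, because a register always holds some leftover value.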

Sergey Kalinichenko answered Oct 12 '22 20:10


First off, you need to slightly modify your example to get it to compile. The function must have at least one execution path that returns a value.

A* getA(){
    if(false)
        return NULL;
    A* p = new A(1,2,3);
//  return p;
}

Second, it's obviously undefined behavior, which means anything can happen, but I guess this answer won't satisfy you.

Third, in Windows it works in Debug mode, but if you compile under Release, it doesn't.

The following is compiled under Debug:

    A* p = new A(1,2,3);
00021535  push        0Ch  
00021537  call        operator new (211FEh) 
0002153C  add         esp,4 
0002153F  mov         dword ptr [ebp-0E0h],eax 
00021545  mov         dword ptr [ebp-4],0 
0002154C  cmp         dword ptr [ebp-0E0h],0 
00021553  je          getA+7Eh (2156Eh) 
00021555  push        3    
00021557  push        2    
00021559  push        1    
0002155B  mov         ecx,dword ptr [ebp-0E0h] 
00021561  call        A::A (21271h) 
00021566  mov         dword ptr [ebp-0F4h],eax 
0002156C  jmp         getA+88h (21578h) 
0002156E  mov         dword ptr [ebp-0F4h],0 
00021578  mov         eax,dword ptr [ebp-0F4h] 
0002157E  mov         dword ptr [ebp-0ECh],eax 
00021584  mov         dword ptr [ebp-4],0FFFFFFFFh 
0002158B  mov         ecx,dword ptr [ebp-0ECh] 
00021591  mov         dword ptr [ebp-14h],ecx 

The second instruction, the call to operator new, leaves a pointer to the newly allocated object in eax.

    A* a = getA();
0010484E  call        getA (1012ADh) 
00104853  mov         dword ptr [a],eax 

The calling context expects eax to contain the returned value. Nothing was explicitly returned, so eax still contains the last pointer allocated by new, which incidentally is p.

So that's why it works.
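For comparison, here is a minimal sketch of the function with the return statement restored, so the result no longer depends on what happens to be in eax. The helper `sumAndRelease` is made up for this example. GCC and Clang can flag the original mistake with `-Wreturn-type`, and `-Werror=return-type` turns it into a hard error.

```cpp
#include <cassert>

class A {
public:
    A(int p1, int p2, int p3) : v1(p1), v2(p2), v3(p3) {}

    int v1;
    int v2;
    int v3;
};

// Every path returns a value, so the behavior is well defined.
A* getA() {
    A* p = new A(1, 2, 3);
    return p;
}

// Hypothetical helper: sums the three fields, then frees the object
// that getA() allocated.
int sumAndRelease(A* a) {
    assert(a != nullptr);
    int sum = a->v1 + a->v2 + a->v3;
    delete a;
    return sum;
}
```

Calling `sumAndRelease(getA())` gives 6 on every platform, in Debug and Release builds alike.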

Luchian Grigore answered Oct 12 '22 21:10


As Kerrek SB mentioned, your code has ventured into the realm of undefined behavior.

Basically, your code is going to compile down to assembly. In assembly, there's no concept of a function requiring a return type; there's just an expectation. I'm most comfortable with MIPS, so I shall use MIPS to illustrate.

Assume you have the following code:

int add(int x, int y)
{
    return x + y;
}

This is going to be translated to something like:

add:
    add $v0, $a0, $a1 # add $a0 and $a1 and store the result in $v0
    jr $ra # jump back to wherever this code was called from

To add 5 and 4, the code would be called something like:

addi $a0, $0, 5 # 5 is the first param
addi $a1, $0, 4 # 4 is the second param
jal add
# $v0 now contains 9

Note that unlike C, there's no explicit requirement that $v0 contain the return value, just an expectation. So, what happens if you don't actually push anything into $v0? Well, $v0 always has some value, so the value will be whatever it last was.
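The same idea can be mimicked in well-defined C++ with a toy model. This is only a sketch: the global `v0` stands in for the return register, and `storage`, `allocate`, `getA_noReturn`, `getA_noReturn_clobbered` and `callAndReadResult` are invented names, not anything a real compiler generates.

```cpp
#include <cassert>
#include <cstdint>

// Toy model, not real hardware: one global plays the role of $v0 / eax.
static std::uintptr_t v0 = 0;

// Stands in for the object that `new` would create.
static int storage = 123;

// Models operator new: leaves the object's address in the return register.
void allocate() {
    v0 = reinterpret_cast<std::uintptr_t>(&storage);
}

// Models getA() without its return statement: it never writes v0 itself,
// so v0 still holds whatever allocate() left there.
void getA_noReturn() {
    allocate();
}

// Same function, but unrelated work reuses the register before returning.
void getA_noReturn_clobbered() {
    allocate();
    v0 = 0xDEADu;  // scratch use of the register overwrites the address
}

// The caller blindly trusts v0 after the call.
std::uintptr_t callAndReadResult(void (*fn)()) {
    assert(fn != nullptr);
    fn();
    return v0;
}
```

In this model the first function "returns" the right address by accident, while the second hands the caller garbage. That is the same difference as the one between the platforms where the original code seemed to work and the one where it crashed.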

Note: This post makes some simplifications. Also, your computer is likely not running MIPS... But hopefully the example holds, and if you learned assembly at a university, MIPS might be what you know anyway.

Corbin answered Oct 12 '22 21:10