
About returning more than one value in C/C++/Assembly

I have read some questions about returning more than one value, such as "What is the reason behind having only one return value in C++ and Java?", "Returning multiple values from a C++ function" and "Why do most programming languages only support returning a single value from a function?".

I agree with most of the arguments used to show that more than one return value is not strictly necessary, and I understand why such a feature hasn't been implemented, but I still can't understand why we can't use multiple caller-saved registers such as ECX and EDX to return such values.

Wouldn't it be faster to use registers instead of creating a class/struct to store those values, or passing arguments by reference/pointer, both of which use memory to store them? If it is possible to do such a thing, does any C/C++ compiler use this feature to speed up the code?

Edit:

Ideally, the code would look like this:

(int, int) getTwoValues(void) { return 1, 2; }

int main(int argc, char** argv)
{
    // a and b are actually returned in registers
    // so future operations with a and b are faster
    (int a, int b) = getTwoValues();
    // do something with a and b
    
    return 0;
}
Nighteen asked Dec 14 '22 13:12


2 Answers

Yes, this is sometimes done. See the cdecl section of the Wikipedia page on x86 calling conventions:

There are some variations in the interpretation of cdecl, particularly in how to return values. As a result, x86 programs compiled for different operating system platforms and/or by different compilers can be incompatible, even if they both use the "cdecl" convention and do not call out to the underlying environment. Some compilers return simple data structures with a length of 2 registers or less in the register pair EAX:EDX, and larger structures and class objects requiring special treatment by the exception handler (e.g., a defined constructor, destructor, or assignment) are returned in memory. To pass "in memory", the caller allocates memory and passes a pointer to it as a hidden first parameter; the callee populates the memory and returns the pointer, popping the hidden pointer when returning.

(emphasis mine)

Ultimately, it comes down to the calling convention. Your compiler is free to optimize your own code to use whatever registers it wants, but when your code interacts with other code (like the operating system), it needs to follow the standard calling conventions, which typically use a single register for returning values.

Cornstalks answered Dec 28 '22 02:12


Returning on the stack isn't necessarily slower: the top of the stack is usually resident in L1 cache, and once the values are there, accessing them is very fast.

However, most computer architectures have at least 2 registers for return values, so that values twice (or more) as wide as the word size can be returned: edx:eax in x86, rdx:rax in x86_64, $v0 and $v1 in MIPS (see "Why MIPS assembler has more that one register for return value?"), R0:R3 in ARM [1], X0:X7 in ARM64... Those that don't are mostly microcontrollers with only one accumulator or a very limited number of registers.

[1] "If the type of value returned is too large to fit in r0 to r3, or whose size cannot be determined statically at compile time, then the caller must allocate space for that value at run time, and pass a pointer to that space in r0."

These registers can also be used to directly return small structs that fit in 2 registers (or more, depending on the architecture and ABI).

For example, given the following code:

struct Point
{
    int x, y;
};

struct shortPoint
{
    short x, y;
};

struct Point3D
{
    int x, y, z;
};

Point P1()
{
    Point p;
    p.x = 1;
    p.y = 2;
    return p;
}

Point P2()
{
    Point p;
    p.x = 1;
    p.y = 0;
    return p;
}

shortPoint P3()
{
    shortPoint p;
    p.x = 1;
    p.y = 0;
    return p;
}

Point3D P4()
{
    Point3D p;
    p.x = 1;
    p.y = 2;
    p.z = 3;
    return p;
}

Clang emits the following instructions for x86_64:

P1():                                 # @P1()
    movabs  rax, 8589934593
    ret

P2():                                 # @P2()
    mov eax, 1
    ret

P3():                                 # @P3()
    mov eax, 1
    ret

P4():                                 # @P4()
    movabs  rax, 8589934593
    mov edx, 3
    ret

For ARM64:

P1():
    mov x0, 1
    orr x0, x0, 8589934592
    ret
P2():
    mov x0, 1
    ret
P3():
    mov w0, 1
    ret
P4():
    mov x1, 1
    mov x0, 0
    sub sp, sp, #16
    bfi x0, x1, 0, 32
    mov x1, 2
    bfi x0, x1, 32, 32
    add sp, sp, 16
    mov x1, 3
    ret

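As you can see, no values are stored to or loaded from the stack (the sub/add on sp in the ARM64 P4 only adjusts the stack pointer; nothing is written there). You can switch to other compilers to see that the values are mainly returned in registers.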

phuclv answered Dec 28 '22 02:12