
In C++, why do some compilers refuse to put objects consisting of only a double into a register?

In Item 20 of Scott Meyers' Effective C++, he states:

some compilers refuse to put objects consisting of only a double into a register

When passing built-in types by value, compilers will happily place the data in registers and quickly send ints/doubles/floats/etc. along. However, not all compilers treat small objects with the same grace. I can easily understand why compilers would treat objects differently: between the vtable and all the constructors, passing an object by value can be a lot more work than just copying data members.

But still. This seems like an easy problem for modern compilers to solve: "This class is small, maybe I can treat it differently". Meyers' statement seemed to imply that compilers WOULD make this optimization for objects consisting of only an int (or char or short).

Can someone give further insight as to why this optimization sometimes doesn't happen?

Ari Sweedler asked Aug 30 '18 23:08


2 Answers

I found this document online on "Calling conventions for different C++ compilers and operating systems" (updated on 2018-04-25) which has a table depicting "Methods for passing structure, class and union objects".

From the table you can see that if an object contains a long double, a copy of the entire object is transferred on the stack by all of the compilers shown.

[Image: table 6 of the document, "Methods for passing structure, class and union objects"]

Also from the same resource (with emphasis added):

There are several different methods to transfer a parameter to a function if the parameter is a structure, class or union object. A copy of the object is always made, and this copy is transferred to the called function either in registers, on the stack, or by a pointer, as specified in table 6. The symbols in the table specify which method to use. S takes precedence over I and R. PI and PS take precedence over all other passing methods.

As table 6 tells, an object cannot be transferred in registers if it is too big or too complex. For example, an object that has a copy constructor cannot be transferred in registers because the copy constructor needs an address of the object. The copy constructor is called by the caller, not the callee.

Objects passed on the stack are aligned by the stack word size, even if higher alignment would be desired. Objects passed by pointers are not aligned by any of the compilers studied, even if alignment is explicitly requested. The 64bit Windows ABI requires that objects passed by pointers be aligned by 16.

An array is not treated as an object but as a pointer, and no copy of the array is made, except if the array is wrapped into a structure, class or union.

The 64 bit compilers for Linux differ from the ABI (version 0.97) in the following respects: Objects with inheritance, member functions, or constructors can be passed in registers. Objects with copy constructor, destructor or virtual are passed by pointers rather than on the stack.

The Intel compilers for Windows are compatible with Microsoft. Intel compilers for Linux are compatible with Gnu.
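To make the copy-constructor rule quoted above concrete, here is a minimal sketch that checks at compile time which small classes are even candidates for register passing. The type names and the helper may_pass_in_register are hypothetical and not part of the quoted document. The check is only a rough approximation: a real ABI also looks at size, member types, and move constructors, so a true result means "not ruled out", not "guaranteed to be in a register".

```cpp
#include <type_traits>

// Hypothetical example types, not taken from the quoted document.
struct PlainDouble {
    double d;
};

// A converting constructor alone leaves the implicit copy constructor trivial.
struct CtorDouble {
    double d;
    explicit CtorDouble(double v) : d(v) {}
};

// A user-provided copy constructor makes copying non-trivial.
struct CopyCtorDouble {
    double d;
    explicit CopyCtorDouble(double v) : d(v) {}
    CopyCtorDouble(const CopyCtorDouble& other) : d(other.d) {}
};

// Assumption: approximates "simple enough to pass in registers" as
// "trivially copy-constructible and trivially destructible".
template <typename T>
constexpr bool may_pass_in_register() {
    return std::is_trivially_copy_constructible<T>::value &&
           std::is_trivially_destructible<T>::value;
}

static_assert(may_pass_in_register<double>(), "built-in double qualifies");
static_assert(may_pass_in_register<PlainDouble>(), "plain wrapper qualifies");
static_assert(may_pass_in_register<CtorDouble>(), "converting ctor is fine");
static_assert(!may_pass_in_register<CopyCtorDouble>(),
              "user-provided copy ctor rules it out");
```

If the file compiles (for example with -fsyntax-only), every assertion held. The CopyCtorDouble case matches the quoted explanation: the caller has to run the copy constructor on an object with an address, so the copy cannot live only in a register.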

P.W answered Oct 06 '22 17:10


Here is an example showing that LLVM clang at optimization level -O3 treats a class with a single double data member as if it were a plain double:

$ cat main.cpp
#include <stdio.h>
class MyDouble {
public:
    double d;
    MyDouble(double _d):d(_d){}
};
void foo(MyDouble d)
{
    printf("%lg\n",d.d);
}
int main(int argc, char **argv)
{
    if (argc>5)
    {
        double x=(double)argc;
        MyDouble d(x);
        foo(d);
    }
    return 0;
}

When I compile it and view the generated bitcode file, I see that foo behaves as if it operates on a double type input parameter:

$ clang++ -O3 -c -emit-llvm main.cpp
$ llvm-dis main.bc

Here is the relevant part:

; Function Attrs: nounwind uwtable
define void @_Z3foo8MyDouble(double %d.coerce) #0 {
entry:
  %call = tail call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([5 x i8]* @.str, i64 0, i64 0), double %d.coerce)
  ret void
}

See how foo declares its input parameter as double, and moves it around for printing "as is". Now let's compile the exact same code with -O0:

$ clang++ -O0 -c -emit-llvm main.cpp
$ llvm-dis main.bc

When we look at the relevant part, we see that clang uses a getelementptr instruction to access its first (and only) data member d:

; Function Attrs: uwtable
define void @_Z3foo8MyDouble(double %d.coerce) #0 {
entry:
  %d = alloca %class.MyDouble, align 8
  %coerce.dive = getelementptr %class.MyDouble* %d, i32 0, i32 0
  store double %d.coerce, double* %coerce.dive, align 1
  %d1 = getelementptr inbounds %class.MyDouble* %d, i32 0, i32 0
  %0 = load double* %d1, align 8
  %call = call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([5 x i8]* @.str, i32 0, i32 0), double %0)
  ret void
}
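Note that in both listings above the signature of foo takes double %d.coerce; only the body differs between the two optimization levels. To repeat the comparison on another compiler, a sketch like the following may help. The names WrappedDouble, twice_raw and twice_wrapped are hypothetical. The functions are deliberately non-inline so that the compiler has to emit them, and the size check is an assumption that holds on mainstream ABIs but is not guaranteed by the standard.

```cpp
#include <type_traits>

// Hypothetical wrapper, similar in shape to MyDouble above.
struct WrappedDouble {
    double d;
    explicit WrappedDouble(double v) : d(v) {}
};

// Two externally visible functions whose generated code can be compared.
double twice_raw(double x) { return x * 2.0; }
double twice_wrapped(WrappedDouble x) { return x.d * 2.0; }

// The wrapper adds no storage or layout overhead on common platforms.
static_assert(std::is_standard_layout<WrappedDouble>::value,
              "wrapper has a plain layout");
static_assert(sizeof(WrappedDouble) == sizeof(double),
              "assumed: no padding around the single double");
```

Compiling this file with -S (for example clang++ -O3 -S or g++ -O3 -S) and reading the two function bodies side by side shows whether the compiler at hand passes the wrapper the same way as the raw double. On an x86-64 Linux target I would expect both to read their argument from the same floating-point register, but that is worth checking rather than assuming.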
OrenIshShalom answered Oct 06 '22 17:10