How to allow copy elision construction for C++ classes (not just POD C structs)

Tags: c++, optimization, c++11, macros

Consider the following code:

#include <iostream>
#include <type_traits>
#include <utility>  // std::forward

struct A
{
  A() {}
  A(const A&) { std::cout << "Copy" << std::endl; }
  A(A&&) { std::cout << "Move" << std::endl; }
};

template <class T>
struct B
{
  T x;
};

// Expands in place, so the temporary initializes B::x directly.
#define MAKE_B(x) B<decltype(x)>{ x }

// The temporary is bound to the reference parameter x, so it must be moved into B::x.
template <class T>
B<T> make_b(T&& x)
{
  return B<T> { std::forward<T>(x) };
}

int main()
{
  std::cout << "Macro make b" << std::endl;
  auto b1 = MAKE_B( A() );
  std::cout << "Non-macro make b" << std::endl;
  auto b2 = make_b( A() );
}

This outputs the following:

Macro make b
Non-macro make b
Move

Note that b1 is constructed without a move, but the construction of b2 requires a move.

I also need type deduction, as A in real-life usage may be a complex type that is difficult to write explicitly. I also need to be able to nest calls (i.e. make_c(make_b(A()))).

Is such a function possible?

Further thoughts:

N3290 Final C++0x draft page 284:

This elision of copy/move operations, called copy elision, is permitted in the following circumstances:

when a temporary class object that has not been bound to a reference (12.2) would be copied/moved to a class object with the same cv-unqualified type, the copy/move operation can be omitted by constructing the temporary object directly into the target of the omitted copy/move

Unfortunately, this seems to mean that we can't elide copies (and moves) of function parameters to function results (including constructors), as those temporaries are either bound to a reference (when passed by reference) or no longer temporaries (when passed by value). It seems the only way to elide all copies when creating a composite object is to create it as an aggregate. However, aggregates have certain restrictions, such as requiring that all members be public and that there be no user-defined constructors.
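To make the by-value case concrete, here is a minimal sketch that counts constructor calls instead of printing them. The names Tracked, Holder, wrap_by_value and the helper functions are hypothetical and not part of the code above. The exact counts depend on the language version and on which permitted elisions the compiler performs, so the check only relies on the function route needing at least one move.

```cpp
#include <cassert>
#include <utility>

// Hypothetical instrumented type: counts copy and move constructions.
struct Tracked {
    static int copies;
    static int moves;
    Tracked() {}
    Tracked(const Tracked&) { ++copies; }
    Tracked(Tracked&&) { ++moves; }
};
int Tracked::copies = 0;
int Tracked::moves = 0;

// Aggregate: public member, no user-declared constructors.
struct Holder {
    Tracked x;
};

// By-value parameter: inside the function, x is a named object and no
// longer a temporary, so getting it into Holder::x requires a move.
Holder wrap_by_value(Tracked x) {
    return Holder{ std::move(x) };
}

// Number of move constructions observed for wrap_by_value(Tracked()).
int moves_through_function() {
    Tracked::copies = Tracked::moves = 0;
    Holder h = wrap_by_value(Tracked());
    (void)h;
    return Tracked::moves;
}

// Copies plus moves observed for aggregate initialization from a temporary.
int operations_for_aggregate() {
    Tracked::copies = Tracked::moves = 0;
    Holder h = Holder{ Tracked() };
    (void)h;
    return Tracked::copies + Tracked::moves;
}

// The move from the parameter into Holder::x can never be elided.
void check_tracked_counts() {
    assert(moves_through_function() >= 1);
}
```

Calling operations_for_aggregate() and moves_through_function() from main and comparing the results shows the gap between the two routes on a given compiler.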

I don't think it makes sense for C++ to allow these optimizations for aggregate construction of POD C structs but not for construction of non-POD C++ classes.

Is there any way to allow copy/move elision for non-aggregate construction?

My answer:

This construct allows for copies to be elided for non-POD types. I got this idea from David Rodríguez's answer below. It requires C++11 lambdas. In this example below I've changed make_b to take two arguments to make things less trivial. There are no calls to any move or copy constructors.

#include <iostream>
#include <type_traits>

struct A
{
  A() {}
  A(const A&) { std::cout << "Copy" << std::endl; }
  A(A&&) { std::cout << "Move" << std::endl; }
};

template <class T>
class B
{
public:
  // Each member is initialized from the result of calling a lambda,
  // so the object is constructed in place rather than passed in.
  template <class LAMBDA1, class LAMBDA2>
  B(const LAMBDA1& f1, const LAMBDA2& f2) : x1(f1()), x2(f2())
  {
    std::cout
    << "I'm a non-trivial, therefore not a POD.\n"
    << "I also have private data members, so definitely not a POD!\n";
  }
private:
  T x1;
  T x2;
};

// Wraps an expression in a lambda so that its evaluation is delayed.
#define DELAY(x) [&]{ return x; }

#define MAKE_B(x1, x2) make_b(DELAY(x1), DELAY(x2))

template <class LAMBDA1, class LAMBDA2>
auto make_b(const LAMBDA1& f1, const LAMBDA2& f2) -> B<decltype(f1())>
{
  return B<decltype(f1())>( f1, f2 );
}

int main()
{
  auto b1 = MAKE_B( A(), A() );
}

If anyone knows how to achieve this more neatly I'd be quite interested to see it.

Previous discussion:

This somewhat follows on from the answers to the following questions:

Can creation of composite objects from temporaries be optimised away?
Avoiding need for #define with expression templates
Eliminating unnecessary copies when building composite objects

asked May 04 '11 02:05

Clinton

1 Answer

As Anthony has already mentioned, the standard forbids copy elision from the argument of a function to the return of the same function. The rationale that drives that decision is that copy elision (and move elision) is an optimization by which two objects in the program are merged into the same memory location, that is, the copy is elided by having both objects be one. The (partial) standard quote is below, followed by a set of circumstances under which copy elision is allowed, which do not include that particular case.

So what makes that particular case different? The difference is basically that there is a function call between the original and the copied objects, and the function call implies that there are extra constraints to consider, in particular the calling convention.

Given a function T foo( T ), and a user calling T x = foo( T(param) );, in the general case, with separate compilation, the compiler will create an object $tmp1 in the location that the calling convention requires the first argument to be. It will then call the function and initialize x from the return statement. Here is the first opportunity for copy elision: by carefully placing x on the location where the returned temporary is, x and the returned object from foo become a single object, and that copy is elided. So far so good. The problem is that the calling convention in general will not have the returned object and the parameter in the same location, and because of that, $tmp1 and x cannot be a single location in memory.

Without seeing the function definition, the compiler cannot possibly know that the only purpose of the argument to the function is to serve as the return value, and as such it cannot elide that extra copy. It can be argued that if the function is inline, then the compiler would have the missing information needed to understand that the temporary used to call the function, the returned value and x are a single object. The problem is that that particular copy can only be elided if the code is actually inlined (not merely marked as inline, but actually inlined). If a function call is required, then the copy cannot be elided. If the standard allowed that copy to be elided when the code is inlined, it would imply that the behavior of a program would differ due to the compiler and not user code. The inline keyword does not force inlining; it only means that multiple definitions of the same function do not represent a violation of the ODR.

Note that if the variable were created inside the function (rather than passed into it), as in T foo() { T tmp; ...; return tmp; } followed by T x = foo();, then both copies can be elided. There is no restriction as to where tmp has to be created: it is not an input or output parameter of the function, so the compiler is able to place it anywhere, including the location of the returned object. On the calling side, x can, as in the previous example, be carefully placed at the location of that same returned object, which basically means that tmp, the returned object and x can be a single object.
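The contrast between a local variable and a parameter can be sketched with a counting type. Counted, make_local, pass_through and the helper functions are hypothetical names introduced only for this illustration. Since eliding the local is permitted but not required, the check compares the two cases rather than assuming a fixed count.

```cpp
#include <cassert>

// Hypothetical instrumented type: counts copy and move constructions.
struct Counted {
    static int copies;
    static int moves;
    Counted() {}
    Counted(const Counted&) { ++copies; }
    Counted(Counted&&) { ++moves; }
};
int Counted::copies = 0;
int Counted::moves = 0;

// Local object returned by name: the compiler may construct tmp
// directly in the location of the returned object.
Counted make_local() {
    Counted tmp;
    return tmp;
}

// Parameter returned by name: arg lives where the calling convention
// puts arguments, so it has to be moved into the returned object.
Counted pass_through(Counted arg) {
    return arg;
}

// Copies plus moves observed for Counted x = make_local();
int operations_for_local() {
    Counted::copies = Counted::moves = 0;
    Counted x = make_local();
    (void)x;
    return Counted::copies + Counted::moves;
}

// Copies plus moves observed for Counted x = pass_through(Counted());
int operations_for_parameter() {
    Counted::copies = Counted::moves = 0;
    Counted x = pass_through(Counted());
    (void)x;
    return Counted::copies + Counted::moves;
}

// The parameter version can never get below one copy or move.
void check_counted() {
    assert(operations_for_parameter() >= 1);
}
```

On compilers that perform the permitted elisions, operations_for_local() is expected to report fewer operations than operations_for_parameter().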

As for your particular problem, if you resort to a macro, the code is inlined, there are no restrictions on the objects and the copy can be elided. But if you add a function, you cannot elide the copy from the argument to the return statement. So just avoid it. Instead of using a template that will move the object, create a template that will construct an object:

template <typename T, typename... Args>
T create( Args... x ) {
   return T( x... );
}

And that copy can be elided by the compiler.
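As a usage sketch, the same idea can be applied to a non-aggregate class by passing the constructor arguments for the parts rather than already constructed parts. This variant uses forwarding references and std::forward, which is a change from the create template above, not the answer's own code. Part, Whole and create_fwd are hypothetical names.

```cpp
#include <cassert>
#include <utility>

// Hypothetical instrumented member type.
struct Part {
    static int copies;
    static int moves;
    int value;
    explicit Part(int v) : value(v) {}
    Part(const Part& o) : value(o.value) { ++copies; }
    Part(Part&& o) : value(o.value) { ++moves; }
};
int Part::copies = 0;
int Part::moves = 0;

// Non-aggregate: private members and a user-provided constructor.
// The members are built in place from the constructor arguments.
class Whole {
public:
    Whole(int a, int b) : first_(a), second_(b) {}
    int sum() const { return first_.value + second_.value; }
private:
    Part first_;
    Part second_;
};

// Variant of create that forwards its arguments to T's constructor.
template <typename T, typename... Args>
T create_fwd(Args&&... args) {
    return T(std::forward<Args>(args)...);
}

// Copies plus moves of Part observed while building a Whole.
int operations_for_create() {
    Part::copies = Part::moves = 0;
    Whole w = create_fwd<Whole>(1, 2);
    assert(w.sum() == 3);
    return Part::copies + Part::moves;
}
```

No Part object exists before Whole's constructor runs, so the only operation left to elide is the one from the returned object to w, which is one of the cases the standard permits.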

Note that I have not dealt with move construction, as you seem concerned about the cost of even move construction, even though I believe that you are barking up the wrong tree. Given a motivating real use case, I am quite sure that people here will come up with a couple of efficient ideas.

12.8/31

When certain criteria are met, an implementation is allowed to omit the copy/move construction of a class object, even if the copy/move constructor and/or destructor for the object have side effects. In such cases, the implementation treats the source and target of the omitted copy/move operation as simply two different ways of referring to the same object, and the destruction of that object occurs at the later of the times when the two objects would have been destroyed without the optimization.

answered Oct 23 '22 00:10

David Rodríguez - dribeas
