
Why is the copy constructor called twice in heap array initialization?

For the following C++14 code, why does g++'s generated code for new A[1]{x} seem to invoke the copy constructor twice?

#include <iostream>
using namespace std;

class A {
public:
    A()           { cout << "default ctor" << endl; }
    A(const A& o) { cout << "copy ctor" << endl;    }
    ~A()          { cout << "dtor" << endl;         }
};

int main()
{
    A x;
    cout << "=========" << endl;
    A* y = new A[1]{x};
    cout << "=========" << endl;
    delete[] y;
    return 0;
}

Compilation and output:

$ g++ -fno-elide-constructors -std=c++14 test.cpp && ./a.out
default ctor
=========
copy ctor
copy ctor
dtor
=========
dtor
dtor

Interestingly, for the same code, clang++ only invokes the copy constructor once:

$ clang++ -fno-elide-constructors -std=c++14 test.cpp && ./a.out
default ctor
=========
copy ctor
=========
dtor
dtor

Furthermore, when using g++, changing the A* y = new A[1]{x}; line to any of the following will cause the copy constructor to be called only once:

  • A* y = new A {x}; - normal heap object instead of heap array of size 1
  • A y[1] {x}; - array on stack instead of heap

So it appears that the double copy constructor behavior is only exhibited in heap-array initialization.

asked May 15 '21 at 03:05 by michaeljan



2 Answers

TL;DR: This is likely a GCC defect: GCC appears to treat {x} as a temporary in this context. For each element in new A[N]{x1, x2, ... xN}, the copy constructor should be called once according to [dcl.init] and [expr.new]. Instead, GCC likely interprets it as an initializer list and thus, in part, as an intermediate rvalue. We can force GCC to interpret it otherwise, though.


why does g++'s generated code for new A[1]{x} seem to invoke the copy constructor twice?

Because there is no move constructor. If we add a move constructor and some more output, we get a better picture of the situation (Compiler Explorer):

#include <iostream>
using namespace std;

class A {
public:
    A()           { cout << "default ctor @" << this << endl; }
    A(A&& o)      { cout << "move ctor: " << &o << " to " << this << endl;    }
    A(const A& o) { cout << "copy ctor: " << &o << " to " << this << endl;    }
    ~A()          { cout << "dtor @" << this << endl;         }
};

int main()
{
    A x;
    cout << "=========" << endl;
    A* y = new A[1]{x};
    cout << "=========" << endl;
    delete[] y;
    return 0;
}

Note that adding the A(A&&) constructor reveals the in-between temporary:

default ctor @0x7ffec28b5476
=========
copy ctor: 0x7ffec28b5476 to 0x7ffec28b5477
move ctor: 0x7ffec28b5477 to 0x55d0a7fa6288
dtor @0x7ffec28b5477
=========
dtor @0x55d0a7fa6288
dtor @0x7ffec28b5476

Indeed, if we delete the move constructor with A(A&&) = delete, g++ no longer compiles the code at all (but Clang still accepts it).

It seems that g++ misinterprets the braced-init-list. IMHO, [expr.new] may allow that kind of interpretation, but this looks like a g++ defect and should probably be reported as such.

However, the whole ordeal reminds me of an older question of mine (Are curly braces really required around initialization?). So let's introduce more braces to make sure that g++ cannot misinterpret our initializer:

int main()
{
    A x;
    cout << "=========" << endl;
    A* y = new A[1]{{{x}}};
    cout << "=========" << endl;
    delete[] y;
    return 0;
}

This variant circumvents g++'s behaviour:

initializer for T[1]     start : {
initializer for first element  : {
actual initializer for A       : {x}

The program output is then (Compiler Explorer):

default ctor @0x7ffede3d9967
=========
copy ctor: 0x7ffede3d9967 to 0x1eb0ec8
=========
dtor @0x1eb0ec8
dtor @0x7ffede3d9967

So for multiple elements, we end up in brace-hell (Compiler Explorer):

int main()
{
    A x;
    cout << "=========" << endl;
    A* y = new A[2]{{{x}},{{x}}};
    cout << "=========" << endl;
    delete[] y;
    return 0;
}

Again, no additional constructors are called:

default ctor @0x7fff3a2a7a27
=========
copy ctor: 0x7fff3a2a7a27 to 0x1f49ec8
copy ctor: 0x7fff3a2a7a27 to 0x1f49ec9
=========
dtor @0x1f49ec9
dtor @0x1f49ec8
dtor @0x7fff3a2a7a27

answered Sep 19 '22 at 16:09 by Zeta


After doing some research in the standard, I came to the conclusion that g++ is wrong and there should be only one copy constructor invocation. Interestingly, there seem to be two possible interpretations of which kind of initialization occurs here. Both lead to the same conclusion, though.

First interpretation - direct initialization

From the C++14 Standard (Working Draft), [expr.new] 17:

A new-expression that creates an object of type T initializes that object as follows:

  • (17.1) — If the new-initializer is omitted, the object is default-initialized (8.5). [ Note: If no initialization is performed, the object has an indeterminate value. — end note ]
  • (17.2) — Otherwise, the new-initializer is interpreted according to the initialization rules of 8.5 for direct initialization.

In our case the new-initializer is present, so (according to 17.2) new A[1]{x} is interpreted using direct initialization rules. Let's look at [dcl.init] 16:

The initialization that occurs in the forms

  • T x(a);
  • T x{a};

as well as in new expressions (5.3.4), static_cast expressions (5.2.9), functional notation type conversions (5.2.3), mem-initializers (12.6.2), and the braced-init-list form of a condition is called direct-initialization

Ok, this further confirms that we are dealing with direct initialization. Now let's see how direct initialization works in [dcl.init] 17:

The semantics of initializers are as follows. The destination type is the type of the object or reference being initialized and the source type is the type of the initializer expression. If the initializer is not a single (possibly parenthesized) expression, the source type is not defined.

  • [... 17.1 through 17.5 omitted ...]
  • (17.6) — If the destination type is a (possibly cv-qualified) class type:
    • (17.6.1) — If the initialization is direct-initialization, or if it is copy-initialization where the cv-unqualified version of the source type is the same class as, or a derived class of, the class of the destination, constructors are considered. The applicable constructors are enumerated (13.3.1.3), and the best one is chosen through overload resolution (13.3). The constructor so selected is called to initialize the object, with the initializer expression or expression-list as its argument(s). If no constructor applies, or the overload resolution is ambiguous, the initialization is ill-formed.

According to the excerpt above, when the object being initialized is a class type (as is the case here) and when dealing with direct initialization (as is the case here) the destination object is initialized using the most suitable constructor.

I won't cite the rules about how the constructor is selected, as in this case when there is only the default A::A() constructor and the copy A::A(const A&) constructor, the copy constructor is obviously the better choice when initializing with x of type A. This is the source of one of the copy constructor invocations.

I didn't find any remarks about the initialization of arrays in particular in section [expr.new] and why it should cause a second constructor invocation.

Second interpretation - copy initialization

Here, we can start from [dcl.init.list] 1:

List-initialization is initialization of an object or reference from a braced-init-list. Such an initializer is called an initializer list, and the comma-separated initializer-clauses of the list are called the elements of the initializer list. An initializer list may be empty. List-initialization can occur in direct-initialization or copy initialization contexts; list-initialization in a direct-initialization context is called direct-list-initialization and list-initialization in a copy-initialization context is called copy-list-initialization. [ Note: List-initialization can be used

  • (1.1) — as the initializer in a variable definition (8.5)
  • (1.2) — as the initializer in a new-expression (5.3.4)
  • [... 1.3 through 1.10 omitted ...]

— end note ]

This excerpt can be understood to say that new A[1]{x} is actually a form of list-initialization rather than direct-initialization, since a braced-init-list {x} is used. Assuming this is the case, let's look at how it works in [dcl.init.list] 3:

List-initialization of an object or reference of type T is defined as follows:

  • [... 3.1 through 3.2 omitted ...]
  • (3.3) — Otherwise, if T is an aggregate, aggregate initialization is performed (8.5.1).
  • [... 3.4 through 3.10 omitted ...]

In our case, point 3.3 applies as we are initializing an array which is an aggregate, according to [dcl.init.aggr] 1:

An aggregate is an array or a class (Clause 9) with no user-provided constructors (12.1), no private or protected non-static data members (Clause 11), no base classes (Clause 10), and no virtual functions (10.3).

As such let's look at how aggregate initialization is performed in [dcl.init.aggr] 2:

When an aggregate is initialized by an initializer list, as specified in 8.5.4, the elements of the initializer list are taken as initializers for the members of the aggregate, in increasing subscript or member order. Each member is copy-initialized from the corresponding initializer-clause. If the initializer-clause is an expression and a narrowing conversion (8.5.4) is required to convert the expression, the program is ill-formed.

This fragment tells us that elements are copy initialized. As such y[0] will be copy initialized from x. Now let's look at how copy initialization works in [dcl.init] 17:

The semantics of initializers are as follows. The destination type is the type of the object or reference being initialized and the source type is the type of the initializer expression. If the initializer is not a single (possibly parenthesized) expression, the source type is not defined.

  • [... 17.1 through 17.5 omitted ...]
  • (17.6) — If the destination type is a (possibly cv-qualified) class type:
    • (17.6.1) — If the initialization is direct-initialization, or if it is copy-initialization where the cv-unqualified version of the source type is the same class as, or a derived class of, the class of the destination, constructors are considered. The applicable constructors are enumerated (13.3.1.3), and the best one is chosen through overload resolution (13.3). The constructor so selected is called to initialize the object, with the initializer expression or expression-list as its argument(s). If no constructor applies, or the overload resolution is ambiguous, the initialization is ill-formed.

Just like last time, this initialization fulfills the requirements for point 17.6.1 as it is copy-initialization where the source type (A of x) is the same as the destination type (A of y[0]). This means that in this case the copy constructor will be called as well.

Conclusion

It seems that regardless of which interpretation is chosen, only one constructor should be called and that Clang is right. I was unable to find any evidence that a temporary should be created. For some more example-based evidence, other compilers like icc, and (admittedly clang-based) zapcc and elcc agree with clang, all having only one copy constructor invocation.

I don't know much about g++'s internal workings, but I have a theory about why it performs two copy constructor invocations. It is possible that g++ internally emits some helper constructor invocations that are normally always optimized out, and that the -fno-elide-constructors flag breaks the invariant that they will be optimized out. This is, however, pure speculation about g++ on my part, so please correct me if I'm wrong.

answered Sep 22 '22 at 16:09 by janekb04