Consider the following two implementations of a simple Matrix4x4 Identity method.
1: This one takes a Matrix4x4 reference as a parameter and writes the data directly into it.
static void CreateIdentity(Matrix4x4& outMatrix) {
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            outMatrix[i][j] = i == j ? 1 : 0;
        }
    }
}
2: This one returns a Matrix4x4 without taking any input.
static Matrix4x4 CreateIdentity() {
    Matrix4x4 outMatrix;
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            outMatrix[i][j] = i == j ? 1 : 0;
        }
    }
    return outMatrix;
}
Now, if I want to actually create an Identity-Matrix I have to do
Matrix4x4 mat;
Matrix4x4::CreateIdentity(mat);
for the first variant and
Matrix4x4 mat = Matrix4x4::CreateIdentity();
for the second.
The first one has the obvious advantage that not a single unnecessary copy is made, but it cannot be used as an rvalue inside an expression; imagine
Matrix4x4 mat = Matrix4x4::Identity()*Matrix4x4::Translation(5, 7, 6);
Final question: Is there a way to avoid unnecessary copies when using methods like Matrix4x4::CreateIdentity() whenever possible, while still allowing the method to be used as an rvalue as in my last code example? Is this even optimised automatically by the compiler? I'm rather confused about how to go about this (seemingly) simple task efficiently. Maybe I should implement both versions and use whichever is appropriate?
You mostly don't need to worry about this, given that copy elision (in this case, NRVO [1]) is explicitly permitted by the standard and routinely performed by mainstream compilers.
In a bit more detail (and simplifying somewhat), the version returning a matrix will most likely end up with the matrix allocated on the stack of the calling function and merely initialized in the called function, without any copy constructor being called.
So unless something inhibits this (which you can find out by running the code and checking whether a copy constructor is called), you mostly Don't Need to Worry About It.
If copy elision can't happen (or simply doesn't, since NRVO is optional and the compiler is free to skip it), you can still make sure to provide a move constructor, which would then be used instead [2]. The good thing here is that this works even when your return statement involves a conversion to the actual returned type.
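One way to run the check described above is to count constructor calls. The following is a minimal sketch, not the asker's actual class: the storage layout, the global counters g_copies and g_moves, the Translation helper (with an assumed last-column convention), and CountCopiesInExpression are all hypothetical names introduced for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical counters, present only to observe which constructors run.
int g_copies = 0;
int g_moves = 0;

// Minimal stand-in for the Matrix4x4 class from the question (layout is assumed).
struct Matrix4x4 {
    std::array<std::array<float, 4>, 4> m{};  // zero-initialized

    Matrix4x4() = default;
    Matrix4x4(const Matrix4x4& other) : m(other.m) { ++g_copies; }
    Matrix4x4(Matrix4x4&& other) noexcept : m(other.m) { ++g_moves; }

    std::array<float, 4>& operator[](std::size_t row) { return m[row]; }
    const std::array<float, 4>& operator[](std::size_t row) const { return m[row]; }

    static Matrix4x4 CreateIdentity() {
        Matrix4x4 outMatrix;
        for (std::size_t i = 0; i < 4; ++i)
            for (std::size_t j = 0; j < 4; ++j)
                outMatrix[i][j] = (i == j) ? 1.0f : 0.0f;
        return outMatrix;  // NRVO candidate; otherwise the move constructor is used
    }

    // Hypothetical helper: translation stored in the last column (assumed convention).
    static Matrix4x4 Translation(float x, float y, float z) {
        Matrix4x4 t = CreateIdentity();
        t[0][3] = x;
        t[1][3] = y;
        t[2][3] = z;
        return t;
    }
};

// Returns by value, so the result can be used directly inside larger expressions.
Matrix4x4 operator*(const Matrix4x4& a, const Matrix4x4& b) {
    Matrix4x4 result;  // starts as all zeros
    for (std::size_t i = 0; i < 4; ++i)
        for (std::size_t j = 0; j < 4; ++j)
            for (std::size_t k = 0; k < 4; ++k)
                result[i][j] += a[i][k] * b[k][j];
    return result;
}

// Builds a matrix through an rvalue expression and reports how many copies occurred.
int CountCopiesInExpression() {
    g_copies = 0;
    g_moves = 0;
    Matrix4x4 mat = Matrix4x4::CreateIdentity() * Matrix4x4::Translation(5, 7, 6);
    assert(mat[0][3] == 5.0f && mat[1][3] == 7.0f && mat[2][3] == 6.0f);
    return g_copies;
}
```

Calling CountCopiesInExpression() should report zero copies: each return statement names a local object, so it is either elided or moved, never copied. How many moves show up in g_moves depends on the compiler and optimization settings, which is why the sketch does not assert on that count.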
References:
[1] If a function returns a class type by value, and the return statement's expression is the name of a non-volatile object with automatic storage duration, which isn't the function parameter, or a catch clause parameter, and which has the same type (ignoring top-level cv-qualification) as the return type of the function, then copy/move is omitted. When that local object is constructed, it is constructed directly in the storage where the function's return value would otherwise be moved or copied to. This variant of copy elision is known as NRVO, "named return value optimization".
[2] If expression is an lvalue expression and the conditions for copy elision are met, or would be met, except that expression names a function parameter, then overload resolution to select the constructor to use for initialization of the returned value is performed twice: first as if expression were an rvalue expression (thus it may select the move constructor or a copy constructor taking reference to const), and if no suitable conversion is available, overload resolution is performed the second time, with lvalue expression (so it may select the copy constructor taking a reference to non-const).
The above rule applies even if the function return type is different from the type of expression (copy elision requires same type).
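The function-parameter case from the second quote can be sketched as follows. This is an illustration under assumptions, not code from the question: the counters g_copies and g_moves and the functions MakeIdentity and Scaled are hypothetical names.

```cpp
#include <array>
#include <cstddef>

// Hypothetical counters, present only to observe which constructors run.
int g_copies = 0;
int g_moves = 0;

// Minimal stand-in for the Matrix4x4 class from the question (layout is assumed).
struct Matrix4x4 {
    std::array<std::array<float, 4>, 4> m{};  // zero-initialized

    Matrix4x4() = default;
    Matrix4x4(const Matrix4x4& other) : m(other.m) { ++g_copies; }
    Matrix4x4(Matrix4x4&& other) noexcept : m(other.m) { ++g_moves; }
};

// Returns a named local: eligible for NRVO, with a move as the fallback.
Matrix4x4 MakeIdentity() {
    Matrix4x4 out;
    for (std::size_t i = 0; i < 4; ++i)
        out.m[i][i] = 1.0f;
    return out;
}

// Returns a by-value parameter: copy elision is not allowed here, but the
// return expression is first treated as an rvalue, so the move constructor
// is selected rather than the copy constructor.
Matrix4x4 Scaled(Matrix4x4 base, float factor) {
    for (std::size_t i = 0; i < 4; ++i)
        for (std::size_t j = 0; j < 4; ++j)
            base.m[i][j] *= factor;
    return base;
}
```

With this sketch, Matrix4x4 r = Scaled(MakeIdentity(), 2.0f); should leave g_copies at zero and g_moves at one or more, because returning the parameter base cannot be elided and goes through the move constructor instead.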