Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to achieve "optimal" operator overload-resolution in arithmetic expressions with rvalues?

first of all, I apologize for the overly verbose question. I couldn't think of any other way to accurately summarize my problem... Now on to the actual question:

I'm currently experimenting with C++0x rvalue references... The following code produces unwanted behavior:

#include <iostream>
#include <utility>

struct Vector4
{
    float x, y, z, w;

    inline Vector4 operator + (const Vector4& other) const
    {
        Vector4 r;
        std::cout << "constructing new temporary to store result"
                  << std::endl;
        r.x = x + other.x;
        r.y = y + other.y;
        r.z = z + other.z;
        r.w = w + other.w;
        return r;
    }
    Vector4&& operator + (Vector4&& other) const
    {
        std::cout << "reusing temporary 2nd operand to store result"
                  << std::endl;
        other.x += x;
        other.y += y;
        other.z += z;
        other.w += w;
        return std::move(other);
    }
    friend inline Vector4&& operator + (Vector4&& v1, const Vector4& v2)
    {
        std::cout << "reusing temporary 1st operand to store result"
                  << std::endl;
        v1.x += v2.x;
        v1.y += v2.y;
        v1.z += v2.z;
        v1.w += v2.w;
        return std::move(v1);
    }
};

int main (void)
{
    Vector4 r,
            v1 = {1.0f, 1.0f, 1.0f, 1.0f},
            v2 = {2.0f, 2.0f, 2.0f, 2.0f},
            v3 = {3.0f, 3.0f, 3.0f, 3.0f},
            v4 = {4.0f, 4.0f, 4.0f, 4.0f},
            v5 = {5.0f, 5.0f, 5.0f, 5.0f};

    ///////////////////////////
    // RELEVANT LINE HERE!!! //
    ///////////////////////////
    r = v1 + v2 + (v3 + v4) + v5;

    return 0;
}

results in the output

constructing new temporary to store result
constructing new temporary to store result
reusing temporary 1st operand to store result
reusing temporary 1st operand to store result

while I had hoped for something like

constructing new temporary to store result
reusing temporary 1st operand to store result
reusing temporary 2nd operand to store result
reusing temporary 2nd operand to store result

After trying to re-enact what the compiler was doing (I'm using MinGW G++ 4.5.2 with option -std=c++0x in case it matters), it actually seems quite logical. The standard says that arithmetic operations of equal precedence are evaluated/grouped left-to-right (why I assumed right-to-left I don't know, I guess it's more intuitive to me). So what happened here is that the compiler evaluated the sub-expression (v3 + v4) first (since it's in parentheses?), and then began matching the operations in the expression left-to-right against the operator overloads, resulting in a call to Vector4 operator + (const Vector4& other) for the sub-expression v1 + v2. If I want to avoid the unnecessary temporary, I'd have to make sure that no more than one lvalue operand appears to the immediate left of any parenthesized sub-expression, which is counter-intuitive to anyone using this "library" and innocently expecting optimal performance (as in minimizing the creation of temporaries).

(I'm aware that there's ambiguity in my code regarding operator + (Vector4&& v1, const Vector4& v2) and operator + (Vector4&& other) when (v3 + v4) is to be added to the result of v1 + v2, resulting in a warning. But it's harmless in my case and I don't want to add yet another overload for two rvalue reference operands - anyone know if there's a way to disable this warning in gcc?)

Long story short, my question boils down to: Is there any way or pattern (preferably compiler-independent) this vector class could be rewritten to enable arbitrary use of parentheses in expressions that still results in the "optimal" choice of operator overloads (optimal in terms of "performance", i.e. maximizing the binding to rvalue references)? Perhaps I'm asking for too much though and it's impossible... if so, then that's fine too. I just want to make sure I'm not missing anything.

Many thanks in advance

Addendum

First thanks to the quick responses I got, within minutes (!) - I really should have started posting here sooner...

It's becoming tedious replying in the comments, so I think a clarification of my intent with this class design is in order. Maybe you can point me to a fundamental conceptual flaw in my thought process if there is one.

You may notice that I don't hold any resources in the class like heap memory. Its members are only scalar types even. At first sight this makes it a suspect candidate for move-semantics based optimizations (see also this question that actually helped me a great deal grasping the concepts behind rvalue references).

However, since the classes this one is supposed to be a prototype for will be used in a performance-critical context (a 3D engine to be precise), I want to optimize every little thing possible. Low-complexity algorithms and maths-related techniques like look-up tables should of course make up the bulk of the optimizations as anything else would simply be addressing the symptoms and not eradicating the real reason for bad performance. I am well aware of that.

With that out of the way, my intent here is to optimize algebraic expressions with vectors and matrices that are essentially plain-old-data structs without pointers to data in them (mainly due to the performance drawbacks you get with data on the heap [having to dereference additional pointers, cache considerations etc.]).

I don't care about move-assignment or construction, I just don't want more temporaries being created during the evaluation of a complicated algebraic expression than absolutely necessary (usually just one or two, e.g. a matrix and a vector).

Those are my thoughts that might be erroneous. If they are, please correct me:

  1. To achieve this without relying on RVO, return-by-reference is necessary (again: keep in mind I don't have remote resources, only scalar data members).
  2. Returning by reference makes the function-call expression an lvalue, implying the returned object is not a temporary, which is bad, but returning by rvalue reference makes the function-call expression an xvalue (see 3.10.1), which is okay in the context of my approach (see 4)
  3. Returning by reference is dangerous, because of the possibly short lifetime of objects, but:
  4. temporaries are guaranteed to live until the end of the evaluation of the expression they were created in, therefore:
  5. making it safe to return by reference from those operators that take at least one rvalue-reference as their argument, if the object referenced by this rvalue reference argument is the one being returned by reference. Therefore:
  6. Any arbitrary expression that only employs binary operators can be evaluated by creating only one temporary when not more than one PoD-like type is involved, and the binary operations don't require a temporary by nature (like matrix multiplication)

(Another reason to return by rvalue-reference is because it behaves like returning by value in terms of rvalue-ness of the function-call expression; and it's required for the operator/function-call expression to be an rvalue in order to bind to subsequent calls to operators that take rvalue references. As stated in (2), calls to functions that return by reference are lvalues, and would therefore bind to operators with the signature T operator+(const T&, const T&), resulting in the creation of an unnecessary temporary)

I could achieve the desired performance by using a C-style approach of functions like add(Vector4 *result, Vector4 *v1, Vector4 *v2), but come on, we're living in the 21st century...

In summary, my goal is creating a vector class that achieves the same performance as the C-approach using overloaded operators. If that in itself is impossible, than I guess it can't be helped. But I'd appreciate if someone could explain to me why my approach is doomed to fail (the left-to-right operator evaluation issue that was the initial reason for my post aside, of course).
As a matter of fact, I've been using the "real" vector class this one is a simplification of for a while without any crashes or corrupted memory so far. And in fact, I never actually return local objects as references, so there shouldn't be any problems. I dare say what I'm doing is standard-compliant.

Any help on the original issue would of course be appreciated as well!

many thanks for all the patience again

like image 394
Awaki Avatar asked Nov 05 '22 02:11

Awaki


2 Answers

You should not return an rvalue reference, you should return a value. In addition, you should not specify both a member and a free operator+. I'm amazed that even compiled.

Edit:

r = v1 + v2 + (v3 + v4) + v5;

How could you possibly only have one temporary value when you're performing two sub-computations? That's just impossible. You can't re-write the Standard and change this.

You will just have to trust your users to do something not completely stupid, like write the above line of code, and expect to have just one temporary.

like image 119
Puppy Avatar answered Nov 16 '22 17:11

Puppy


I recommend modeling your code after the basic_string operator+() found in chapter 21 of N3225.

like image 43
Howard Hinnant Avatar answered Nov 16 '22 17:11

Howard Hinnant