Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Expression templates: improving performance in evaluating expressions?

By the expression templates technique, a matrix expression like

D = A*B+sin(C)+3.;

is pretty much equivalent, in terms of computing performance, to a hand-written for loop.

Now, suppose that I have the following two expressions

D = A*B+sin(C)+3.;
F = D*E;
cout << F << "\n";

In a "classical" implementation by expression templates, the computing performance will be pretty much the same as that of two for loops in sequence. This is because the expressions are evaluated immediately after the = operators are encountered.

My question is: is there any technique (for example, using placeholders?) to recognize that the values of D are actually unused and that the values of interest are the sole elements of F, so that only the expression

F = E*(A*B+sin(C)+3.);

is evaluated and the whole performance is equivalent to that of a single for loop?

Of course, such an hypothetical technique should also be able to return back to evaluate the expression

D = A*B+sin(C)+3.;

if later in the code the values of D are needed.

Thank you in advance for any help.

EDIT: Results experimenting the solution suggested by Evgeny

Original instruction:

Result D=A*B-sin(C)+3.;

Computing time: 32ms

Two steps instruction:

Result Intermediate=A*B;
Result D=Intermediate-sin(C)+3.;

Computing time: 43ms

Solution with auto:

auto&& Intermediate=A*B;
Result D=Intermediate-sin(C)+3.;

Computing time: 32ms.

In conclusion, auto&& enabled to restore the original computing time of the single instruction case.

EDIT: Summarizing relevant links, following the suggestions by Evgeny

Copy Elision

What does auto tell us

Universal References in C++11

C++ Rvalue References Explained

C++ and Beyond 2012: Scott Meyers - Universal References in C++11

like image 650
Vitality Avatar asked Apr 06 '13 21:04

Vitality


2 Answers

Evaluation of expression template typically happens when you save result to some special type like:

Result D = A*B+sin(C)+3.;

Result type of expression:

A*B+sin(C)+3.

is not Result, but it is something that convertable to Result. And evaluation happens during such conversion.


My question is: is there any technique (for example, using placeholders?) to recognize that the values of D are actually unused

Such kind of "transfromation":

Result D = A*B+sin(C)+3.;
Result F = D*E;

to

Result F = (A*B+sin(C)+3.)*E;

Is possible when you do not evaluate D. To do this, typically you should capture D as it's real , expression type. For instance, with help of auto:

auto &&D = A*B+sin(C)+3.;
Result F = D*E;

However, you should be carefull - sometimes expression template captures references to it's operands, and if you have some rvalue which would expire after it's expression:

auto &&D = A*get_large_rvalue();
// At this point, result of **get_large_rvalue** is destructed
// And D has expiried reference
Result F = D*E;

Where get_large_rvalue is:

LargeMatrix get_large_rvalue();

It's result is rvalue, it expiries at the end of full expression when get_large_rvalue was called. If something within expression would store pointer/reference to it (for later evaluation) and you would "defer" evaluation - pointer/reference will outlive pointed/referenced object.

In order to prevent this, you should do:

auto &&intermediate = get_large_rvalue(); // it would live till the end of scope
auto &&D = A*intermediate ;
Result F = D*E;

I'm not familiar with C++11 but, as I understand, auto asks the compiler to determine the type of a variable from its initialization

Yes, exactly. This is called Type Inference/Deduction.

C++98/03 had type deduction only for template functions, in C++11 there is auto.

Do you know how do CUDA and C++11 interact each other?

I haven't used CUDA (though I used OpenCL), but I guess that there will be no any problems in Host code with C++11. Maybe some C++11 features are not supported within Device code, but for your purpose - you need auto only in Host code

Finally, is there any possibility with only C++?

Do you mean pre-C++11? I.e. C++98/C++03? Yes, it is possible, but it has more syntax noise, maybe that would be reason to reject it:

// somehwhere
{
    use_D(A*B+sin(C)+3.);
}
// ...
template<typename Expression>
void use_D(Expression D) // depending on your expression template library
                         //   it may be better to use (const Expression &e)
{
    Result F = D*E;
}

I'm now using CUDA/Visual Studio 2010 under Windows. Could you please recommend a compiler/toolset/environment for both OS' to use C++11 in the framework of my interest (GPGPU and CUDA, in you know any)

MSVC 2010 does supports some parts of C++11. In particular it supports auto. So, if you need only auto from C++11 - MSVC2010 is OK.

But if you may use MSVC2012 - I would recommed to stick with it - it has much better C++11 support.

Also, the trick auto &&intermediate = get_large_rvalue(); seems to be not "transparent" to a third party user (which is not supposed to know such an issue). Am I right? Any alternative?

If expression template stores references to some values, and you defer it's evaluation. You should be sure that all it's references are alive at the place of evaluation. Use any method which you want - it can be done without auto, like:

LargeMatrix temp = get_large_rvalue();

Or maybe even global/static variable (less prefered method).

A last comment/question: to use auto &&D = A*B+sin(C)+3.; it seems that I should overload the operator= for assignments between two expressions, right?

No, such form does not requires nor copy/move assignment operator nor copy/move constructor.

Basically it just names temporary value, and prolongs it's lifetime to the end of scope. Check this SO.

But, if you would use another form:

auto D = A*B+sin(C)+3.;

In such case copy/move/conversion constructor maybe required in order to compile (though actual copy can be optimized away by compiler by use of Copy Ellision)

Also, switching between using auto (for the intermediate expressions) and Result to force calculation seems to be non-transparent to a third party user. Any alternative?

I am not sure if there is any alternative. This is by nature of expression templates. While you using them in expressions - they return some internal intermediate types, but when you store to some "special" type - evaluation is triggered.

like image 136
Evgeny Panasyuk Avatar answered Sep 28 '22 19:09

Evgeny Panasyuk


In c++11 you can use auto

auto D = A*B+sin(C)+3.;

assuming you are using expression templates, the type of D would be <some template type which represents an expression>. Now, you have to use this carefully, because you are saving up some memory (no need to allocate space for a matrix) but depending on how you use it this may not be the best.

Think about

F = D*E

An element D[i][j] needs to be "visited" many times when computing D*E (actually n times where n is the size of the matrices). If D is a plain-matrix type this is no problem. If D is an expression you are evaluating it many, many times.

On the contray, doing

F = D + E

is fine.

Think about this: you can not write F = E*(A*B+sin(C)+3.); using only two nested loops.

like image 33
sbabbi Avatar answered Sep 28 '22 18:09

sbabbi