Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Helping the compiler to optimize branchy code sequences

I have code sequences in C/C++ that contain lots of branches, something like this:

if( condition1 )
    return true;
if( condition2 )
    return true;
...
return false;

(which is equivalent to return condition1 || condition2 || ...;)

Evaluating each of the conditions requires several memory accesses (all read-only) but the compiler misses an important optimization opportunity by not moving memory accesses before a previous condition is evaluated. The reason being that condition2's memory accesses may segfault when condition1 is true. Well I know that they don't and I would like the compiler to do the sensible thing and mix some of these code sequences where it is appropriate for performance, e.g. to exploit instruction-level-parallelism. I also don't want to change the condition to a logical or (not short-circuit) because one of the branches will likely jump out.

Any ideas on how this can be accomplished (preferably using gcc)?

Thanks.

like image 465
hans Avatar asked Jun 09 '11 08:06

hans


People also ask

How do compilers Optimize code?

Compiler optimization is generally implemented using a sequence of optimizing transformations, algorithms which take a program and transform it to produce a semantically equivalent output program that uses fewer resources or executes faster.

What is code optimization in C?

Code optimization is any method of code modification to improve code quality and efficiency. A program may be optimized so that it becomes a smaller size, consumes less memory, executes more rapidly, or performs fewer input/output operations.


2 Answers

Evaluating each of the conditions requires several memory accesses

Why don't you avoid short-circuit evaluation within individual conditions, but let it happen for the or-ing of conditions?

Using operators that are not short-circuit for built-in types

Exactly how you achieve the former depends on the nature of those conditions (i.e. condition1, condition2 in your code) - given you show nothing about them I can only talk in generalities: where they internally contain short-circuit operators, instead convert the boolean value to an integer representation and use e.g. bitwise-OR (or even '+' or '*' if it reads better and works in your particular usage). Bitwise operators are generally safer as they're lower precedence though - only have to be careful if your conditions already include bitwise operators.

To illustrate:

OLD: return (a > 4 && b == 2 && c < a) ||   // condition1
            (a == 3 && b != 2 && c == -a);  // condition2

NEW: return (a > 4 & b == 2 & c < a) ||
            (a == 3 & b != 2 & c == -a);

Also be careful if you used implicit conversion of numbers/pointers to bool before... you want to normalise them to bools so their least-significant bits reflect their boolean significance:

OLD: return my_int && my_point && !my_double;
NEW: return bool(my_int) & bool(my_point) & !my_double;  // ! normalises before bitwise-&

You might also want to benchmark with...

     bool condition1 = a > 4 & b == 2 & c < a;
     bool condition2 = a == 3 & b != 2 & c == -a;
     return condition1 || condition2;

...which might be faster - possibly only in the overall "return false" case and perhaps when the last conditionN or two are the deciding factor in a "return true".

User-defined operators avoid short-circuit evaluation

Seperately, short-circuit evaluation is disabled for objects with overloaded logical operators, which provides another avenue for you to do your checks using the existing notation, but you'll have to change or enhance your data types.

Thoughts

More generally, you'll only benefit from this if you've a large number of assertions combined in each individual condition - more so if the function tends to run through to return false.

"AProgrammer" makes a great point too - with speculative execution available on modern CPUs, the CPU may already be ahead of the ordering implied by short-circuit evaluation (in some special mode than either avoids or suppresses any memory faults from dereferencing invalid pointers, divides by 0 etc). So, it's possible that the entire attempt at optimisation may prove pointless or even counterproductive. Benchmarking of all alternatives is required.

like image 154
Tony Delroy Avatar answered Oct 29 '22 22:10

Tony Delroy


Could you just move the parts of the condition yourself?

ie

 const bool bCondition1Result = <condition1>;
 const bool bCondition2Result = <condition2>;

and so on

For better optimisation still ... re-jig the order of your conditions so that the most hit one is the first to be checked. This way it will early out more often than not (This may make precious little difference).

like image 36
Goz Avatar answered Oct 29 '22 23:10

Goz