
Arithmetic vs boolean operations

I've come across this piece of code in some forum:

if ( a * b * c * d == 0 ) ....

and the owner claims this is a tad faster than

if (a == 0 || b == 0 || c == 0 || d == 0)

These variables are defined as:

int a, b, c, d;

And their absolute values are guaranteed to be less than or equal to 100. (So we could ignore the possibility of overflowing)

If we just ignore the readability and just focus on the performance, is the claim really correct?

It seems to me that the second approach might actually be faster, since it can sometimes take advantage of short-circuit evaluation. But then, what do I know?!

asked Aug 04 '12 by user1508893

2 Answers

The C standard says nothing about performance. The question of whether

if ( a * b * c * d == 0 )

is faster than

if (a == 0 || b == 0 || c == 0 || d == 0)

is meaningful only in the context of a particular compiler generating code running on a particular machine. The only real way to compare them is to measure the performance on your own system, or on whatever system you're interested in.

Still, we can speculate about what the performance is likely to be.

As you said, a, b, c, and d are objects of type int. You also said they're in the range [-100,+100] -- but the compiler doesn't necessarily know that.

A compiler is free to replace any expression with code that does the same thing.

Multiplication is a relatively complex operation, and is likely to be slower than, say, addition or comparison. A compiler could recognize that the first condition will be true if any of the four variables has the value 0, and replace the multiplications with whatever happens to be faster. But each optimization a compiler performs has to be explicitly programmed by the compiler's developers, and this particular pattern isn't likely to be common enough for it to be worth the effort of recognizing it.

You say the values are small enough that overflow isn't an issue. In fact, you can't portably make that assumption; INT_MAX can be as small as 32767. But the compiler knows how big an int is on the system for which it's generating code. Still, unless it has information about the values of a, b, c, and d, it can't assume that there will be no overflow.

Except that yes, actually, it can make that assumption. The behavior of signed integer overflow is undefined. That gives an optimizing compiler permission to assume that overflow can't occur (if it does, whatever behavior the program exhibits is valid anyway).
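If you want the multiplication form without leaning on the overflow-is-UB argument at all, one portable option (a sketch of my own, not something from the question) is to widen to long long, where a product of four values of magnitude at most 100 cannot overflow:

```c
#include <assert.h>

/* Hypothetical helper: tests whether any of four ints is zero via a
 * product. Widening to long long makes the multiplication well-defined
 * for |values| <= 100 (max magnitude 100^4 = 1e8, far below LLONG_MAX). */
static int any_zero_mul(int a, int b, int c, int d)
{
    return (long long)a * b * c * d == 0;
}
```

For example, `any_zero_mul(3, -5, 0, 7)` returns 1 and `any_zero_mul(1, 2, 3, 4)` returns 0. The cast on the first operand is enough: the usual arithmetic conversions promote each subsequent multiplication to long long.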

So yes, a compiler could replace the multiplications with something simpler, but it's not likely to do so.

As for the other expression, a == 0 || b == 0 || c == 0 || d == 0, the || operator has short-circuit semantics: if the left operand is true (non-zero), the right operand isn't evaluated. The resulting conditional branches can be expensive on a pipelined CPU when they're mispredicted. But since none of the subexpressions have side effects (assuming none of the variables is declared volatile), the compiler is free to evaluate all four subexpressions unconditionally, perhaps in parallel, if that's faster.
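If branch mispredictions were actually a measured problem, the same transformation can be written by hand using bitwise | instead of ||. This is a sketch of the idea, not something either answer recommends as a default:

```c
/* Branch-free variant: bitwise | evaluates all four comparisons
 * unconditionally, so there are no conditional branches to mispredict
 * (on x86, each comparison typically compiles to a setcc instruction). */
static int any_zero_nobranch(int a, int b, int c, int d)
{
    return (a == 0) | (b == 0) | (c == 0) | (d == 0);
}
```

This is safe here precisely because the comparisons have no side effects; with ||, you'd lose the guarantee of skipping later operands, which only matters when those operands are expensive or unsafe to evaluate.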

A quick experiment shows that gcc -O3 for x86 doesn't perform either optimization. For the first expression, it generates code that performs three multiplications. For the second, it generates conditional branches, implementing the canonical short-circuit evaluations (I don't know whether avoiding that would be faster or not).

Your best bet is to write reasonable code that's as straightforward as possible, both because it makes your source code easier to read and maintain, and because it's likely to give the compiler a better chance to recognize patterns and perform optimizations. If you try to do fancy micro-optimizations in your source code, you're as likely to hinder the compiler's optimizations as you are to help.

Don't worry too much about how fast your code is unless you've measured it and found it to be too slow. If you need your code to be faster, first concentrate on improved algorithms and data structures. And only if that fails, consider source-level micro-optimizations.

The First Rule of Program Optimization: Don't do it. The Second Rule of Program Optimization (for experts only!): Don't do it yet.

-- Michael A. Jackson

answered Sep 28 '22 by Keith Thompson

The two are not equivalent. For example, on my machine (32-bit x86, MSVC), if a, b, c and d are all equal to 0x100, the first test passes but the second does not: the product 0x100 * 0x100 * 0x100 * 0x100 = 2^32 overflows a 32-bit int (formally undefined behavior) and in practice wraps around to 0.
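The wraparound is easy to reproduce without invoking undefined behavior by doing the arithmetic in uint32_t, where modular wraparound is defined; this sketch mimics what a 32-bit int typically does in practice:

```c
#include <stdint.h>

/* With a = b = c = d = 0x100, the product is 0x100000000 = 2^32, which
 * wraps to 0 in 32 bits -- so the product test "sees" a zero even though
 * none of the operands is zero. */
static int product_is_zero_u32(uint32_t a, uint32_t b, uint32_t c, uint32_t d)
{
    return (uint32_t)(a * b * c * d) == 0;
}
```

Here `product_is_zero_u32(0x100, 0x100, 0x100, 0x100)` returns 1 even though every operand is non-zero, while the || form correctly reports that none of them is zero.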

Also note that multiplication is a costly operation, so the first version won't necessarily be faster.

EDIT: Code generated for the first version:

00401000 8B 44 24 04      mov         eax,dword ptr [esp+4] 
00401004 0F AF 44 24 08   imul        eax,dword ptr [esp+8] 
00401009 0F AF 44 24 0C   imul        eax,dword ptr [esp+0Ch] 
0040100E 0F AF 44 24 10   imul        eax,dword ptr [esp+10h] 
00401013 85 C0            test        eax,eax 
00401015 75 07            jne         f1+1Eh (40101Eh) 
00401017 ...

Code generated for the second version:

00401020 83 7C 24 04 00   cmp         dword ptr [esp+4],0 
00401025 74 15            je          f2+1Ch (40103Ch) 
00401027 83 7C 24 08 00   cmp         dword ptr [esp+8],0 
0040102C 74 0E            je          f2+1Ch (40103Ch) 
0040102E 83 7C 24 0C 00   cmp         dword ptr [esp+0Ch],0 
00401033 74 07            je          f2+1Ch (40103Ch) 
00401035 83 7C 24 10 00   cmp         dword ptr [esp+10h],0 
0040103A 75 07            jne         f2+23h (401043h) 
0040103C ...

Benchmarks on my machine: the first version runs in about 1.83 ns per call and the second in about 1.39 ns. The values of a, b, c and d didn't change during each run, so the branch predictor could presumably predict 100% of the branches.

answered Sep 28 '22 by Yakov Galka