I've come across this piece of code in some forum:
if ( a * b * c * d == 0 ) ....
and the owner claims this is a tad faster than
if (a == 0 || b == 0 || c == 0 || d == 0)
These variables are defined as:
int a, b, c, d;
Their absolute values are guaranteed to be less than or equal to 100 (so we can ignore the possibility of overflow).
If we ignore readability and focus purely on performance, is the claim actually correct?
It seems to me that the second approach might actually be faster, since you can sometimes take advantage of short-circuit evaluation. But then, what do I know?!
The C standard says nothing about performance. The question of whether
if ( a * b * c * d == 0 )
is faster than
if (a == 0 || b == 0 || c == 0 || d == 0)
is meaningful only in the context of a particular compiler generating code running on a particular machine. The only real way to compare them is to measure the performance on your own system, or on whatever system you're interested in.
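For example, a minimal measurement harness might look something like this (the function names, the iteration count, and the use of clock() are all arbitrary choices of mine, not anything from the question):

#include <stdio.h>
#include <time.h>

static int test_mul(int a, int b, int c, int d) { return a * b * c * d == 0; }
static int test_or(int a, int b, int c, int d)  { return a == 0 || b == 0 || c == 0 || d == 0; }

int main(void) {
    volatile int sink = 0;  /* volatile keeps the loops from being optimized away */
    long i;
    clock_t t0 = clock();
    for (i = 0; i < 100000000L; i++)
        sink += test_mul(1, 2, 3, (int)(i % 101));
    printf("mul: %.3f s\n", (double)(clock() - t0) / CLOCKS_PER_SEC);

    t0 = clock();
    for (i = 0; i < 100000000L; i++)
        sink += test_or(1, 2, 3, (int)(i % 101));
    printf("||:  %.3f s\n", (double)(clock() - t0) / CLOCKS_PER_SEC);
    return 0;
}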
Still, we can speculate about what the performance is likely to be.
As you said, a, b, c, and d are objects of type int. You also said they're in the range [-100,+100] -- but the compiler doesn't necessarily know that.
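If you wanted to tell it, here's one sketch using a GCC/Clang extension (the function names are mine, and nothing guarantees the optimizer actually exploits the hint):

/* GCC/Clang extension: __builtin_unreachable() marks a path as
   impossible, which hands the optimizer the range information. */
static inline void assume_small(int x)
{
    if (x < -100 || x > 100)
        __builtin_unreachable();
}

int product_is_zero(int a, int b, int c, int d)
{
    assume_small(a); assume_small(b);
    assume_small(c); assume_small(d);
    return a * b * c * d == 0;  /* the optimizer may now rule out overflow */
}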
A compiler is free to replace any expression with code that does the same thing.
Multiplication is a relatively complex operation, and is likely to be slower than, say, addition or comparison. A compiler could recognize that the first condition will be true if any of the four variables has the value 0, and replace the multiplications with whatever happens to be faster. But each optimization a compiler performs has to be explicitly programmed by the compiler's developers, and this particular pattern isn't likely to be common enough for it to be worth the effort of recognizing it.
You say the values are small enough that overflow isn't an issue. In fact, you can't portably make that assumption; INT_MAX can be as small as 32767. But the compiler knows how big an int is on the system for which it's generating code. Still, unless it has information about the values of a, b, c, and d, it can't assume that there will be no overflow.
Except that yes, actually, it can make that assumption. The behavior of signed integer overflow is undefined. That gives an optimizing compiler permission to assume that overflow can't occur (if it does, whatever behavior the program exhibits is valid anyway).
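A standard illustration of that license (a sketch; gcc -O2, for instance, typically folds this comparison to a constant):

/* Because signed overflow is undefined behavior, the compiler may
   assume x + 1 never overflows, and fold this to "return 1". */
int plus_one_is_bigger(int x)
{
    return x + 1 > x;
}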
So yes, a compiler could replace the multiplications with something simpler, but it's not likely to do so.
As for the other expression, a == 0 || b == 0 || c == 0 || d == 0, the || operator has short-circuit semantics: if the left operand is true (non-zero), the right operand isn't evaluated. That kind of conditional code can create performance problems due to CPU pipeline stalls on mispredicted branches. Since none of the subexpressions have side effects (assuming none of the variables are declared volatile), the compiler can evaluate all four subexpressions, perhaps in parallel, if that's faster.
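You can even spell that evaluation strategy out by hand. A sketch (the function name is mine):

/* Bitwise | has no short-circuit semantics, so all four comparisons
   are evaluated: each == yields 0 or 1, and the results are OR'd
   together. This is roughly the branchless code a compiler is allowed
   to emit for the || version, since the operands have no side effects. */
int any_is_zero(int a, int b, int c, int d)
{
    return (a == 0) | (b == 0) | (c == 0) | (d == 0);
}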
A quick experiment shows that gcc -O3 for x86 performs neither optimization. For the first expression, it generates code that performs three multiplications. For the second, it generates conditional branches, implementing the canonical short-circuit evaluation (I don't know whether avoiding that would be faster or not).
Your best bet is to write reasonable code that's as straightforward as possible, both because it makes your source code easier to read and maintain, and because it's likely to give the compiler a better chance to recognize patterns and perform optimizations. If you try to do fancy micro-optimizations in your source code, you're as likely to hinder the compiler's optimizations as you are to help.
Don't worry too much about how fast your code is unless you've measured it and found it to be too slow. If you need your code to be faster, first concentrate on improved algorithms and data structures. And only if that fails, consider source-level micro-optimizations.
The First Rule of Program Optimization: Don't do it. The Second Rule of Program Optimization (for experts only!): Don't do it yet.
-- Michael A. Jackson
The two are not equivalent. For example, on my machine (32-bit x86, MSVC), if a, b, c, and d are all equal to 0x100, then the first test will pass but the second will not: 0x100 is 256, and 256 * 256 * 256 * 256 is 2^32, which wraps to 0 in 32-bit arithmetic (formally that signed overflow is undefined behavior, but wrapping is what this machine does), even though none of the variables is zero.
Also note that multiplication is a costly operation, so the first version won't necessarily be faster.
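The listings below reference functions named f1 and f2 in their branch targets; the original source wasn't posted, but it presumably looked something like this (the bodies are my reconstruction):

/* Hypothetical reconstruction; only the names f1 and f2 are taken
   from the disassembly labels below. */
int f1(int a, int b, int c, int d)
{
    if (a * b * c * d == 0)
        return 1;
    return 0;
}

int f2(int a, int b, int c, int d)
{
    if (a == 0 || b == 0 || c == 0 || d == 0)
        return 1;
    return 0;
}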
EDIT: Code generated for the first version:
00401000 8B 44 24 04 mov eax,dword ptr [esp+4]
00401004 0F AF 44 24 08 imul eax,dword ptr [esp+8]
00401009 0F AF 44 24 0C imul eax,dword ptr [esp+0Ch]
0040100E 0F AF 44 24 10 imul eax,dword ptr [esp+10h]
00401013 85 C0 test eax,eax
00401015 75 07 jne f1+1Eh (40101Eh)
00401017 ...
Code generated for the second version:
00401020 83 7C 24 04 00 cmp dword ptr [esp+4],0
00401025 74 15 je f2+1Ch (40103Ch)
00401027 83 7C 24 08 00 cmp dword ptr [esp+8],0
0040102C 74 0E je f2+1Ch (40103Ch)
0040102E 83 7C 24 0C 00 cmp dword ptr [esp+0Ch],0
00401033 74 07 je f2+1Ch (40103Ch)
00401035 83 7C 24 10 00 cmp dword ptr [esp+10h],0
0040103A 75 07 jne f2+23h (401043h)
0040103C ...
Benchmarks on my machine: the first version runs in about 1.83 ns per call and the second in about 1.39 ns. The values of a, b, c, and d didn't change during each run, so apparently the branch predictor could predict 100% of the branches.