I've found that != and == are not the fastest ways to test for zero or non-zero.
    bool nonZero1 = integer != 0;
    xor eax, eax
    test ecx, ecx
    setne al

    bool nonZero2 = integer < 0 || integer > 0;
    test ecx, ecx
    setne al

    bool zero1 = integer == 0;
    xor eax, eax
    test ecx, ecx
    sete al

    bool zero2 = !(integer < 0 || integer > 0);
    test ecx, ecx
    sete al

Compiler: VC++ 11
Optimization flags: /O2 /GL /LTCG
This is the assembly output for x86-32. The second versions of both comparisons were ~12% faster on both x86-32 and x86-64. However, on x86-64 the instructions were identical (first versions looked exactly like the second versions), but the second versions were still faster.
EDIT: I've added benchmarking code.

    ZERO: 1544ms, 1358ms
    NON_ZERO: 1544ms, 1358ms

http://pastebin.com/m7ZSUrcP or http://anonymouse.org/cgi-bin/anon-www.cgi/http://pastebin.com/m7ZSUrcP
Note: These functions are probably inconvenient to locate when everything is compiled from a single source file, because main.asm grows quite large. I kept zero1, zero2, nonZero1, and nonZero2 in a separate source file.
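For readers who cannot reach the pastebin link, here is a minimal sketch of what such a separate source file might look like. The function signatures and the helper variantsAgree are assumptions for illustration, not the original benchmark code. The sketch only checks that each pair of variants is logically equivalent, which is consistent with the compiler emitting the same test/setne and test/sete sequences for both.

```cpp
#include <cassert>
#include <climits>

// Hypothetical signatures: the real benchmark code is in the pastebin link.
bool nonZero1(int integer) { return integer != 0; }
bool nonZero2(int integer) { return integer < 0 || integer > 0; }
bool zero1(int integer) { return integer == 0; }
bool zero2(int integer) { return !(integer < 0 || integer > 0); }

// Returns true when both variants of each test agree on every sample value
// and the zero test is always the opposite of the non-zero test.
bool variantsAgree() {
    const int samples[] = {INT_MIN, -1, 0, 1, INT_MAX};
    for (int value : samples) {
        if (nonZero1(value) != nonZero2(value)) return false;
        if (zero1(value) != zero2(value)) return false;
        if (zero1(value) == nonZero1(value)) return false;
    }
    return true;
}
```

Since the variants compute the same result, any measured difference has to come from the generated code or from how the measurement is set up, not from the comparison itself.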
EDIT2: Could someone with both VC++11 and VC++2010 installed run the benchmarking code and post the timings? It might indeed be a bug in VC++11.
This is a great question, but I think you've fallen victim to the compiler's dependency analysis.
The compiler only has to clear the high bits of eax once, and they remain clear for the second version. The second version would have to pay the price to xor eax, eax except that the compiler analysis proved it's been left cleared by the first version.
The second version is able to "cheat" by taking advantage of work the compiler did in the first version.
How are you measuring times? Is it "(version one, followed by version two) in a loop", or "(version one in a loop) followed by (version two in a loop)"?
Don't do both tests in the same program (instead recompile for each version), or if you do, test both "version A first" and "version B first" and see if whichever comes first is paying a penalty.
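A rough sketch of the "test both orders" idea, assuming the two variants are ordinary functions taking an int. The names Timing, OrderResult, timeLoop, and runInOrder are made up for this example, and the loop body is only a stand-in for the real benchmark.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical stand-ins for the two variants being compared.
bool nonZero1(int integer) { return integer != 0; }
bool nonZero2(int integer) { return integer < 0 || integer > 0; }

typedef bool (*Predicate)(int);

struct Timing {
    long long ms;        // elapsed wall-clock time in milliseconds
    std::uint64_t hits;  // number of true results, so the loop has a visible effect
};

// Times one variant in its own loop.
Timing timeLoop(Predicate f, int iterations) {
    volatile int seed = 1;  // volatile read keeps the input from being a compile-time constant
    const int base = seed;
    std::uint64_t hits = 0;
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        hits += f(base + i) ? 1 : 0;
    }
    const auto stop = std::chrono::steady_clock::now();
    Timing t;
    t.ms = std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
    t.hits = hits;
    return t;
}

struct OrderResult {
    Timing first;   // the variant that ran first
    Timing second;  // the variant that ran second
};

// Runs (first in a loop) followed by (second in a loop).
OrderResult runInOrder(Predicate first, Predicate second, int iterations) {
    OrderResult r;
    r.first = timeLoop(first, iterations);
    r.second = timeLoop(second, iterations);
    return r;
}
```

Calling runInOrder(nonZero1, nonZero2, n) and then runInOrder(nonZero2, nonZero1, n) and comparing the results shows whether whichever variant runs first is the one paying the penalty. Recompiling a separate program for each variant remains the cleaner option.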
Illustration of the cheating:
    timer1.start();
    double x1 = 2 * sqrt(n + 37 * y + exp(z));
    timer1.stop();

    timer2.start();
    double x2 = 31 * sqrt(n + 37 * y + exp(z));
    timer2.stop();
If timer2 duration is less than timer1 duration, we don't conclude that multiplying by 31 is faster than multiplying by 2. Instead, we realize that the compiler performed common subexpression analysis, and the code became:
    timer1.start();
    double common = sqrt(n + 37 * y + exp(z));
    double x1 = 2 * common;
    timer1.stop();

    timer2.start();
    double x2 = 31 * common;
    timer2.stop();
And the only thing proved is that multiplying by 31 is faster than computing common. Which is hardly surprising at all -- multiplication is far, far faster than sqrt and exp.