Is if (a < 901) faster than if (a <= 900)? Not exactly as in this simple example, but there are slight performance changes in complex loop code. I suppose this has something to do with the generated machine code, if it's even true.




Is < faster than <=?


3 Answers

No, it will not be faster on most architectures. You didn't specify, but on x86, integer comparisons are typically implemented as two machine instructions:

- A test or cmp instruction, which sets EFLAGS
- A Jcc (jump) instruction, depending on the comparison type (and code layout):
  - jne - Jump if not equal --> ZF = 0
  - jz - Jump if zero (equal) --> ZF = 1
  - jg - Jump if greater --> ZF = 0 and SF = OF
  - (etc...)

Example (edited for brevity), compiled with $ gcc -m32 -S -masm=intel test.c:

    if (a < b) {
        // Do something 1
    }

Compiles to:

    mov     eax, DWORD PTR [esp+24]      ; a
    cmp     eax, DWORD PTR [esp+28]      ; b
    jge     .L2                          ; jump if a is >= b
    ; Do something 1
.L2:

And

    if (a <= b) {
        // Do something 2
    }

Compiles to:

    mov     eax, DWORD PTR [esp+24]      ; a
    cmp     eax, DWORD PTR [esp+28]      ; b
    jg      .L5                          ; jump if a is > b
    ; Do something 2
.L5:

So the only difference between the two is a jg versus a jge instruction. The two will take the same amount of time.
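
To reproduce the two listings, a minimal stand-in for test.c could look like the sketch below. The original file is not shown in the answer, so the function names is_less and is_less_equal are hypothetical, and the return statements only stand in for the "Do something" bodies.

```c
/* Hypothetical stand-in for test.c; the original source file is not shown. */

/* The if here corresponds to the first listing (cmp, then a jump past the body). */
int is_less(int a, int b)
{
    if (a < b) {
        return 1;   /* "Do something 1" */
    }
    return 0;
}

/* The if here corresponds to the second listing; only the jump condition differs. */
int is_less_equal(int a, int b)
{
    if (a <= b) {
        return 1;   /* "Do something 2" */
    }
    return 0;
}
```

Compiling this with the command above and comparing the two functions in the generated .s file should show the same cmp followed by a different Jcc. The exact registers, stack offsets, and label names depend on the compiler version and flags, so they will probably not match the listings character for character.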

I'd like to address the comment that nothing indicates that the different jump instructions take the same amount of time. This one is a little tricky to answer, but here's what I can give: in the Intel Instruction Set Reference, they are all grouped together under one common instruction, Jcc (Jump if condition is met). The same grouping appears in the Optimization Reference Manual, in Appendix C, "Latency and Throughput":

> Latency — The number of clock cycles that are required for the execution core to complete the execution of all of the μops that form an instruction.

> Throughput — The number of clock cycles required to wait before the issue ports are free to accept the same instruction again. For many instructions, the throughput of an instruction can be significantly less than its latency.

The values for Jcc are:

      Latency   Throughput
Jcc     N/A        0.5

with the following footnote on Jcc:

> 7. Selection of conditional jump instructions should be based on the recommendation of section Section 3.4.1, “Branch Prediction Optimization,” to improve the predictability of branches. When branches are predicted successfully, the latency of jcc is effectively zero.

So, nothing in the Intel docs ever treats one Jcc instruction any differently from the others.

If one thinks about the actual circuitry used to implement the instructions, one can assume that there would be simple AND/OR gates on the different bits in EFLAGS to determine whether the conditions are met. There is, then, no reason that an instruction testing two bits should take any more or less time than one testing only one (ignoring gate propagation delay, which is much less than the clock period).
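
The "simple gates on flag bits" idea can be modeled in C. The sketch below is a simplified software model, not a description of any real circuit: the names flags, cmp32, and cond_* are made up for illustration. It computes the four status flags the way a 32-bit cmp would and then expresses each signed condition as a small boolean expression over them.

```c
#include <stdbool.h>
#include <stdint.h>

/* The status flags a 32-bit cmp produces (simplified model). */
struct flags {
    bool zf;  /* result is zero */
    bool sf;  /* sign bit of the result */
    bool cf;  /* unsigned borrow */
    bool of;  /* signed overflow */
};

/* Model of "cmp a, b": compute a - b and keep only the flags. */
struct flags cmp32(int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a;
    uint32_t ub = (uint32_t)b;
    uint32_t res = ua - ub;   /* unsigned arithmetic wraps, so no undefined behavior */
    struct flags f;
    f.zf = (res == 0);
    f.sf = (res >> 31) != 0;
    f.cf = ua < ub;
    /* Overflow: operands differ in sign and the result's sign differs from a's. */
    f.of = (((ua ^ ub) & (ua ^ res)) >> 31) != 0;
    return f;
}

/* Each condition is one small boolean expression over the flags. */
bool cond_l(struct flags f)  { return f.sf != f.of; }           /* jl  */
bool cond_le(struct flags f) { return f.zf || f.sf != f.of; }   /* jle */
bool cond_g(struct flags f)  { return !f.zf && f.sf == f.of; }  /* jg  */
bool cond_ge(struct flags f) { return f.sf == f.of; }           /* jge */
```

In this model, cond_le and cond_g each read one more flag than cond_l and cond_ge, but all four are a single expression evaluated on flags that cmp has already produced.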

Edit: Floating Point

This holds true for x87 floating point as well (pretty much the same code as above, but with double instead of int):

        fld     QWORD PTR [esp+32]
        fld     QWORD PTR [esp+40]
        fucomip st, st(1)              ; Compare ST(0) and ST(1), and set CF, PF, ZF in EFLAGS
        fstp    st(0)
        seta    al                     ; Set al if above (CF=0 and ZF=0).
        test    al, al
        je      .L2
        ; Do something 1
.L2:

        fld     QWORD PTR [esp+32]
        fld     QWORD PTR [esp+40]
        fucomip st, st(1)              ; (same thing as above)
        fstp    st(0)
        setae   al                     ; Set al if above or equal (CF=0).
        test    al, al
        je      .L5
        ; Do something 2
.L5:
        leave
        ret
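
The flag logic in the floating-point listings can be modeled the same way. This is a rough sketch with made-up names (fp_flags, fucomi, seta, setae as C functions), and it assumes that [esp+32] holds a and [esp+40] holds b, which the listing does not state. Under that assumption, b ends up in ST(0) and a in ST(1), so "above" means a < b.

```c
#include <stdbool.h>

/* Flags written by fucomi/fucomip (simplified model). */
struct fp_flags {
    bool zf;
    bool pf;
    bool cf;
};

/* Model of "fucomip st, st(1)": compare st0 against st1. */
struct fp_flags fucomi(double st0, double st1)
{
    struct fp_flags f = { false, false, false };  /* default: st0 > st1 */
    if (st0 != st0 || st1 != st1) {               /* unordered: an operand is NaN */
        f.zf = f.pf = f.cf = true;
    } else if (st0 < st1) {
        f.cf = true;
    } else if (st0 == st1) {
        f.zf = true;
    }
    return f;
}

bool seta(struct fp_flags f)  { return !f.cf && !f.zf; }  /* above: CF=0 and ZF=0 */
bool setae(struct fp_flags f) { return !f.cf; }           /* above or equal: CF=0 */

/* Assumption: a is loaded first and b second, so ST(0) = b and ST(1) = a. */
bool fp_less(double a, double b)       { return seta(fucomi(b, a)); }
bool fp_less_equal(double a, double b) { return setae(fucomi(b, a)); }
```

As with the integer case, the two comparisons differ only in which flags the final condition reads. Both conditions come out false when either operand is NaN, because an unordered compare sets CF.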

142

answered Oct 13 '22 22:10

Jonathon Reinhart

Historically (we're talking the 1980s and early 1990s), there were some architectures in which this was true. The root issue is that integer comparison is inherently implemented via integer subtractions. This gives rise to the following cases.

Comparison     Subtraction
----------     -----------
A < B      --> A - B < 0
A = B      --> A - B = 0
A > B      --> A - B > 0

Now, when A < B the subtraction has to borrow a high bit for the subtraction to be correct, just like you carry and borrow when adding and subtracting by hand. This "borrowed" bit was usually referred to as the carry bit and would be testable by a branch instruction. A second bit called the zero bit would be set if the subtraction result were exactly zero, which implied equality.

There were usually at least two conditional branch instructions, one to branch on the carry bit and one on the zero bit.

Now, to get at the heart of the matter, let's expand the previous table to include the carry and zero bit results.

Comparison     Subtraction  Carry Bit  Zero Bit
----------     -----------  ---------  --------
A < B      --> A - B < 0    0          0
A = B      --> A - B = 0    1          1
A > B      --> A - B > 0    1          0

So, implementing a branch for A < B can be done in one instruction, because the carry bit is clear only in this case; that is,

;; Implementation of "if (A < B) goto address;"
cmp  A, B          ;; compare A to B
bcz  address       ;; Branch if Carry is Zero to the new address

But, if we want to do a less-than-or-equal comparison, we need to do an additional check of the zero flag to catch the case of equality.

;; Implementation of "if (A <= B) goto address;"
cmp A, B           ;; compare A to B
bcz address        ;; branch if A < B
bzs address        ;; also, Branch if the Zero bit is Set

So, on some machines, using a "less than" comparison might save one machine instruction. This was relevant in the era of sub-megahertz processor speed and 1:1 CPU-to-memory speed ratios, but it is almost totally irrelevant today.
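
The instruction-count difference can be sketched in C. This is a toy model of the hypothetical machine in the tables above, not of any specific processor: the names status, outcome, cmp8, and branch_if_* are invented here, and 8-bit unsigned operands are an assumption. It follows the table's convention that carry is clear only when A < B.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flags after "cmp A, B", following the table above:
   carry is clear only when A < B, zero is set only when A == B. */
struct status {
    bool carry;
    bool zero;
};

/* Whether the jump to "address" happens, and how many branch
   instructions were executed to decide it. */
struct outcome {
    bool taken;
    int branches;
};

struct status cmp8(uint8_t a, uint8_t b)
{
    struct status s;
    s.carry = (a >= b);
    s.zero = (a == b);
    return s;
}

/* "if (A < B) goto address;" needs a single branch: bcz. */
struct outcome branch_if_less(uint8_t a, uint8_t b)
{
    struct status s = cmp8(a, b);
    struct outcome o = { !s.carry, 1 };   /* bcz address */
    return o;
}

/* "if (A <= B) goto address;" needs bcz, then bzs when bcz falls through. */
struct outcome branch_if_less_equal(uint8_t a, uint8_t b)
{
    struct status s = cmp8(a, b);
    struct outcome o = { true, 1 };
    if (!s.carry) {
        return o;                         /* bcz address was taken */
    }
    o.taken = s.zero;                     /* bzs address */
    o.branches = 2;
    return o;
}
```

In this model, branch_if_less always executes one branch, while branch_if_less_equal executes two whenever A >= B, which is the extra instruction described above.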

618

answered Oct 13 '22 21:10

Lucas

Assuming we're talking about internal integer types, there's no possible way one could be faster than the other. They're obviously semantically identical. They both ask the compiler to do precisely the same thing. Only a horribly broken compiler would generate inferior code for one of these.

If there were some platform where < was faster than <= for simple integer types, the compiler should always convert <= to < for constants. Any compiler that didn't would just be a bad compiler (for that platform).
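
The constant rewrite mentioned above is safe because, for integers, a <= C and a < C + 1 select exactly the same values as long as C + 1 does not overflow. A small sketch using the constants from the question follows; the helper names le_900, lt_901, and forms_agree are made up for illustration.

```c
#include <stdbool.h>

bool le_900(int a) { return a <= 900; }
bool lt_901(int a) { return a < 901; }

/* Returns true when both forms agree for every int in [lo, hi].
   The counter is a long long so the loop also terminates when hi is INT_MAX. */
bool forms_agree(int lo, int hi)
{
    for (long long a = lo; a <= hi; ++a) {
        if (le_900((int)a) != lt_901((int)a)) {
            return false;
        }
    }
    return true;
}
```

The one case where the rewrite does not apply is a <= INT_MAX, since INT_MAX + 1 is not representable; that comparison is always true anyway.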

answered Oct 13 '22 23:10

David Schwartz


Tags:

c++

performance

c

assembly

relational-operators

snoopy
