x86 assembler: floating point compare

Tags: floating-point, compare, x86, assembly, gnu-assembler

As part of a compiler project I have to write GNU assembler code for x86 to compare floating point values. I have tried to find resources on how to do this online and from what I understand it works like this:

Assuming the two values I want to compare are the only values on the floating point stack, then the fcomi instruction will compare the values and set the CPU-flags so that the je, jne, jl, ... instructions can be used.

I'm asking because this only works sometimes. For example:

.section    .data
msg:    .ascii "Hallo\n\0"
f1:     .float 10.0
f2:     .float 9.0

.globl main
    .type   main, @function
main:
    flds f1
    flds f2
    fcomi
    jg leb
    pushl $msg
    call printf
    addl $4, %esp
leb:
    pushl $0
    call exit

will not print "Hallo" even though I think it should, and if you switch f1 and f2 it still won't which is a logical contradiction. je and jne however seem to work fine.

What am I doing wrong?

PS: does the fcomip pop only one value or does it pop both?


asked Aug 14 '11 14:08

JustMaximumPower

1 Answer

TL;DR: Use the above / below conditions (as for unsigned integers) to test the result of FP compares.

For historical reasons, FP compares set CF, not OF / SF: fcomi (new in PPro) produces the same FLAGS as the older fcom / fstsw / sahf sequence, which maps the FP status word into FLAGS. See also http://www.ray.masmcode.com/tutorial/fpuchap7.htm
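As a rough sketch of that older sequence, assuming 32-bit cdecl and AT&T syntax: the helper name f2_above_f1 is hypothetical, made up for illustration, and takes two floats on the stack.

```asm
    .text
    .globl  f2_above_f1
    .type   f2_above_f1, @function
f2_above_f1:                    # hypothetical: int f2_above_f1(float f1, float f2)
    flds    4(%esp)             # st(0) = f1
    flds    8(%esp)             # st(0) = f2, st(1) = f1
    fcompp                      # compare st(0) with st(1), then pop both
    fnstsw  %ax                 # copy the FPU status word into AX (C0, C2, C3 end up in AH)
    sahf                        # AH -> FLAGS: C0 -> CF, C2 -> PF, C3 -> ZF
    seta    %al                 # AL = 1 only if CF=0 and ZF=0, i.e. f2 > f1
    movzbl  %al, %eax           # widen to the int return value
    ret
```

fcomi folds the fcom / fnstsw / sahf steps into one instruction, which is why the same unsigned-style conditions apply to both.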

Modern SSE/SSE2 scalar compares into FLAGS follow this as well, with [u]comiss / [u]comisd. (SIMD compares are different: they take the predicate as an immediate in the instruction, since they only produce a single all-zeros / all-ones result for each element, not a set of FLAGS.)
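For comparison, here is a minimal sketch of the same test with SSE, again assuming 32-bit cdecl and a CPU with SSE. The function name f2_above_f1_sse is hypothetical.

```asm
    .text
    .globl  f2_above_f1_sse
    .type   f2_above_f1_sse, @function
f2_above_f1_sse:                # hypothetical: int f2_above_f1_sse(float f1, float f2)
    movss   8(%esp), %xmm0      # xmm0 = f2
    ucomiss 4(%esp), %xmm0      # compare xmm0 (f2) against f1; sets ZF, PF, CF
    seta    %al                 # AL = 1 only if f2 > f1 (CF=0 and ZF=0)
    movzbl  %al, %eax           # widen to the int return value
    ret
```

Note the AT&T operand order: the register written last is the one being compared against the memory operand, so seta here answers "is f2 above f1".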

This is all coming from Volume 2 of Intel 64 and IA-32 Architectures Software Developer's Manuals.

FCOMI sets only some of the flags that CMP does. Because the values are pushed onto a stack, your code has %st(0) == 9 and %st(1) == 10. Referring to the table on page 3-348 in Volume 2A, you can see that this is the case "ST0 < ST(i)", so it will clear ZF and PF and set CF. Meanwhile, on pg. 3-544 of Vol. 2A you can read that JG means "Jump short if greater (ZF=0 and SF=OF)". In other words, it tests the sign, overflow and zero flags, but FCOMI doesn't set sign or overflow!

Depending on which conditions you wish to jump, you should look at the possible comparison results and decide when you want to jump.

+--------------------+---+---+---+
| Comparison results | Z | P | C |
+--------------------+---+---+---+
| ST0 > ST(i)        | 0 | 0 | 0 |
| ST0 < ST(i)        | 0 | 0 | 1 |
| ST0 = ST(i)        | 1 | 0 | 0 |
| unordered          | 1 | 1 | 1 |  one or both operands were NaN.
+--------------------+---+---+---+

I've made this small table to make it easier to figure out:

+--------------+---+---+-----+------------------------------------+
| Test         | Z | C | Jcc | Notes                              |
+--------------+---+---+-----+------------------------------------+
| ST0 < ST(i)  | X | 1 | JB  | ZF will never be set when CF = 1   |
| ST0 <= ST(i) | 1 | 1 | JBE | Either ZF or CF is ok              |
| ST0 == ST(i) | 1 | X | JE  | CF will never be set in this case  |
| ST0 != ST(i) | 0 | X | JNE |                                    |
| ST0 >= ST(i) | X | 0 | JAE | As long as CF is clear we are good |
| ST0 > ST(i)  | 0 | 0 | JA  | Both CF and ZF must be clear       |
+--------------+---+---+-----+------------------------------------+
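Legend: X: don't care, 0: clear, 1: set
The notes assume ordered operands. In the unordered (NaN) case ZF, PF and CF are all set, so JB, JBE and JE are taken as well.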

In other words, the condition codes match those used for unsigned integer comparisons. The same goes if you're using FCMOVcc.
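To illustrate the conditional-move variant, here is a small sketch that returns the larger of two floats with fcmovb. It assumes 32-bit cdecl (float returned in st(0)) and a P6-or-later CPU; the name fmax2 is hypothetical.

```asm
    .text
    .globl  fmax2
    .type   fmax2, @function
fmax2:                          # hypothetical: float fmax2(float a, float b)
    flds    4(%esp)             # st(0) = a
    flds    8(%esp)             # st(0) = b, st(1) = a
    fcomi   %st(1), %st         # compare b with a; CF=1 if b < a or unordered
    fcmovb  %st(1), %st         # if CF=1 ("below"), replace st(0) with a
    fstp    %st(1)              # copy st(0) over st(1) and pop: one value left
    ret                         # result is returned in st(0)
```

With a NaN operand CF is set, so this sketch returns a; whether that is the behaviour you want is a separate design decision.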

If either operand to fcomi (or both) is NaN, it sets ZF=1 PF=1 CF=1. (FP compares have 4 possible results: >, <, ==, or unordered.) If you care what your code does with NaNs, you may need an extra jp or jnp. But not always: for example, ja is only taken if CF=0 and ZF=0, so it is not taken in the unordered case. If you want the unordered case to take the same execution path as below or equal, then ja is all you need.
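Equality is the case where the extra parity check matters, because ZF is also set for unordered. A minimal sketch, assuming 32-bit cdecl and that NaN should count as "not equal"; the function name f_equal and the label are hypothetical. It uses fucomip, the variant that does not raise an invalid-operand exception on quiet NaNs.

```asm
    .text
    .globl  f_equal
    .type   f_equal, @function
f_equal:                        # hypothetical: int f_equal(float a, float b)
    flds    4(%esp)             # st(0) = a
    flds    8(%esp)             # st(0) = b, st(1) = a
    fucomip %st(1), %st         # compare b with a, set ZF/PF/CF, pop b
    fstp    %st(0)              # pop a; this does not touch EFLAGS
    jp      .Lf_equal_nan       # PF=1: at least one operand was NaN
    sete    %al                 # ordered: ZF=1 exactly when a == b
    movzbl  %al, %eax
    ret
.Lf_equal_nan:
    xorl    %eax, %eax          # assumption: report NaN as "not equal"
    ret
```

Without the jp, sete alone would report 1 for a NaN operand, since the unordered result also sets ZF.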

Here you should use JA if you want it to print (i.e. if (!(f2 > f1)) { puts("hello"); }) and JBE if you don't (corresponding to if (!(f2 <= f1)) { puts("hello"); }). (This might be a little confusing because we only print if we don't jump.)

Regarding your second question: fcomi doesn't pop anything. Its close cousin fcomip pops only one value, %st(0). You should always leave the FPU register stack empty when you are done with it, so all in all your program ends up like this, assuming you want the message printed:

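.section    .rodata
msg:    .ascii "Hallo\n\0"
f1:     .float 10.0
f2:     .float 9.0

.text                           # code goes in .text, not in a data section
.globl main
    .type   main, @function
main:
    flds   f1                   # st(0) = 10.0
    flds   f2                   # st(0) = 9.0, st(1) = 10.0
    fcomip %st(1), %st          # compare st(0) with st(1), set ZF/PF/CF, pop st(0)
    fstp   %st(0)               # pop the remaining value to clear the stack
    ja     leb                  # 9.0 > 10.0 is false: won't jump, jbe would
    pushl  $msg
    call   printf
    addl   $4, %esp
leb:
    pushl  $0
    call   exit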

answered Sep 17 '22 23:09

user786653
