 

Convert a floating point to an integer with truncation instead of rounding using the x87 FPU

Tags: assembly, nasm, x87

The FISTP instruction converts 0.75 to 1 (because of rounding).

I want 0.75 to turn into 0, not 1.

Is there an alternative to FIST/FISTP that truncates instead of rounds?
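To make the behavior concrete, here is a minimal C sketch. It assumes GCC or Clang extended inline asm on an x86 or x86-64 target, with the x87 rounding mode left at its default (round to nearest). The function names are hypothetical. FISTP converts using the current rounding mode, whereas a plain C cast is defined to truncate toward zero:

```c
#include <stdint.h>

/* Hypothetical helper: convert with FISTP, which honors the current x87
   rounding mode (round to nearest, ties to even, by default). */
static int32_t convert_with_fistp(double x)
{
    int32_t result;
    /* "t" loads x into st(0); FISTP stores it as a 32-bit integer and pops
       the stack, so st(0) is listed as clobbered. */
    __asm__ ("fistpl %0" : "=m" (result) : "t" (x) : "st");
    return result;
}

/* Hypothetical helper: a C cast from floating point to integer always
   truncates toward zero. */
static int32_t convert_with_cast(double x)
{
    return (int32_t)x;
}
```

With these, convert_with_fistp(0.75) returns 1 while convert_with_cast(0.75) returns 0.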

asked Dec 13 '16 at 00:12 by Zach Johnson


2 Answers

You truly have a plethora of options here:

  1. If you're using SSE2 instructions anyway, then you can use the SSE2 instructions for converting a floating-point value to an integer value with truncation. Peter Cordes's answer discusses this approach. CVTTSD2SI is the scalar version, and CVTTPD2DQ is the packed/vector version.

    If you're targeting x86-64, SSE2 will always be available, and this is what you should be using for all floating-point operations. The x87 FPU is completely obsolete on x86-64.

    If you're targeting x86-32 processors prior to the Pentium 4 or Athlon 64, then SSE2 instructions will not be available. In that case, SSE instructions may still be available (SSE is supported by Pentium 3, Athlon XP, and later). SSE supports only single-precision floating-point operations, so if you don't need the precision, you can use CVTTSS2SI (scalar) or CVTTPS2DQ (packed/vector). Unfortunately, you often need the precision; see below for a better workaround.

  2. If SSE3 instructions are available (Pentium 4 Prescott, certain Athlon 64s, and later), then you can use the FISTTP instruction, which is like FISTP, except that it always truncates, regardless of the current rounding mode. This is the solution that fuz's answer presents.

    This is a very good solution if you are already using the x87 FPU, but is of limited applicability because if you're targeting chips that support SSE3, they necessarily support SSE2, and therefore you should be using SSE instructions to do all floating-point manipulation. The only exception is if you really need the extended 80-bit precision offered by the x87 FPU for your intermediate calculations (SSE2 is limited to 64-bit double-precision).

  3. If you are stuck on legacy x86-32 processors and using the x87 FPU without SSE, you're still not out of options. There are a couple of fast bit-twiddling methods. These were not my original innovations; the code is scattered around the Internet in various places, and I just collated and tweaked it slightly, so I cannot take full credit, nor can I cite a particular source. Here is one such source.

    For single-precision floating-point values, the entire bit representation fits into a 32-bit register, so the implementation is straightforward (this assumes that the floating-point value to be truncated is at the top of the x87 FPU stack):

    ; Retrieve the bit representation of the original floating-point value.
    push  eax
    fst   DWORD PTR [esp]
    mov   eax, DWORD PTR [esp]
    
    ; Twiddle those raw bits.
    and   eax, 080000000H
    xor   eax, 0BEFFFFFFH
    
    ; Store those manipulated bits back in memory, since we can't load        
    ; directly from a register to the x87 FPU stack.
    mov   DWORD PTR [esp], eax
    
    ; Add the modified value to the original value at the top of the stack.
    fadd  DWORD PTR [esp]
    
    ; Round the adjusted floating-point value to an integer.
    ; (With the x87 FPU in its default round-to-nearest mode, our bit
    ; manipulation ensures that this truncates toward zero.)
    fistp DWORD PTR [esp]
    
    ; ... do something with the truncated result at [esp]
    
    pop   eax
    

    An alternative implementation uses a static array of "adjustment" values, which we index into based on the "signedness" of the original floating-point value. This is basically what a naïve "truncate" function written in C would do, except that this does it branchlessly:

    const uint32_t kSingleAdjustments[2] = { 0xBEFFFFFF,  /* -0.49999997f */
                                             0x3EFFFFFF   /* +0.49999997f */ };
    
    ; Retrieve the bit representation of the floating-point value.
    push  eax
    fst   DWORD PTR [esp]
    mov   eax, DWORD PTR [esp]
    
    ; Isolate the sign bit.
    shr   eax, 31
    
    ; Use the sign bit as an index into the array of values to add the appropriate
    ; adjustment value to the original floating-point value at the top of the stack.
    ; (NOTE: This syntax is for MSVC's inline asm; translate as necessary.)
    fadd  DWORD PTR [kSingleAdjustments + (eax * TYPE kSingleAdjustments)]
    
    ; Round the adjusted floating-point value to an integer.
    ; (In the default round-to-nearest mode, our adjustment ensures that it will be truncated.)
    fistp DWORD PTR [esp]
    
    ; ... do something with the truncated result at [esp]
    
    pop   eax
    

    My benchmarks suggest that the second variant is faster on Intel processors, but slower on AMD (specifically, Athlon XP and Athlon 64). I ultimately settled on the second variant for my library, especially since I reuse the "adjustment" values to implement other types of fast rounding.

    Note that the final FISTP instruction supports m16, m32, and m64 integer operands, so if you want to truncate to a 64-bit integer for a greater range, that is possible. Just remember to allocate twice as much space on the stack (8 bytes instead of 4), and then use fistp QWORD PTR [esp] instead of fistp DWORD PTR [esp].

    I realize that this all looks very complicated, but it really is significantly faster than changing the rounding mode, doing the conversion, and setting the rounding mode back. I have benchmarked it extensively on a variety of processors and in a variety of code paths, and never found it to be slower. But I use it in C code, where the standard requires float-to-integer conversions to truncate, so a compiler targeting the x87 FPU without FISTTP has to emit code that switches the rounding mode and then restores it around every conversion. If you're writing assembly by hand and you need truncation, just switch the FPU's rounding mode to "truncate" once and leave it at that.


    There is a double-precision version of this bit-twiddling code, too. The key is realizing that the sign bit lies in the upper 32 bits of a 64-bit double, so you still only need a single 32-bit register.

    However, the double-precision version is not bug-free! Because the low 32 bits of each adjustment constant are zero, the adjustment is only 0.5 - 2^-22 (about 0.49999976) rather than a value just below 0.5 at full double precision. As a result, a value whose fractional part exceeds roughly 1 - 2^-22 will be rounded to the next whole number away from zero instead of being truncated (e.g., 4.99999977 is erroneously rounded to 5, instead of being truncated to 4). Someone smarter than me and with more time to play around with this may come up with a way to fix this, but I'm satisfied with the accuracy of this in most cases, especially given the massive speed improvements.

    const uint64_t kDoubleAdjustments[2] = { 0xBFDFFFFF00000000,  /* about -0.49999976 */
                                             0x3FDFFFFF00000000   /* about +0.49999976 */ };
    
    sub   esp, 8
    fst   QWORD PTR [esp]
    mov   eax, DWORD PTR [esp+4]   ; we only need the upper 32 bits
    
    shr   eax, 31
    fadd  QWORD PTR [kDoubleAdjustments + (eax * TYPE kDoubleAdjustments)]
    
    fistp DWORD PTR [esp]
    
    ; ... do something with the truncated result at [esp]
    
    add   esp, 8
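The adjust-then-round trick from option 3 can be sketched in C as follows. This is a minimal sketch, not the library code described above: it assumes GCC or Clang extended inline asm on x86 or x86-64, and an x87 rounding mode left at the default round-to-nearest, which the trick depends on. The function names are hypothetical; the adjustment constants are the ones given above, and the double-precision variant deliberately keeps the near-integer error described above.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: truncate a float toward zero by adding -0.49999997f
   (sign bit clear) or +0.49999997f (sign bit set), then letting FISTP round
   to nearest. Assumes the x87 rounding mode is round-to-nearest. */
static int32_t trunc_adjust_f32(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);              /* raw IEEE-754 bits */

    /* Same twiddle as the assembly: sign clear -> 0xBEFFFFFF,
       sign set -> 0x3EFFFFFF. */
    uint32_t adj_bits = (bits & 0x80000000u) ^ 0xBEFFFFFFu;
    float adj;
    memcpy(&adj, &adj_bits, sizeof adj);

    int32_t result;
    /* st(0) = x; FADD adds the m32 adjustment; FISTP rounds, stores, pops. */
    __asm__ ("fadds %1\n\t"
             "fistpl %0"
             : "=m" (result)
             : "m" (adj), "t" (x)
             : "st");
    return result;
}

/* Hypothetical helper: double-precision variant that indexes a table of
   adjustments (about -/+0.49999976) by the sign bit. */
static int32_t trunc_adjust_f64(double x)
{
    static const uint64_t kDoubleAdjustments[2] = {
        0xBFDFFFFF00000000ull,   /* used when the sign bit is clear */
        0x3FDFFFFF00000000ull    /* used when the sign bit is set   */
    };

    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);

    double adj;
    memcpy(&adj, &kDoubleAdjustments[bits >> 63], sizeof adj);

    int32_t result;
    /* st(0) = x; FADD adds the m64 adjustment; FISTP rounds, stores, pops. */
    __asm__ ("faddl %1\n\t"
             "fistpl %0"
             : "=m" (result)
             : "m" (adj), "t" (x)
             : "st");
    return result;
}
```

For example, trunc_adjust_f32(0.75f) returns 0 and trunc_adjust_f64(-7.25) returns -7, while trunc_adjust_f64(4.99999977) still returns 5, matching the limitation of the double-precision version.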
    
answered Oct 04 '22 at 22:10 by Cody Gray


The SSE3 instruction set also introduced the fisttp instruction. It works like the fistp instruction, which stores a floating-point number as an integer (16, 32, or 64 bits wide) and pops the FPU stack, except that it always truncates the value, regardless of the current rounding mode.

Here is an example of how to use that:

FLD    QWORD PTR [esi] ; load 64 bit floating point number
FISTTP DWORD PTR [edi] ; truncate and store as 32 bit integer

or in AT&T-syntax:

fldl    (%esi)
fisttpl (%edi)
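For use from C, a minimal wrapper around the same instruction might look like this. It assumes GCC or Clang extended inline asm and an x86 or x86-64 CPU with SSE3; the function name is hypothetical:

```c
#include <stdint.h>

/* Hypothetical helper: truncate toward zero with FISTTP (requires SSE3).
   The rounding-control field of the x87 control word is ignored. */
static int32_t trunc_fisttp(double x)
{
    int32_t result;
    /* "t" places x in st(0); FISTTP stores a truncated 32-bit integer and
       pops the stack, so st(0) is declared clobbered. */
    __asm__ ("fisttpl %0" : "=m" (result) : "t" (x) : "st");
    return result;
}
```

Nothing needs to be saved or restored here, which is the main advantage over changing the rounding mode.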

If your processor does not support SSE3, you can get the same result with the fistp instruction after setting the rounding mode to “truncate” (and restoring it afterwards).

sub    esp,0x4               ; make space for two copies of the control word
fstcw  WORD PTR [esp]        ; store the FPU control word
fstcw  WORD PTR [esp+0x2]    ; store another copy
or     WORD PTR [esp],0x0c00 ; set rounding mode to "truncate"
fldcw  WORD PTR [esp]        ; load updated control word
fld    QWORD PTR [esi]       ; load 64 bit floating point number
fistp  DWORD PTR [edi]       ; truncate and store as 32 bit integer
fldcw  WORD PTR [esp+0x2]    ; restore control word
add    esp,0x4               ; release the stack space

or in AT&T-syntax:

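sub    $4,%esp          # make space for two copies of the control word
fstcw  (%esp)           # store the FPU control word
fstcw  2(%esp)          # store another copy
orw    $0x0c00,(%esp)   # set rounding mode to "truncate"
fldcw  (%esp)           # load updated control word
fldl   (%esi)           # load 64 bit floating point number
fistpl (%edi)           # truncate and store as 32 bit integer
fldcw  2(%esp)          # restore control word
add    $4,%esp          # release the stack space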
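The same save, modify, and restore sequence can be sketched in C with GCC or Clang extended inline asm (x86 or x86-64 only; the function names are hypothetical). Keeping the two fldcw instructions and the fistp in a single asm statement prevents the compiler from moving the conversion outside the window in which truncation is in effect:

```c
#include <stdint.h>

/* Hypothetical helper: read the current x87 control word. */
static uint16_t x87_control_word(void)
{
    uint16_t cw;
    __asm__ volatile ("fnstcw %0" : "=m" (cw));
    return cw;
}

/* Hypothetical helper: truncate by temporarily setting the rounding-control
   field (bits 10-11) to 11b, converting with FISTP, then restoring the
   original control word. */
static int32_t trunc_with_rc(double x)
{
    uint16_t old_cw = x87_control_word();
    uint16_t trunc_cw = (uint16_t)(old_cw | 0x0C00);
    int32_t result;

    __asm__ volatile ("fldcw %2\n\t"     /* switch to round toward zero   */
                      "fistpl %0\n\t"    /* store 32-bit integer, pop st0 */
                      "fldcw %3"         /* restore caller's control word */
                      : "=m" (result)
                      : "t" (x), "m" (trunc_cw), "m" (old_cw)
                      : "st");
    return result;
}
```

After each call the rounding-control bits are back to whatever they were before, so surrounding code that relies on round-to-nearest is unaffected.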

If your code is never going to run on an 80286 or older, you might want to use fnstcw instead of fstcw: fnstcw omits the WAIT prefix that fstcw carries, saving one byte per instruction at the expense of the code possibly not working on a real 8087.

answered Oct 04 '22 at 21:10 by fuz