Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Are there unsigned equivalents of the x87 FILD and SSE CVTSI2SD instructions?

I want to implement the equivalent of C's uint-to-double cast in the GHC Haskell compiler. We already implement int-to-double using FILD or CVTSI2SD. Is there unsigned versions of these operations or am I supposed to zero out the highest bit of the uint before the conversion (thus losing range)?

like image 579
tibbe Avatar asked Dec 05 '12 23:12

tibbe


4 Answers

As someone said, "Good Artists Copy; Great Artists Steal". So we can just check how other compiler writers solved this issue. I used a simple snippet:

volatile unsigned int x;
int main()
{
  volatile double  y = x;
  return y;
}

(volatiles added to ensure the compiler does not optimize out the conversions)

Results (irrelevant instructions skipped):

Visual C++ 2010 cl /Ox (x86)

  __real@41f0000000000000 DQ 041f0000000000000r ; 4.29497e+009

  mov   eax, DWORD PTR ?x@@3IC          ; x
  fild  DWORD PTR ?x@@3IC           ; x
  test  eax, eax
  jns   SHORT $LN4@main
  fadd  QWORD PTR __real@41f0000000000000
$LN4@main:
  fstp  QWORD PTR _y$[esp+8]

So basically the compiler is adding an adjustment value in case the sign bit was set.

Visual C++ 2010 cl /Ox (x64)

  mov   eax, DWORD PTR ?x@@3IC          ; x
  pxor  xmm0, xmm0
  cvtsi2sd xmm0, rax
  movsdx    QWORD PTR y$[rsp], xmm0

No need to adjust here because the compiler knows that rax will have the sign bit cleared.

Visual C++ 2012 cl /Ox

  __xmm@41f00000000000000000000000000000 DB 00H, 00H, 00H, 00H, 00H, 00H, 00H
  DB 00H, 00H, 00H, 00H, 00H, 00H, 00H, 0f0H, 'A'

  mov   eax, DWORD PTR ?x@@3IC          ; x
  movd  xmm0, eax
  cvtdq2pd xmm0, xmm0
  shr   eax, 31                 ; 0000001fH
  addsd xmm0, QWORD PTR __xmm@41f00000000000000000000000000000[eax*8]
  movsd QWORD PTR _y$[esp+8], xmm0

This uses branchless code to add 0 or the magic adjustment depending on whether the sign bit was cleared or set.

like image 173
Igor Skochinsky Avatar answered Sep 28 '22 10:09

Igor Skochinsky


You can exploit some of the properties of the IEEE double format and interpret the unsigned value as part of the mantissa, while adding some carefully crafted exponent.

Bits 63 62-52     51-0
     S  Exp       Mantissa
     0  1075      20 bits 0, followed by your unsigned int

The 1075 comes from the IEEE exponent bias (1023) for doubles and a "shift" amount of 52 bits for your mantissa. Note that there is a implicit "1" leading the mantissa, which needs to be subtracted later.

So:

double uint32_to_double(uint32_t x) {
    uint64_t xx = x;
    xx += 1075ULL << 52;         // add the exponent
    double d = *(double*)&xx;    // or use a union to convert
    return d - (1ULL << 52);     // 2 ^^ 52
}

If you don't have native 64 bit on you platform a version using SSE for the integer steps might be beneficial, but that depends of course.

On my platform this compiles to

0000000000000000 <uint32_to_double>:
   0:   48 b8 00 00 00 00 00    movabs $0x4330000000000000,%rax
   7:   00 30 43 
   a:   89 ff                   mov    %edi,%edi
   c:   48 01 f8                add    %rdi,%rax
   f:   c4 e1 f9 6e c0          vmovq  %rax,%xmm0
  14:   c5 fb 5c 05 00 00 00    vsubsd 0x0(%rip),%xmm0,%xmm0 
  1b:   00 
  1c:   c3                      retq

which looks pretty good. The 0x0(%rip) is the magic double constant, and if inlined some instructions like the upper 32 bit zeroing and the constant reload will vanish.

like image 24
Gunther Piez Avatar answered Sep 28 '22 10:09

Gunther Piez


There is a better way

__m128d _mm_cvtsu32_sd(__m128i n) {
    const __m128i magic_mask = _mm_set_epi32(0, 0, 0x43300000, 0);
    const __m128d magic_bias = _mm_set_sd(4503599627370496.0);
    return _mm_sub_sd(_mm_castsi128_pd(_mm_or_si128(n, magic_mask)), magic_bias);
}
like image 26
Marat Dukhan Avatar answered Sep 28 '22 09:09

Marat Dukhan


We already implement int-to-double using FILD ...
Is there unsigned versions of these operations

If you want exactly x87 FILD opcode to use, just shift uint64 to uint63 (div 2) and then mul it by 2 back, but already as double, so the x87 uint64-to-double conversion requires one FMUL execution in overhead.

The example: 0xFFFFFFFFFFFFFFFFU -> +1.8446744073709551e+0019

it was unable to post the code example in the strict form rules. I'll try later.

    //inline
    double    u64_to_d(unsigned _int64 v){

    //volatile double   res;
    volatile unsigned int tmp=2;
    _asm{
    fild  dword ptr tmp
    //v>>=1;
    shr   dword ptr v+4, 1
    rcr   dword ptr v, 1
    fild  qword ptr v

    //save lsb
    //mov   byte ptr tmp, 0  
    //rcl   byte ptr tmp, 1

    //res=tmp+res*2;
    fmulp st(1),st
    //fild  dword ptr tmp
    //faddp st(1),st 

    //fstp  qword ptr res
    }

    //return res;
    //fld  qword ptr res
}

VC produced x86 output

        //inline
        double    u64_to_d(unsigned _int64 v){
    55                   push        ebp  
    8B EC                mov         ebp,esp  
    81 EC 04 00 00 00    sub         esp,04h  

        //volatile double   res;
        volatile unsigned int tmp=2;
    C7 45 FC 02 00 00 00 mov         dword ptr [tmp], 2  
        _asm{
        fild  dword ptr tmp
    DB 45 FC             fild        dword ptr [tmp]  
        //v>>=1;
        shr   dword ptr v+4, 1
    D1 6D 0C             shr         dword ptr [ebp+0Ch],1  
        rcr   dword ptr v, 1
    D1 5D 08             rcr         dword ptr [v],1  
        fild  qword ptr v
    DF 6D 08             fild        qword ptr [v]  

        //save lsb
    //    mov   byte ptr [tmp], 0  
    //C6 45 FC 00        mov         byte ptr [tmp], 0
    //    rcl   byte ptr tmp, 1
    //D0 55 FC           rcl         byte ptr [tmp],1  

        //res=tmp+res*2;
        fmulp st(1),st
    DE C9                fmulp       st(1),st  
    //    fild  dword ptr tmp
    //DB 45 FC           fild        dword ptr [tmp]  
    //    faddp st(1),st 
    //DE C1              faddp       st(1),st  


        //fstp  qword ptr res
        //fstp        qword ptr [res]  
    }

        //return res;
        //fld         qword ptr [res]  

    8B E5                mov         esp,ebp  
    5D                   pop         ebp  
    C3                   ret  
}

i posted (probably i manually removed all incorrected ascii chars in text file).

like image 32
grizlyk Avatar answered Sep 28 '22 10:09

grizlyk