
Double.IsNaN test 100 times faster?

I found this in the .NET source code. A comment there claims it is 100 times faster than System.Double.IsNaN. Is there any reason not to use this function instead of System.Double.IsNaN?

[StructLayout(LayoutKind.Explicit)]
private struct NanUnion
{
    [FieldOffset(0)] internal double DoubleValue;
    [FieldOffset(0)] internal UInt64 UintValue;
}

// The standard CLR double.IsNaN() function is approximately 100 times slower than our own wrapper,
// so please make sure to use DoubleUtil.IsNaN() in performance sensitive code.
// PS item that tracks the CLR improvement is DevDiv Schedule : 26916.
// IEEE 754 : If the argument is any value in the range 0x7ff0000000000001L through 0x7fffffffffffffffL 
// or in the range 0xfff0000000000001L through 0xffffffffffffffffL, the result will be NaN.         
public static bool IsNaN(double value)
{
    NanUnion t = new NanUnion();
    t.DoubleValue = value;

    UInt64 exp = t.UintValue & 0xfff0000000000000;
    UInt64 man = t.UintValue & 0x000fffffffffffff;

    return (exp == 0x7ff0000000000000 || exp == 0xfff0000000000000) && (man != 0);
}

EDIT: Still according to the .NET Source Code, the code for System.Double.IsNaN is the following:

public unsafe static bool IsNaN(double d)
{
    return (*(UInt64*)(&d) & 0x7FFFFFFFFFFFFFFFL) > 0x7FF0000000000000L;
}
Goswin Rothenthal asked Jun 20 '14 16:06


3 Answers

It claims to be 100 times faster than System.Double.IsNaN

Yes, that used to be true. You are missing the time machine needed to know when this decision was made. Double.IsNaN() didn't use to look like that. From the SSCLI10 source code:

   public static bool IsNaN(double d)
   {
       // Comparisions of a NaN with another number is always false and hence both conditions will be false.
       if (d < 0d || d >= 0d) {
          return false;
       }
       return true;
   }

This performs very poorly on the FPU in 32-bit code if d is NaN. It is just an aspect of the chip design: NaN is treated as exceptional in the micro-code. The Intel processor manuals say very little about it, other than documenting a processor perf counter that tracks the number of "Floating Point assists" and noting that the micro-code sequencer comes into play for denormals and NaNs, "potentially costing hundreds of cycles". It is not an issue in 64-bit code, which uses SSE2 instructions that don't have this perf hit.

Some code to play with to see this yourself:

using System;
using System.Diagnostics;

class Program {
    static void Main(string[] args) {
        double d = double.NaN;
        for (int test = 0; test < 10; ++test) {
            var sw1 = Stopwatch.StartNew();
            bool result1 = false;
            for (int ix = 0; ix < 1000 * 1000; ++ix) {
                result1 |= double.IsNaN(d);
            }
            sw1.Stop();
            var sw2 = Stopwatch.StartNew();
            bool result2 = false;
            for (int ix = 0; ix < 1000 * 1000; ++ix) {
                result2 |= IsNaN(d);
            }
            sw2.Stop();
            Console.WriteLine("{0} - {1} x {2}%", sw1.Elapsed, sw2.Elapsed, 100 * sw2.ElapsedTicks / sw1.ElapsedTicks, result1, result2);

        }
        Console.ReadLine();
    }
    public static bool IsNaN(double d) {
        // Comparisons of a NaN with another number are always false, hence both conditions will be false.
        if (d < 0d || d >= 0d) {
            return false;
        }
        return true;
    }
}

The local IsNaN() in this program is the old version of Double.IsNaN(), the one that later got micro-optimized. Such micro-optimizations are not evil in a framework, by the way: the great burden of the Microsoft .NET programmers is that they can rarely guess when their code is in the critical path of an application.

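Results on my machine when targeting 32-bit code (Haswell mobile core). The columns are the time for double.IsNaN(), the time for the old comparison-based version, and the ratio between them: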

00:00:00.0027095 - 00:00:00.2427242 x 8957%
00:00:00.0025248 - 00:00:00.2191291 x 8678%
00:00:00.0024344 - 00:00:00.2209950 x 9077%
00:00:00.0024144 - 00:00:00.2321169 x 9613%
00:00:00.0024126 - 00:00:00.2173313 x 9008%
00:00:00.0025488 - 00:00:00.2237517 x 8778%
00:00:00.0026940 - 00:00:00.2231146 x 8281%
00:00:00.0025052 - 00:00:00.2145660 x 8564%
00:00:00.0025533 - 00:00:00.2200943 x 8619%
00:00:00.0024406 - 00:00:00.2135839 x 8751%
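
As a sanity check that the two versions differ only in speed and not in results, here is a minimal sketch that runs the old comparison-based code and the current double.IsNaN() over a handful of edge cases. The class name and the sample values are mine, chosen purely for illustration:

```csharp
using System;

static class NaNComparisonCheck
{
    // Old SSCLI-style test: every ordered comparison involving NaN is false,
    // so a value that is neither < 0 nor >= 0 must be NaN.
    public static bool IsNaNOld(double d)
    {
        if (d < 0d || d >= 0d)
        {
            return false;
        }
        return true;
    }

    // Edge cases picked for illustration only.
    public static readonly double[] Samples =
    {
        double.NaN, -double.NaN,
        double.PositiveInfinity, double.NegativeInfinity,
        0d, -0d, double.Epsilon, double.MaxValue, double.MinValue, 42d
    };

    // True when the old code and double.IsNaN agree on every sample.
    public static bool AllAgree()
    {
        foreach (double d in Samples)
        {
            if (IsNaNOld(d) != double.IsNaN(d))
            {
                return false;
            }
        }
        return true;
    }

    static void Main()
    {
        Console.WriteLine(AllAgree() ? "all samples agree" : "mismatch");
    }
}
```

If the two ever disagreed, the rewrite would have been a behavior change rather than a pure optimization.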
Hans Passant answered Nov 06 '22 11:11


Here's a naive benchmark (IsNaN is the wrapper from the question):

public static void Main()
{
    int iterations = 500 * 1000 * 1000;

    double nan = double.NaN;
    double notNan = 42;

    Stopwatch sw = Stopwatch.StartNew();

    bool isNan;
    for (int i = 0; i < iterations; i++)
    {
        isNan = IsNaN(nan);     // true
        isNan = IsNaN(notNan);  // false
    }

    sw.Stop();
    Console.WriteLine("IsNaN: {0}", sw.ElapsedMilliseconds);

    sw = Stopwatch.StartNew();

    for (int i = 0; i < iterations; i++)
    {
        isNan = double.IsNaN(nan);     // true
        isNan = double.IsNaN(notNan);  // false
    }

    sw.Stop();
    Console.WriteLine("double.IsNaN: {0}", sw.ElapsedMilliseconds);

    Console.Read();
}

Obviously the claim is wrong (times in milliseconds):

IsNaN: 15012
double.IsNaN: 6243


EDIT + NOTE: I'm sure the timing will change depending on the input values and many other factors, but claiming that, generally speaking, this wrapper is 100x faster than the default implementation seems just wrong.
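
For what it's worth, a slightly less naive variant would warm up both methods before timing and consume every result, so the JIT has no excuse to drop the calls. The sketch below is only that, a sketch: the wrapper is copied from the question, while the class name, the two counting helpers and the iteration count are made up for illustration. Timings will still vary with the machine, the JIT and 32-bit versus 64-bit.

```csharp
using System;
using System.Diagnostics;
using System.Runtime.InteropServices;

static class NaNBenchmark
{
    [StructLayout(LayoutKind.Explicit)]
    private struct NanUnion
    {
        [FieldOffset(0)] internal double DoubleValue;
        [FieldOffset(0)] internal UInt64 UintValue;
    }

    // The wrapper from the question, unchanged.
    public static bool IsNaN(double value)
    {
        NanUnion t = new NanUnion();
        t.DoubleValue = value;

        UInt64 exp = t.UintValue & 0xfff0000000000000;
        UInt64 man = t.UintValue & 0x000fffffffffffff;

        return (exp == 0x7ff0000000000000 || exp == 0xfff0000000000000) && (man != 0);
    }

    // Counts NaNs using the wrapper; returning the count keeps the calls observable.
    public static long CountWithWrapper(double[] inputs, int rounds)
    {
        long count = 0;
        for (int r = 0; r < rounds; r++)
            for (int i = 0; i < inputs.Length; i++)
                if (IsNaN(inputs[i])) count++;
        return count;
    }

    // Same loop, calling the framework method instead.
    public static long CountWithFramework(double[] inputs, int rounds)
    {
        long count = 0;
        for (int r = 0; r < rounds; r++)
            for (int i = 0; i < inputs.Length; i++)
                if (double.IsNaN(inputs[i])) count++;
        return count;
    }

    static void Main()
    {
        double[] inputs = { double.NaN, 42 };
        const int rounds = 100 * 1000 * 1000;

        // Untimed warm-up so the initial JIT compilation is not part of the measurement.
        CountWithWrapper(inputs, 1000);
        CountWithFramework(inputs, 1000);

        Stopwatch sw = Stopwatch.StartNew();
        long wrapperCount = CountWithWrapper(inputs, rounds);
        sw.Stop();
        Console.WriteLine("IsNaN: {0} ms ({1} NaNs)", sw.ElapsedMilliseconds, wrapperCount);

        sw = Stopwatch.StartNew();
        long frameworkCount = CountWithFramework(inputs, rounds);
        sw.Stop();
        Console.WriteLine("double.IsNaN: {0} ms ({1} NaNs)", sw.ElapsedMilliseconds, frameworkCount);
    }
}
```

Two explicit loops are used instead of passing a delegate, because the cost of a delegate call could easily dwarf the few instructions being measured.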

ken2k answered Nov 06 '22 10:11


I call shenanigans. The "fast" version has a considerably larger number of ops and even performs more reads from memory (from the stack, so in L1 cache, but still slower than registers).

00007FFAC53D3D01  movups      xmmword ptr [rsp+8],xmm0  
00007FFAC53D3D06  sub         rsp,48h  
00007FFAC53D3D0A  mov         qword ptr [rsp+20h],0  
00007FFAC53D3D13  mov         qword ptr [rsp+28h],0  
00007FFAC53D3D1C  mov         qword ptr [rsp+30h],0  
00007FFAC53D3D25  mov         rax,7FFAC5423D40h  
00007FFAC53D3D2F  mov         eax,dword ptr [rax]  
00007FFAC53D3D31  test        eax,eax  
00007FFAC53D3D33  je          00007FFAC53D3D3A  
00007FFAC53D3D35  call        00007FFB24EE39F0  
00007FFAC53D3D3A  mov         r8d,8  
00007FFAC53D3D40  xor         edx,edx  
00007FFAC53D3D42  lea         rcx,[rsp+20h]  
00007FFAC53D3D47  call        00007FFB24A21680  
            t.DoubleValue = value;
00007FFAC53D3D4C  movsd       xmm5,mmword ptr [rsp+50h]  
00007FFAC53D3D52  movsd       mmword ptr [rsp+20h],xmm5  

            UInt64 exp = t.UintValue & 0xfff0000000000000;
00007FFAC53D3D58  mov         rax,qword ptr [rsp+20h]  
00007FFAC53D3D5D  mov         rcx,0FFF0000000000000h  
00007FFAC53D3D67  and         rax,rcx  
00007FFAC53D3D6A  mov         qword ptr [rsp+28h],rax  
            UInt64 man = t.UintValue & 0x000fffffffffffff;
00007FFAC53D3D6F  mov         rax,qword ptr [rsp+20h]  
00007FFAC53D3D74  mov         rcx,0FFFFFFFFFFFFFh  
00007FFAC53D3D7E  and         rax,rcx  
00007FFAC53D3D81  mov         qword ptr [rsp+30h],rax  

            return (exp == 0x7ff0000000000000 || exp == 0xfff0000000000000) && (man != 0);
00007FFAC53D3D86  mov         rax,7FF0000000000000h  
00007FFAC53D3D90  cmp         qword ptr [rsp+28h],rax  
00007FFAC53D3D95  je          00007FFAC53D3DA8  
00007FFAC53D3D97  mov         rax,0FFF0000000000000h  
00007FFAC53D3DA1  cmp         qword ptr [rsp+28h],rax  
00007FFAC53D3DA6  jne         00007FFAC53D3DBD  
00007FFAC53D3DA8  xor         eax,eax  
00007FFAC53D3DAA  cmp         qword ptr [rsp+30h],0  
00007FFAC53D3DB0  setne       al  
00007FFAC53D3DB3  mov         dword ptr [rsp+38h],eax  
00007FFAC53D3DB7  mov         al,byte ptr [rsp+38h]  
00007FFAC53D3DBB  jmp         00007FFAC53D3DC1  
00007FFAC53D3DBD  xor         eax,eax  
00007FFAC53D3DBF  jmp         00007FFAC53D3DC1  
00007FFAC53D3DC1  nop  
00007FFAC53D3DC2  add         rsp,48h  
00007FFAC53D3DC6  ret  

Versus the .NET version:

            return (*(UInt64*)(&d) & 0x7FFFFFFFFFFFFFFFL) > 0x7FF0000000000000L;
00007FFAC53D3DE0  movsd       mmword ptr [rsp+8],xmm0  
00007FFAC53D3DE6  sub         rsp,38h  
00007FFAC53D3DEA  mov         rax,7FFAC5423D40h  
00007FFAC53D3DF4  mov         eax,dword ptr [rax]  
00007FFAC53D3DF6  test        eax,eax  
00007FFAC53D3DF8  je          00007FFAC53D3DFF  
00007FFAC53D3DFA  call        00007FFB24EE39F0  
00007FFAC53D3DFF  mov         rdx,qword ptr [rsp+40h]  
00007FFAC53D3E04  mov         rax,7FFFFFFFFFFFFFFFh  
00007FFAC53D3E0E  and         rdx,rax  
00007FFAC53D3E11  xor         ecx,ecx  
00007FFAC53D3E13  mov         rax,7FF0000000000000h  
00007FFAC53D3E1D  cmp         rdx,rax  
00007FFAC53D3E20  seta        cl  
00007FFAC53D3E23  mov         dword ptr [rsp+20h],ecx  
00007FFAC53D3E27  movzx       eax,byte ptr [rsp+20h]  
00007FFAC53D3E2C  jmp         00007FFAC53D3E2E  
00007FFAC53D3E2E  nop  
00007FFAC53D3E2F  add         rsp,38h  
00007FFAC53D3E33  ret  
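
The shorter sequence is possible because, once the sign bit is masked off, the remaining 63 bits exceed 0x7FF0000000000000 exactly when the exponent is all ones and the mantissa is non-zero. Here is a minimal sketch of that equivalence, working on raw bit patterns. It uses BitConverter.Int64BitsToDouble in place of the pointer cast (my substitution, so no unsafe is needed), and the class name and sample patterns are for illustration only:

```csharp
using System;

static class NaNBitPatterns
{
    // Two masks and three compares, following the wrapper from the question.
    public static bool WrapperStyle(ulong bits)
    {
        ulong exp = bits & 0xFFF0000000000000UL;
        ulong man = bits & 0x000FFFFFFFFFFFFFUL;
        return (exp == 0x7FF0000000000000UL || exp == 0xFFF0000000000000UL) && (man != 0);
    }

    // One mask and one compare, following the Double.IsNaN shown in the question.
    public static bool SingleCompare(ulong bits)
    {
        return (bits & 0x7FFFFFFFFFFFFFFFUL) > 0x7FF0000000000000UL;
    }

    // Boundary patterns around the NaN ranges quoted in the question.
    public static readonly ulong[] Patterns =
    {
        0x0000000000000000UL, // +0
        0x7FEFFFFFFFFFFFFFUL, // largest finite value
        0x7FF0000000000000UL, // +infinity
        0x7FF0000000000001UL, // first NaN in the positive range
        0x7FFFFFFFFFFFFFFFUL, // last NaN in the positive range
        0x8000000000000000UL, // -0
        0xFFF0000000000000UL, // -infinity
        0xFFF0000000000001UL, // first NaN in the negative range
        0xFFFFFFFFFFFFFFFFUL  // last NaN in the negative range
    };

    // True when both bit tests and double.IsNaN agree on every pattern.
    public static bool AllAgree()
    {
        foreach (ulong bits in Patterns)
        {
            double d = BitConverter.Int64BitsToDouble(unchecked((long)bits));
            bool expected = double.IsNaN(d);
            if (WrapperStyle(bits) != expected || SingleCompare(bits) != expected)
            {
                return false;
            }
        }
        return true;
    }

    static void Main()
    {
        Console.WriteLine(AllAgree() ? "all patterns agree" : "mismatch");
    }
}
```

Both tests give the same answers, so the instruction count is the only thing separating them.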
Cory Nelson answered Nov 06 '22 09:11