 

How is the CLR faster than me when calling the Windows API?

I was testing different ways of generating a timestamp when I found something that surprised me.

Calling Windows's GetSystemTimeAsFileTime through P/Invoke is about 3x slower than calling DateTime.UtcNow, which internally uses the CLR's wrapper for the same GetSystemTimeAsFileTime.

How can that be?

Here's DateTime.UtcNow's implementation:

public static DateTime UtcNow {
    get {
        long ticks = 0;
        ticks = GetSystemTimeAsFileTime();
        return new DateTime( ((UInt64)(ticks + FileTimeOffset)) | KindUtc);
    }
}

[MethodImplAttribute(MethodImplOptions.InternalCall)] // Implemented by the CLR
internal static extern long GetSystemTimeAsFileTime();
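To make the arithmetic in that getter concrete, here is a minimal sketch (not the CLR's code) that converts a raw FILETIME value to a UTC DateTime by hand and compares it with the framework's own DateTime.FromFileTimeUtc. The class FileTimeDemo, the helper FileTimeToUtc and the locally defined FileTimeOffset are illustrative names; the constant is assumed to be the tick count of 1601-01-01, the FILETIME epoch.

```csharp
using System;

public static class FileTimeDemo
{
    // Assumption: ticks (100 ns units) between 0001-01-01 and 1601-01-01,
    // i.e. the distance from DateTime's epoch to the FILETIME epoch.
    private static readonly long FileTimeOffset = new DateTime(1601, 1, 1).Ticks;

    // Hypothetical helper: shift the raw FILETIME value into DateTime ticks
    // and mark the result as UTC, as the getter above does with KindUtc.
    public static DateTime FileTimeToUtc(long fileTime) =>
        new DateTime(fileTime + FileTimeOffset, DateTimeKind.Utc);

    public static void Main()
    {
        long fileTime = DateTime.UtcNow.ToFileTimeUtc();
        DateTime manual = FileTimeToUtc(fileTime);
        DateTime builtIn = DateTime.FromFileTimeUtc(fileTime);

        // Both conversions should agree tick for tick.
        Console.WriteLine(manual.Ticks == builtIn.Ticks);
    }
}
```

The sketch avoids P/Invoke entirely, so it only illustrates the offset step and says nothing about call overhead.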

Core CLR's wrapper for GetSystemTimeAsFileTime:

FCIMPL0(INT64, SystemNative::__GetSystemTimeAsFileTime)
{
    FCALL_CONTRACT;

    INT64 timestamp;

    ::GetSystemTimeAsFileTime((FILETIME*)&timestamp);

#if BIGENDIAN
    timestamp = (INT64)(((UINT64)timestamp >> 32) | ((UINT64)timestamp << 32));
#endif

    return timestamp;
}
FCIMPLEND;

My test code utilizing BenchmarkDotNet:

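using System;
using System.Runtime.InteropServices;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

public class Program
{
    static void Main() => BenchmarkRunner.Run<Program>();

    // Baseline: the CLR's internal call behind DateTime.UtcNow.
    [Benchmark]
    public DateTime UtcNow() => DateTime.UtcNow;

    // Same Windows API, reached through P/Invoke.
    [Benchmark]
    public long GetSystemTimeAsFileTime()
    {
        long fileTime;
        GetSystemTimeAsFileTime(out fileTime);
        return fileTime;
    }

    [DllImport("kernel32.dll")]
    public static extern void GetSystemTimeAsFileTime(out long systemTimeAsFileTime);
}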

And the results:

                  Method |     Median |    StdDev |
------------------------ |----------- |---------- |
 GetSystemTimeAsFileTime | 14.9161 ns | 1.0890 ns |
                  UtcNow |  4.9967 ns | 0.2788 ns |
i3arnon asked Jun 18 '16 15:06



2 Answers

When managed code invokes unmanaged code, the runtime performs a stack walk to make sure the calling code has the UnmanagedCode permission that allows it to do so.

That stack walk happens at run time and carries a substantial performance cost.

It's possible to remove the run-time check (a check still happens at JIT-compile time) by applying the SuppressUnmanagedCodeSecurity attribute:

[SuppressUnmanagedCodeSecurity]
[DllImport("kernel32.dll")]
public static extern void GetSystemTimeAsFileTime(out long systemTimeAsFileTime);

This brings my implementation about halfway to the CLR's:

                  Method |    Median |    StdDev |
------------------------ |---------- |---------- |
 GetSystemTimeAsFileTime | 9.0569 ns | 0.7950 ns |
                  UtcNow | 5.0191 ns | 0.2682 ns |

Keep in mind, though, that suppressing this check may be extremely risky from a security standpoint.

Additionally using an unsafe pointer parameter, as Ben Voigt suggested, closes about half of the remaining gap:

                  Method |    Median |    StdDev |
------------------------ |---------- |---------- |
 GetSystemTimeAsFileTime | 6.9114 ns | 0.5432 ns |
                  UtcNow | 5.0226 ns | 0.0906 ns |
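The exact declaration behind that last measurement isn't shown, so the following is only a sketch of what combining both changes might look like. NativeClock and GetFileTime are hypothetical names, the code assumes Windows, and it must be compiled with unsafe code enabled.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Security;

public static class NativeClock
{
    // Assumption: both tweaks applied to one declaration.
    // SuppressUnmanagedCodeSecurity skips the run-time permission stack walk;
    // the raw pointer parameter replaces the 'out long' parameter.
    [SuppressUnmanagedCodeSecurity]
    [DllImport("kernel32.dll")]
    private static extern unsafe void GetSystemTimeAsFileTime(long* pSystemTimeAsFileTime);

    // Hypothetical wrapper: the result lands in a stack local,
    // so nothing on the managed heap needs pinning.
    public static unsafe long GetFileTime()
    {
        long fileTime = 0;
        GetSystemTimeAsFileTime(&fileTime);
        return fileTime;
    }

    public static void Main()
    {
        // Print the native timestamp as an ISO 8601 UTC time.
        Console.WriteLine(DateTime.FromFileTimeUtc(GetFileTime()).ToString("o"));
    }
}
```

Keeping the extern method private and exposing only the wrapper limits who can reach the unchecked call, which matters given the security caveat above.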
i3arnon answered Nov 03 '22 13:11



The CLR almost certainly passes a pointer to a local (automatic, stack) variable to receive the result. The stack doesn't get compacted or relocated, so there's no need to pin memory, etc., and a native compiler doesn't support such things anyway, so there's no overhead to account for them.

In C#, though, the p/invoke declaration is compatible with passing a member of a managed class instance living on the garbage-collected heap. P/invoke has to pin that instance or else risk having the output buffer move before or while the OS function writes to it. Even though you pass a variable stored on the stack, p/invoke must still check whether the pointer points into the garbage-collected heap before it can branch around the pinning code, so there's non-zero overhead even in the identical case.

It's possible that you could get better results using

[DllImport("kernel32.dll")]
public unsafe static extern void GetSystemTimeAsFileTime(long* pSystemTimeAsFileTime);

By eliminating the out parameter, p/invoke no longer has to deal with aliasing and heap compaction; that is now entirely the responsibility of the code that sets up the pointer.
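To illustrate what that responsibility means in practice, here is a hedged sketch of calling the pointer-based declaration in both situations: with a stack local, where no pinning is needed, and with a field of a heap object, where the caller has to pin it with a fixed statement. TimestampHolder, PinningDemo and the two Read methods are hypothetical names; the code assumes Windows and unsafe compilation.

```csharp
using System;
using System.Runtime.InteropServices;

public sealed class TimestampHolder
{
    public long FileTime; // lives on the garbage-collected heap
}

public static class PinningDemo
{
    [DllImport("kernel32.dll")]
    private static extern unsafe void GetSystemTimeAsFileTime(long* pSystemTimeAsFileTime);

    // Stack local: the GC never moves it, so its address can be taken directly.
    public static unsafe long ReadIntoLocal()
    {
        long fileTime = 0;
        GetSystemTimeAsFileTime(&fileTime);
        return fileTime;
    }

    // Heap field: the caller must pin the object for the duration of the call,
    // which is the work the 'out' marshalling would otherwise have to allow for.
    public static unsafe long ReadIntoField(TimestampHolder holder)
    {
        fixed (long* p = &holder.FileTime)
        {
            GetSystemTimeAsFileTime(p);
        }
        return holder.FileTime;
    }

    public static void Main()
    {
        Console.WriteLine(ReadIntoLocal());
        Console.WriteLine(ReadIntoField(new TimestampHolder()));
    }
}
```

In the benchmark scenario only the first form is relevant; the second is there to show the cost the pointer signature shifts onto the caller.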

Ben Voigt answered Nov 03 '22 13:11
