Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Unexpected performance results when comparing dictionary lookup vs multiple is operators in .NET 4.7

I have the problem where I need to do dynamic dispatch based on an object type. The types based on which I need to dispatch are known at compile time - in my example they are 17.

My initial guess was to use a Dictionary<Type, Action<Object>> for the dispatching and to use obj.GetType() to find out the appropriate action. But then I decided to use BenchmarkDotNet to see if I can do better and exactly how expensive the dispatch lookup would be. Bellow is the code I used for the benchmark.

public class Program
{
    private static readonly Object Value = Guid.NewGuid();
    private static readonly Dictionary<Type, Action<Object>> Dictionary = new Dictionary<Type, Action<Object>>()
    {
        [ typeof( Byte ) ] = x => Empty( (Byte)x ),
        [ typeof( Byte[] ) ] = x => Empty( (Byte[])x ),
        [ typeof( SByte ) ] = x => Empty( (SByte)x ),
        [ typeof( Int16 ) ] = x => Empty( (Int16)x ),
        [ typeof( UInt16 ) ] = x => Empty( (UInt16)x ),
        [ typeof( Int32 ) ] = x => Empty( (Int32)x ),
        [ typeof( UInt32 ) ] = x => Empty( (UInt32)x ),
        [ typeof( Int64 ) ] = x => Empty( (Int64)x ),
        [ typeof( UInt64 ) ] = x => Empty( (UInt64)x ),
        [ typeof( Decimal ) ] = x => Empty( (Decimal)x ),
        [ typeof( Single ) ] = x => Empty( (Single)x ),
        [ typeof( Double ) ] = x => Empty( (Double)x ),
        [ typeof( String ) ] = x => Empty( (String)x ),
        [ typeof( DateTime ) ] = x => Empty( (DateTime)x ),
        [ typeof( TimeSpan ) ] = x => Empty( (TimeSpan)x ),
        [ typeof( Guid ) ] = x => Empty( (Guid)x ),
        [ typeof( Char ) ] = x => Empty( (Char)x ),
    };


    [Benchmark]
    public void Switch() => Switch( Value );


    [Benchmark]
    public void Lookup() => Lookup( Value );


    private static void Switch( Object value )
    {
        if ( value is Byte ) goto L_Byte;
        if ( value is SByte ) goto L_SByte;
        if ( value is Int16 ) goto L_Int16;
        if ( value is UInt16 ) goto L_UInt16;
        if ( value is Int32 ) goto L_Int32;
        if ( value is UInt32 ) goto L_UInt32;
        if ( value is Int64 ) goto L_Int64;
        if ( value is UInt64 ) goto L_UInt64;
        if ( value is Decimal ) goto L_Decimal;
        if ( value is Single ) goto L_Single;
        if ( value is Double ) goto L_Double;
        if ( value is DateTime ) goto L_DateTime;
        if ( value is TimeSpan ) goto L_TimeSpan;
        if ( value is DateTimeOffset ) goto L_DateTimeOffset;
        if ( value is String ) goto L_String;
        if ( value is Byte[] ) goto L_ByteArray;
        if ( value is Char ) goto L_Char;
        if ( value is Guid ) goto L_Guid;

        return;

        L_Byte: Empty( (Byte)value ); return;
        L_SByte: Empty( (SByte)value ); return;
        L_Int16: Empty( (Int16)value ); return;
        L_UInt16: Empty( (UInt16)value ); return;
        L_Int32: Empty( (Int32)value ); return;
        L_UInt32: Empty( (UInt32)value ); return;
        L_Int64: Empty( (Int64)value ); return;
        L_UInt64: Empty( (UInt64)value ); return;
        L_Decimal: Empty( (Decimal)value ); return;
        L_Single: Empty( (Single)value ); return;
        L_Double: Empty( (Double)value ); return;
        L_DateTime: Empty( (DateTime)value ); return;
        L_DateTimeOffset: Empty( (DateTimeOffset)value ); return;
        L_TimeSpan: Empty( (TimeSpan)value ); return;
        L_String: Empty( (String)value ); return;
        L_ByteArray: Empty( (Byte[])value ); return;
        L_Char: Empty( (Char)value ); return;
        L_Guid: Empty( (Guid)value ); return;
    }


    private static void Lookup( Object value )
    {
        if ( Dictionary.TryGetValue( value.GetType(), out var action ) )
        {
            action( value );
        }
    }


    [MethodImpl( MethodImplOptions.NoInlining )]
    private static void Empty<T>( T value ) { }


    static void Main( string[] args )
    {
        BenchmarkRunner.Run( typeof( Program ) );

        Console.ReadLine();
    }
}

In my example I ran the test with a boxed Guid which is the worst case in the handcrafted Switch function. The results were surprising to say the least:

BenchmarkDotNet=v0.10.11, OS=Windows 10 Redstone 3 [1709, Fall Creators Update] (10.0.16299.125)
    Processor=Intel Core i7-4790K CPU 4.00GHz (Haswell), ProcessorCount=8
    Frequency=3903988 Hz, Resolution=256.1483 ns, Timer=TSC
      [Host]     : .NET Framework 4.7 (CLR 4.0.30319.42000), 32bit LegacyJIT-v4.7.2600.0
      DefaultJob : .NET Framework 4.7 (CLR 4.0.30319.42000), 32bit LegacyJIT-v4.7.2600.0

     Method |     Mean |     Error |    StdDev |
    ------- |---------:|----------:|----------:|
     Switch | 13.21 ns | 0.1057 ns | 0.0989 ns |
     Lookup | 28.22 ns | 0.1082 ns | 0.1012 ns |

The switch function 2 times faster for it's worst case. If I reorder the ifs so most common types are first then on average I expect it to run 3-5 times faster.

My question is how come 18 checks are so much faster than a single dictionary lookup? Am I missing something obvious?

EDIT:

The original test was x86 (Prefer 32bit) mode on x64 machine. I ran the tests in 64 release build as well:

    Method |      Mean |     Error |    StdDev |
---------- |----------:|----------:|----------:|
    Switch | 12.451 ns | 0.0600 ns | 0.0561 ns |
    Lookup | 22.552 ns | 0.1108 ns | 0.1037 ns |
like image 968
Ivan Zlatanov Avatar asked Dec 28 '17 12:12

Ivan Zlatanov


1 Answers

I'm by no means an IL performance guru, but if you decompile and especially look at the IL, this make sense.

The is operator is only 4 opcodes (ldarg, isinst, ldnull, cgt), and each switch part only 7 in total with the goto added in. The action part of Switch to call Empty() is then another 6, giving 17*7+6 = 125 max.

By contrast Dictionary.TryGetValue may only be one method call, but inside this it is doing a lot of work hashing, looping and comparing values:

http://referencesource.microsoft.com/#mscorlib/system/collections/generic/dictionary.cs,2e5bc6d8c0f21e67

public bool TryGetValue(TKey key, out TValue value) {
    int i = FindEntry(key);
    if (i >= 0) {
        value = entries[i].value;
        return true;
    }
    value = default(TValue);
    return false;
}

http://referencesource.microsoft.com/#mscorlib/system/collections/generic/dictionary.cs,bcd13bb775d408f1

private int FindEntry(TKey key) {
    if( key == null) {
        ThrowHelper.ThrowArgumentNullException(ExceptionArgument.key);
    }

    if (buckets != null) {
        int hashCode = comparer.GetHashCode(key) & 0x7FFFFFFF;
        for (int i = buckets[hashCode % buckets.Length]; i >= 0; i = entries[i].next) {
            if (entries[i].hashCode == hashCode && comparer.Equals(entries[i].key, key)) return i;
        }
    }
    return -1;
}

The for loop inside FindEntry alone is 31 opcodes for each loop, giving a max of 527 opcodes just for this part.

This is very basic analysis but it is easy to see that the switch should be more performant. As is often the case tho, you need to consider performance versus readability and maintainability. If using a Dictionary lookup gives you nicer code, it is rare that the performance loss (15ns) will outweigh that benefit.

like image 65
Rhumborl Avatar answered Nov 15 '22 08:11

Rhumborl