Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

dictionary enum key performance

I have a concern about generic dictionaries using enums for keys.

As stated at the below page, using enums for keys will allocate memory: http://blogs.msdn.com/b/shawnhar/archive/2007/07/02/twin-paths-to-garbage-collector-nirvana.aspx

I've tested and confirmed the behavior, and it's causing problems in my project. For readability, I believe using enums for keys is very useful, and the optimal solution for me would be to write write a class implementing IDictionary<TKey, TValue>, which would use integers for keys internally. The reason is I don't want to change all my existing dictionaries to use integers for keys, and do implicit casting. This would be best performance wise, but it will give me lot of work initially and it will reduce the readability.

So I've tried a couple of approaches, including using GetHashCode (which unfortunately allocates memory) to build an internal Dictionary<int, TValue>.

So, to wrap it up in one question; can anyone think of a solution that I can use to keep the readability of Dictionary<SomeEnum, TValue>, while having the perfomance of a Dictionary<int, TValue>?

Any advice much appreciated.

like image 821
Magnus Andersson Avatar asked Oct 09 '14 14:10

Magnus Andersson


3 Answers

The problem is boxing. It's an act of turning value type into object, which might, or might not be unnecessary.

The way Dictionarycompares keys, is essentially, that it will use EqualComparer<T>.Default, and call GetHashCode() to find correct bucket, and Equals to compare if there's any value in the bucket that is equal tot he one we're looking for.

The good thing is this: .NET framework has good optimizations, which avoid boxing in the case of "Enum integers". See CreateComparer(). It's highly unlikely that you will see any difference here, between integers and enums, as keys.

To note here: this is not an easy act, in fact, if you dig in deep, you'll come to conclusion that quarter of this battle is implemented through CLR "hacks". As seen here:

   static internal int UnsafeEnumCast<T>(T val) where T : struct    
    {
        // should be return (int) val; but C# does not allow, runtime 
        // does this magically
        // See getILIntrinsicImplementation for how this happens.  
        throw new InvalidOperationException();
    }

It could be definitely easier if generics had Enum constraint, and perhaps even something a long of the lines UnsafeEnumCast<T>(T val) where T : Enum->Integer, but well... they don't.

You might be wondering, what exactly is going on in getILIntrinsicImplementation for that EnumCast? I wonder too. Not exactly sure as of this right moment how to check it. It's replaced on run-time with specific IL code I believe?!

MONO

Now, answer to your question: yes you're right. Enum as a key on Mono, will be slower in a tight loop. It's because Mono does boxing on Enums, as far I can see. You can check out EnumIntEqualityComparer, as you can see, it calls Array.UnsafeMov that basically casts a type of T into integer, through boxing: (int)(object) instance;. That's the "classical" limitation of generics, and there is no nice solution for this problem.

Solution 1

Implement an EqualityComparer<MyEnum> for your concrete Enum. This will avoid all the casting.

public struct MyEnumCOmparer : IEqualityComparer<MyEnum>
{
    public bool Equals(MyEnum x, MyEnum y)
    {
        return x == y;
    }

    public int GetHashCode(MyEnum obj)
    {
        // you need to do some thinking here,
        return (int)obj;
    }
}

All you need to do then, is pass it to your Dictionary:

new Dictionary<MyEnum, int>(new MyEnumComparer());

It works, it gives you the same performance as it is with integers, and avoids boxing issues. The problem is though, this is not generic and writing this for each Enum can feel stupid.

Solution 2

Writing a generic Enum comparer, and using few tricks that avoids unboxing. I wrote this with a little help from here,

// todo; check if your TEnum is enum && typeCode == TypeCode.Int
struct FastEnumIntEqualityComparer<TEnum> : IEqualityComparer<TEnum> 
    where TEnum : struct
{
    static class BoxAvoidance
    {
        static readonly Func<TEnum, int> _wrapper;

        public static int ToInt(TEnum enu)
        {
            return _wrapper(enu);
        }

        static BoxAvoidance()
        {
            var p = Expression.Parameter(typeof(TEnum), null);
            var c = Expression.ConvertChecked(p, typeof(int));

            _wrapper = Expression.Lambda<Func<TEnum, int>>(c, p).Compile();
        }
    }

    public bool Equals(TEnum firstEnum, TEnum secondEnum)
    {
        return BoxAvoidance.ToInt(firstEnum) == 
            BoxAvoidance.ToInt(secondEnum);
    }

    public int GetHashCode(TEnum firstEnum)
    {
        return BoxAvoidance.ToInt(firstEnum);
    }
}

Solution 3

Now, there's a little problem with the solution#2, as Expression.Compile() is not that famous on iOS(no runtime code generation), and some mono versions don't have ?? Expression.Compile ?? (not sure).

You can write simple IL code that will take care of the enum conversion, and compile it.

.assembly extern mscorlib
{
  .ver 0:0:0:0
}
.assembly 'enum2int'
{
  .hash algorithm 0x00008004
  .ver  0:0:0:0
}

.class public auto ansi beforefieldinit EnumInt32ToInt
    extends [mscorlib]System.Object
{
    .method public hidebysig static int32  Convert<valuetype 
        .ctor ([mscorlib]System.ValueType) TEnum>(!!TEnum 'value') cil managed
    {
      .maxstack  8
      IL_0000:  ldarg.0
      IL_000b:  ret
    }
} 

In order to compile it into an assembly, you have to call:

ilasm enum2int.il /dll where enum2int.il is the text file containing IL.

You can now reference the given assembly(enum2int.dll) and call the static method, as such:

struct FastEnumIntEqualityComparer<TEnum> : IEqualityComparer<TEnum> 
    where TEnum : struct
{
    int ToInt(TEnum en)
    {
        return EnumInt32ToInt.Convert(en);
    }

    public bool Equals(TEnum firstEnum, TEnum secondEnum)
    {
        return ToInt(firstEnum) == ToInt(secondEnum);
    }

    public int GetHashCode(TEnum firstEnum)
    {
        return ToInt(firstEnum);
    }
}

It might seem to be killer code, but it avoids boxing, and it should give you better berformance on Mono.

like image 95
Erti-Chris Eelmaa Avatar answered Oct 13 '22 22:10

Erti-Chris Eelmaa


I ran into this same problem a while back and ended up incorporating it into a library I wrote of generic enum extension and helper methods (it's written in C++/CLI (compiled AnyCPU) because C# doesn't allow creation of type constraints for enum types). It's available under the Apache 2.0 license on NuGet and GitHub

You can implement it in a Dictionary by grabbing the IEqualityComparer from the static Enums type in the library:

var equalityComparer = Enums.EqualityComparer<MyEnum>();
var dictionary = new Dictionary<MyEnum, MyValueType>(equalityComparer);

The values are handled without boxing, using a technique similar to the UnsafeEnumCast mentioned in one of the answers already provided (covered to death in tests since it is unsafe). As a result, it's very fast (since that would be the only point of replacing an equality comparer in this case). A benchmarking app is included as well as recent results generated from my build PC.

like image 1
mdip Avatar answered Oct 13 '22 21:10

mdip


Enums as dictionary keys now have the same or better performance as int dictionary keys. I measured this using NUnit:

public class EnumSpeedTest
{
    const int Iterations = 10_000_000;

    [Test]
    public void WasteTimeInt()
    {
        Dictionary<int, int> dict = new Dictionary<int, int>();
        for (int i = 0; i < Iterations; i++)
            dict[i] = i;
        long sum = 0;
        for (int i = 0; i < Iterations; i++)
            sum += dict[i];
        Console.WriteLine(sum);
    }

    enum Enum { Zero = 0, One = 1, Two = 2, Three = 3 }

    [Test]
    public void WasteTimeEnum()
    {
        Dictionary<Enum, int> dict = new Dictionary<Enum, int>();
        for (int i = 0; i < Iterations; i++)
            dict[(Enum)i] = i;
        long sum = 0;
        for (int i = 0; i < Iterations; i++)
            sum += dict[(Enum)i];
        Console.WriteLine(sum);
    }
}

The time taken by these two tests on my Ryzen 5 PC in a .NET 5.0 Release build is consistently around 300ms, and the enum version is slightly faster on most runs.

like image 1
Qwertie Avatar answered Oct 13 '22 20:10

Qwertie