Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is my dictionary performing poorly with composite keys in C#?

I have a method that uses recursion to traverse a tree and update the items.

Currently the method takes pretty long to process all the items, so i started optimizing things. Among those things is the use of a dictionary instead of executing a DB query for each item.

The dictionary is defined as

System.Collections.Generic.Dictionary<EffectivePermissionKey, MyData>

The key type is defined as

private struct EffectivePermissionKey
{
  // http://blog.martindoms.com/2011/01/03/c-tip-override-equals-on-value-types-for-better-performance/
  public override bool Equals(object aObject)
  {
    if (aObject == null)
      return false;
    else
      return aObject is EffectivePermissionKey && Equals((EffectivePermissionKey)aObject);
  }

  public bool Equals(EffectivePermissionKey aObject)
  {
    return this.ID == aObject.ID && this.OrchardUserID == aObject.OrchardUserID;
  }

  public override int GetHashCode()
  { 
    // http://stackoverflow.com/a/32502294/3936440
    return unchecked(ID.GetHashCode() * 23 * 23 + OrchardUserID.GetHashCode() * 23);
  }

  public int ID;
  public int OrchardUserID;
}

When the method runs it takes around 5000 recursions to update all items.

Initially it took around 100 seconds without the dictionary.

A first approach with DB queries replaced by the use of a dictionary with int keys took 22 seconds.

Now, with DB queries replaced by the use of the dictionary defined above and proper TryGetValue() calls it takes 97 seconds <- WAT.

What is going on here? What could cause this massive performance drop?

Edit

At first, it seemed like a hash collision issue to me, so i added a breakpoint in EffectivePermissionKey.Equals() to verify that this method is called but it's not called, therefore no hash collision i guess.

Edit2

Now i'm confused. I thought Equals() only gets called when the hash code does not match. After printing out the hash codes of my keys and the keys used in TryGetValue() i see that these codes match. Then i looked at the source code of Dictionary<> and there's a line in FindEntry() that looks like this:

if (entries[i].hashCode == hashCode && comparer.Equals(entries[i].key, key)) return i;

This means that for each item key in the dictionary the GetHashCode() and Equals() gets called because i process all items in the dictionary as the items are the result of the DB query whereas these results where processed before the dictionary approach anyway.

like image 691
ViRuSTriNiTy Avatar asked Feb 05 '16 10:02

ViRuSTriNiTy


1 Answers

Nevermind folks, sorry for taking your time, my approach was completely wrong. Let me tell why.

The issue broken down for simplicity:

A -> recursion 1, DB query for permission of node A with ID = 1
  B -> recursion 2, DB query for permission of node B with ID = 2
  C -> recursion 3, DB query for permission of node C with ID = 3
    D -> recursion 4, DB query for permission of node D with ID = 4

As you can see, one DB query per tree node.

Now the flawed approach for optimizing this:

Dictionary<int, PermissionData> myMap

...

DB query of all permissions and insert into myMap

...

A -> recursion 1, myMap.TryGetValue(1, out ...)
  B -> recursion 2, myMap.TryGetValue(2, out ...)
  C -> recursion 3, myMap.TryGetValue(3, out ...)
    D -> recursion 4, myMap.TryGetValue(4, out ...)

You see now that the query is done once but on each node aTryGetValue() call is made.

In my specific case this is actually slower as executing single queries because

  • the dictionary contains as much keys as nodes exist because each node has a DB permission entry

and

  • each TryGetValue() requires / results in

    1. creating a key instance (with ID and user ID)
    2. calling TryGetValue()
    3. calculating the hash of the key instance
    4. calling Equals()

These 4 steps are executed around 5000 times versus executing 5000 simple entity framework queries (SELECT * FROM table WHERE ID = ...). I don't know why, but the queries are faster here, perhaps the compiler optimizes something away.

Anyway, i reworked the whole thing and now i have an outer loop over User IDs and an inner recursive traversal witch uses dictionaries with a simple int key (node ID). It gives me lighting fast results. The whole execution now takes around 16 seconds and with a few more tweaks and threading i got it down to under 1 second. Mission accomplished.

Edit

After discussing this issue with a colleague we came to the conclusion that the performance issue is most likely caused by the prime numbers used in the hash code calculation. I used 23 x 23 x 23 but it should be something like 17 x 23 x 23 to avoid collisions but i cannot test this as the concerning code / application is not in my responsibility anymore. Btw a generic solution can be found here: https://stackoverflow.com/a/763966/3936440

Edit 2

As noted by a colleague the following answer suggests to not use 17 and 23, instead use larger prime numbers, see https://stackoverflow.com/a/38281271/3936440

like image 105
ViRuSTriNiTy Avatar answered Oct 12 '22 23:10

ViRuSTriNiTy