
High Runtime for Dictionary.Add for a large amount of items

Tags: c#, dictionary

I have a C# application that stores data from a text file in a Dictionary object. The amount of data can be rather large, so inserting the entries takes a lot of time. With many items in the Dictionary it gets even worse, because the internal array that stores the data has to be resized. So I initialized the Dictionary with the number of items that will be added, but this had no impact on speed.

Here is my function:

private Dictionary<IdPair, Edge> AddEdgesToExistingNodes(HashSet<NodeConnection> connections)
{
  Dictionary<IdPair, Edge> resultSet = new Dictionary<IdPair, Edge>(connections.Count);

  foreach (NodeConnection con in connections)
  {
    ...
    resultSet.Add(nodeIdPair, newEdge);
  }

  return resultSet;
}

In my tests I insert ~300k items. I measured the running time with ANTS Performance Profiler and found that the average time for resultSet.Add(...) doesn't change when I initialize the Dictionary with the needed size. It is the same as when I initialize the Dictionary with new Dictionary(); (about 0.256 ms on average for each Add). This is definitely caused by the amount of data in the Dictionary (even though I initialized it with the desired size): for the first 20k items, the average time for Add is 0.03 ms per item.

Any idea how to make the Add operation faster?

Thanks in advance, Frank

Here is my IdPair struct:

public struct IdPair
{
  public int id1;
  public int id2;

  public IdPair(int oneId, int anotherId)
  {
    if (oneId > anotherId)
    {
      id1 = anotherId;
      id2 = oneId;
    }
    else if (anotherId > oneId)
    {
      id1 = oneId;
      id2 = anotherId;
    }
    else
      throw new ArgumentException("The two Ids of the IdPair can't have the same value.");
  }
}
Aaginor asked May 05 '10 11:05


2 Answers

Since you have a struct, you get the default implementation of Equals() and GetHashCode(). As others have pointed out, this is not very efficient since it uses reflection, but I don't think the reflection is the issue.

My guess is that your hash codes get distributed unevenly by the default GetHashCode(), which could happen, for example, if the default implementation returns a simple XOR of all members (in which case hash(a, b) == hash(b, a)). I can't find any documentation of how ValueType.GetHashCode() is implemented, but try adding

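public override int GetHashCode() {
    // Use the struct's fields (id1, id2), not the constructor parameters.
    // Pairs only get distinct hash codes while both ids fit in 16 bits.
    return (id1 << 16) | (id2 & 0xffff);
}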

which might be better.
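To make the distribution argument concrete, here is a minimal sketch that counts how many distinct hash codes two combining functions produce for all pairs with id1 < id2. The class and method names (HashSpreadDemo, XorHash, ShiftHash, CountDistinct) are made up for illustration, and XorHash only models the XOR guess above; it is not a confirmed description of what ValueType.GetHashCode() does.

```csharp
using System;
using System.Collections.Generic;

public static class HashSpreadDemo
{
    // Symmetric combination, modelling the XOR guess (an assumption,
    // not the documented behaviour of ValueType.GetHashCode()).
    public static int XorHash(int id1, int id2)
    {
        return id1 ^ id2;
    }

    // Shift-and-mask combination as suggested in this answer.
    public static int ShiftHash(int id1, int id2)
    {
        return (id1 << 16) | (id2 & 0xffff);
    }

    // Counts the distinct hash codes over all pairs (a, b) with 0 <= a < b < maxId.
    public static int CountDistinct(Func<int, int, int> hash, int maxId)
    {
        var seen = new HashSet<int>();
        for (int a = 0; a < maxId; a++)
            for (int b = a + 1; b < maxId; b++)
                seen.Add(hash(a, b));
        return seen.Count;
    }

    public static void Main()
    {
        const int maxId = 1000; // 499,500 pairs in total
        Console.WriteLine("XOR:   " + CountDistinct(XorHash, maxId));
        Console.WriteLine("Shift: " + CountDistinct(ShiftHash, maxId));
    }
}
```

With ids below 1000, the XOR of two different ids is always between 1 and 1023, so nearly half a million pairs share at most 1023 hash codes. The shift version gives every pair its own code as long as both ids are below 65536; with larger ids it starts to collide as well.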

erikkallen answered Sep 20 '22 14:09


IdPair is a struct, and you haven't overridden Equals or GetHashCode. This means that the default implementation of those methods will be used.

For value-types the default implementation of Equals and GetHashCode uses reflection, which is likely to result in poor performance. Try providing your own implementation of the methods and see if that helps.

Here is my suggested implementation; it might not be exactly what you need:

public struct IdPair : IEquatable<IdPair>
{
    // ...

    public override bool Equals(object obj)
    {
        if (obj is IdPair)
            return Equals((IdPair)obj);

        return false;
    }

    public bool Equals(IdPair other)
    {
        return id1.Equals(other.id1)
            && id2.Equals(other.id2);
    }

    public override int GetHashCode()
    {
        unchecked
        {
            int hash = 269;
            hash = (hash * 19) + id1.GetHashCode();
            hash = (hash * 19) + id2.GetHashCode();
            return hash;
        }
    }
}
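If you want to check whether the overrides actually help on your machine, a rough timing sketch like the following may be enough. PlainPair, HashedPair, AddTiming and TimeAdds are hypothetical names used only here; both structs are simplified stand-ins for IdPair (they order the ids but skip the equal-ids check), and the keys are synthetic, so treat the numbers as indicative only.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Stand-in for the original struct: no overrides, so the ValueType defaults apply.
public struct PlainPair
{
    public int id1, id2;
    public PlainPair(int a, int b) { id1 = Math.Min(a, b); id2 = Math.Max(a, b); }
}

// Stand-in with the same Equals/GetHashCode approach as the implementation above.
public struct HashedPair : IEquatable<HashedPair>
{
    public int id1, id2;
    public HashedPair(int a, int b) { id1 = Math.Min(a, b); id2 = Math.Max(a, b); }

    public bool Equals(HashedPair other) { return id1 == other.id1 && id2 == other.id2; }

    public override bool Equals(object obj)
    {
        return obj is HashedPair && Equals((HashedPair)obj);
    }

    public override int GetHashCode()
    {
        unchecked { return ((269 * 19) + id1) * 19 + id2; }
    }
}

public static class AddTiming
{
    // Adds `count` unique keys to a pre-sized dictionary and returns the elapsed milliseconds.
    public static long TimeAdds<TKey>(Func<int, int, TKey> make, int count)
    {
        var dict = new Dictionary<TKey, int>(count);
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < count; i++)
            dict.Add(make(i, i + 1 + (i % 7)), i); // the smaller id is always i, so keys are unique
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    public static void Main()
    {
        const int count = 20000; // kept small on purpose; raise it once the default case is fast enough
        Console.WriteLine("default:    " + TimeAdds((a, b) => new PlainPair(a, b), count) + " ms");
        Console.WriteLine("overridden: " + TimeAdds((a, b) => new HashedPair(a, b), count) + " ms");
    }
}
```

The dictionary is pre-sized in both runs, so any difference should come from hashing and equality rather than from resizing. How large the gap is depends on the runtime version and on the real ids, so it is worth repeating the measurement with the actual data.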
LukeH answered Sep 19 '22 14:09