Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Alternative to increasingly slow Dictionary.Add(Key,Value)?

Tags:

c#

dictionary

By "increasingly" what I mean is that Add is fast at the beginning when there is a low number of keys. After inserting 20% of the keys, it gets very slow. After 50% it gets unbearably slow.

I get that the lower the number of keys, the faster the "key collision search" when adding new elements to the dictionary. But is there any possible way to skip this downside while keeping the Dictionary? I know beforehand that keys don't collide so no check is needed, but I don't know if there is any way to successfully use this info in the code.

BTW I am forced to use the dictionary structure because of architecture restrictions (this structure is swallowed later by a db exporter).


What my code does:

var keyList = GetKeyList();
var resultDict = new Dictionary<T,T>();
foreach (var key in keyList)
{
    resultDict.Add(key,someResult);
}

Edit: since people is asking how the hash code is generated, I will try to clarify this.

Theoretically I have no control over the hash code generation, because unfortunately it uses a convention between multiple systems that are connected through the same db.

In practice, the piece of code that generates the hash code is indeed my code (disclaimer: it wasn't me choosing the convention that is used in the generation).

The key generation is way more complicated than that, but it all boils down to this:

private List<ResultKey> GetKeyList(string prefix, List<float> xCoordList, List<float> yCoordList)
{
    var keyList = new List<ResultKey>();
    var constantSensorName = "xxx";
    foreach (float xCoord in xCoordList)
    {
        foreach (float yCoord in yCoordList)
        {
            string stationName = string.Format("{0}_E{1}N{2}", prefix, xCoord, yCoord);
            keyList.Add(new ResultKey(constantSensorName, stationName));
        }
    }
    return keyList;
}

public struct ResultKey
{
    public string SensorName { get; set; }
    public string StationName { get; set; }

    public ResultKey(string sensorName, string stationName)
    {
        this.SensorName = sensorName;
        this.StationName = stationName;
    }
}
like image 622
Xavier Peña Avatar asked Aug 04 '15 13:08

Xavier Peña


People also ask

Which is faster Hashtable or dictionary?

Dictionary is a generic type and returns an error if you try to find a key which is not there. The Dictionary collection is faster than Hashtable because there is no boxing and unboxing.

Which is faster dictionary or list for look up?

A dictionary is 6.6 times faster than a list when we lookup in 100 items.

What can I use instead of dictionary in C#?

A HashSet, similar to a Dictionary, is a hash-based collection, so look ups are very fast with O(1). But unlike a dictionary, it doesn't store key/value pairs; it only stores values. So, every objects should be unique and this is determined by the value returned from the GetHashCode method.

Are dictionaries faster than arrays?

It depends on the way you are going to get elements from the array. If you are going to get elements by positions (index) in the array then array will be quicker (or at least not slower than dictionary). If you are going to search for elements in the array than dictionary will be faster.


2 Answers

The first thing that comes to mind is to create your own hashing function. The Add method for the dictionary is going to call the default implementation of the getHashCode() method when it goes to add it to the structure. If you put a wrapper class around your keys and overwrote the getHashCode() method, then you could write your own hashing function which, presumably, could implement a less collision prone hash function.

like image 52
Simo9000 Avatar answered Oct 24 '22 09:10

Simo9000


You are using the default hash code generation for your struct ResultKey. The default hash code generation for structs is disappointingly bad. You can't rely on that here because your struct contains two strings which trigger a bad case (see the linked answer). Essentially, only your SensorName field makes it into the hash code, nothing else. That causes all keys with the same SensorName to collide.

Write your own function. I quickly generated one using Resharper:

public struct ResultKey : IEquatable<ResultKey>
{
    public string SensorName { get; set; }
    public string StationName { get; set; }

    public ResultKey(string sensorName, string stationName)
    {
        this.SensorName = sensorName;
        this.StationName = stationName;
    }

    public bool Equals(ResultKey other)
    {
        return string.Equals(SensorName, other.SensorName) && string.Equals(StationName, other.StationName);
    }

    public override bool Equals(object obj)
    {
        if (ReferenceEquals(null, obj)) return false;
        return obj is ResultKey && Equals((ResultKey)obj);
    }

    public override int GetHashCode()
    {
        unchecked
        {
            return ((SensorName != null ? SensorName.GetHashCode() : 0)*397) ^ (StationName != null ? StationName.GetHashCode() : 0);
        }
    }

    public static bool operator ==(ResultKey left, ResultKey right)
    {
        return left.Equals(right);
    }

    public static bool operator !=(ResultKey left, ResultKey right)
    {
        return !left.Equals(right);
    }
}
like image 37
usr Avatar answered Oct 24 '22 11:10

usr