Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Two strings for Key of a Dictionary

I have two string, that i d like to use them as the dictionary key but i m kinda lazy to create another object, calculate the hashcode of the strings etc.

So instead of that, can i get the hashcodes of two string , add them and use the result as the key of Dictionary?

It s possible to cause collisions? right?.

Any ideas?

like image 739
DarthVader Avatar asked Feb 20 '11 20:02

DarthVader


2 Answers

I have two string, that i d like to use them as the dictionary key but i m kinda lazy to create another object

In .NET 4.0, you can use the Tuple<T1, T2> class as the key, with T1 and T2 = string.

can i get the hashcodes of two string , add them and use the result as the key of Dictionary?

The formula Tuple<T1, T2> uses for combining hash-codes is something like (not documented or guaranteed not to change): ((h1 << 5) + h1) ^ h2, which should be decent enough for your purposes. By the way, naively adding isn't normally the best way to combine hash-codes.

It s possible to cause collisions? right?.

This is always possible, even with a single string. There are more strings than there are 32-bit integers.

like image 98
Ani Avatar answered Sep 29 '22 12:09

Ani


If you're on .NET 4, you can use the Tuple class:

Dictionary<Tuple<string, string>, TValue> dict = new ...

If you're not on .NET 4, you should create your own type to hold this.

You can use the KeyValuePair struct, but it inherits the relevant methods from the base value type, and thus relies heavily on reflection. This has performance implications (see bottom of answer.)

For KeyValuePair:

Dictionary<KeyValuePair<string, string>, TValue> dict = new ...

Here's a general type you can use if you don't want to cook it up yourself:

public struct SimpleTuple<TValue1, TValue2>
{
    private readonly TValue1 _Value1;
    private readonly TValue2 _Value2;

    public SimpleTuple(TValue1 value1, TValue2 value2)
    {
        _Value1 = value1;
        _Value2 = value2;
    }

    public TValue1 Value1 { get { return _Value1; } }
    public TValue2 Value2 { get { return _Value2; } }

    public int GetHashCode()
    {
        unchecked
        {
            int result = 37;

            result *= 23;
            if Value1 != null)
                result += Value1.GetHashCode();

            result *= 23;
            if (Value2 != null)
                result += Value2.GetHashCode();

            return result;
        }
    }

    public override bool Equals(object obj)
    {
        if (obj == null) return false;
        if (obj.GetType() != typeof(SimpleTuple<TValue1, TValue2>))
            return false;

        var other = (SimpleTuple<TValue1, TValue2>)obj;
        return Equals(other.Value1, Value1) && Equals(other.Value2, Value2);
    }
}

Of course, KeyValuePair also works on .NET 4.0 just as good bad.

As for collisions, it depends on what you mean. A hashtable (a dictionary uses a hashtable structure internally) always have the possibility of getting key collisions, but that's where the comparison comes into play. If two distinct keys generate the same hash code, the dictionary class will compare key against key to see if they're really the same values, or just produce the same hash code.

The reasoning behind why a hashtable will always have the possibility of collisions is best described with the pidgeonhole principle (Wikipedia).

This means that if you two different keys would cause a collision, it wouldn't be a problem, they'll both be stored, with the right values, in the dictionary.

Of course, if you create the same key twice, the dictionary will count that as the same key and either fail to add a new value, or overwrite the existing one (depending on how you ask it to add the value.)

This will throw an exception on duplicate keys:

dict.Add(key, value);

This will add, or overwrite existing:

dict[key] = value;

In response to the comment by Ani, I wrote the following simple test script for LINQPad. The output was:

KeyValuePair: 975ms
MyKeyValuePair: 52ms

script:

void Main()
{
    const int iterations = 10 * 1000 * 1000;

    // JIT preheat
    Test1(1);
    Test2(1);

    Stopwatch sw = Stopwatch.StartNew();
    Test1(iterations);
    sw.Stop();
    Debug.WriteLine("KeyValuePair: " + sw.ElapsedMilliseconds + "ms");

    sw = Stopwatch.StartNew();
    Test2(iterations);
    sw.Stop();
    Debug.WriteLine("MyKeyValuePair: " + sw.ElapsedMilliseconds + "ms");
}

public static void Test1(int iterations)
{
    for (int index = 0; index < iterations; index++)
    {
        var kvp = new KeyValuePair<int, int>(index, index);
        kvp.GetHashCode();
    }
}

public static void Test2(int iterations)
{
    for (int index = 0; index < iterations; index++)
    {
        var kvp = new MyKeyValuePair<int, int>(index, index);
        kvp.GetHashCode();
    }
}

public struct MyKeyValuePair<TKey, TValue>
{
    private readonly TKey _Key;
    private readonly TValue _Value;

    public MyKeyValuePair(TKey key, TValue value)
    {
        _Key = key;
        _Value = value;
    }

    public TKey Key { get { return _Key; } }
    public TValue Value { get { return _Value; } }

    public int GetHashCode()
    {
        unchecked
        {
            int result = 37;

            result *= 23;
            if (Key != null)
                result += Key.GetHashCode();

            result *= 23;
            if (Value != null)
                result += Value.GetHashCode();

            return result;
        }
    }

    public override bool Equals(object obj)
    {
        if (obj == null) return false;
        if (obj.GetType() != typeof(MyKeyValuePair<TKey, TValue>))
            return false;

        var other = (MyKeyValuePair<TKey, TValue>)obj;
        return Equals(other.Key, Key) && Equals(other.Value, Value);
    }
}
like image 27
Lasse V. Karlsen Avatar answered Sep 29 '22 11:09

Lasse V. Karlsen