Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Array as Dictionary key gives a lot of collisions

Tags:

c#

dictionary

I need to use a list of numbers (longs) as a Dictionary key in order to do some group calculations on them.

When using the long array as a key directly, I get a lot of collisions. If I use string.Join(",", myLongs) as a key, it works as I would expect it to, but that's much, much slower (because the hash is more complicated, I assume).

Here's an example demonstrating my problem:

Console.WriteLine("Int32");
Console.WriteLine(new[] { 1, 2, 3, 0}.GetHashCode());
Console.WriteLine(new[] { 1, 2, 3, 0 }.GetHashCode());

Console.WriteLine("String");
Console.WriteLine(string.Join(",", new[] { 1, 2, 3, 0}).GetHashCode());
Console.WriteLine(string.Join(",", new[] { 1, 2, 3, 0 }).GetHashCode());

Output:

Int32
43124074
51601393
String
406954194
406954194

As you can see, the arrays return a different hash.

Is there any way of getting the performance of the long array hash, but the uniqeness of the string hash?

See my own answer below for a performance comparison of all the suggestions.

About the potential duplicate -- that question has a lot of useful information, but as this question was primarily about finding high performance alternatives, I think it still provides some useful solutions that are not mentioned there.

like image 328
Erlend D. Avatar asked Dec 18 '22 16:12

Erlend D.


1 Answers

That the first one is different is actually good. Arrays are a reference type and luckily they are using the reference (somehow) during hash generation. I would guess that is something like the Pointer that is used on machine code level, or some Garbage Colletor level value. One of the things you have no influence on but is copied if you assign the same instance to a new reference variable.

In the 2nd case you get the hash value on a string consisting of "," and whatever (new[] { 1, 2, 3, 0 }).ToString(); should return. The default is something like teh class name, so of course in both cases they will be the same. And of course string has all those funny special rules like "compares like a value type" and "string interning", so the hash should be the same.

like image 90
Christopher Avatar answered Dec 20 '22 05:12

Christopher