
Hashtables (Dictionary etc) with integer keys

I've been puzzling over this for a few days... feel free to shoot down any of my assumptions.

We're using a Dictionary with integer keys. I assume that the value of the key in this case is used directly as the hash. If the keys are grouped over a small range, does this mean that the key hashes (same as the keys themselves, right?) will fall in a similarly small range, and that such keys are therefore a bad choice for a hashtable?

Would it be better to provide an IEqualityComparer that did something clever with primes and modulo mathematics to calculate a better distributed hash?

spender asked Sep 07 '09 08:09




2 Answers

It's not used directly in that the dictionary will still ask the key for its hash - but the hash value of an Int32 is just the value, so the thrust of your question is relevant, yes.

I believe that the way the .NET dictionary works doesn't rely on hash values being uniformly distributed. It takes hash % bucketCount where bucketCount is always prime. (That's from memory though - I could be wrong.)
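To make the arithmetic concrete, here is a minimal sketch. The `BucketFor` helper and the bucket count of 17 are illustrative assumptions, not the actual internals of `Dictionary<TKey, TValue>`; the only library behaviour relied on is that `Int32.GetHashCode()` returns the value itself.

```csharp
using System;
using System.Collections.Generic;

public static class BucketSketch
{
    // Hypothetical bucket selection: clear the sign bit, then take the remainder.
    public static int BucketFor(int key, int bucketCount)
    {
        int hash = key.GetHashCode();            // for Int32 this is the key itself
        return (hash & 0x7FFFFFFF) % bucketCount;
    }

    // Counts how many distinct buckets a run of consecutive keys occupies.
    public static int DistinctBuckets(int firstKey, int keyCount, int bucketCount)
    {
        var used = new HashSet<int>();
        for (int i = 0; i < keyCount; i++)
            used.Add(BucketFor(firstKey + i, bucketCount));
        return used.Count;
    }

    public static void Main()
    {
        // 16 keys packed into the narrow range 1000..1015, 17 buckets (assumed prime).
        Console.WriteLine(DistinctBuckets(1000, 16, 17)); // 16: no two keys share a bucket
    }
}
```

Under these assumptions, a tightly clustered run of keys still spreads across the buckets, because consecutive integers leave distinct remainders until the run is longer than the bucket count.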

You could still end up with an inefficient set of keys of course, if they happen to be spaced by the bucket count. Some bad set of keys will exist for any hash table though - a hash table would only ever be genuinely O(1) for all keys if they had unique hash values and the table maintained a bucket for every possible hash :) In reality it tends not to be a problem. If you happen to know that it will be a problem, then yes, a custom IEqualityComparer<T> could help.
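The worst case and the suggested remedy can be sketched as follows. `ScrambledIntComparer` is a hypothetical name, the multiplier is one commonly used odd constant for multiplicative hashing, and the bucket count of 17 is an assumption for illustration only. Whether a comparer like this helps a real workload would need measuring.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical comparer: equality is unchanged, only the hash is scrambled.
public sealed class ScrambledIntComparer : IEqualityComparer<int>
{
    public bool Equals(int x, int y) => x == y;

    public int GetHashCode(int obj)
    {
        // Multiplicative hashing with a large odd constant; overflow wraps on purpose.
        return unchecked((int)((uint)obj * 2654435761u));
    }
}

public static class WorstCaseSketch
{
    public static void Main()
    {
        const int assumedBucketCount = 17;

        // Keys spaced exactly by the bucket count: the identity hash puts them all in bucket 0.
        for (int key = 0; key < 5 * assumedBucketCount; key += assumedBucketCount)
            Console.WriteLine($"{key} -> bucket {key % assumedBucketCount}");

        // Passing the comparer to the constructor makes the dictionary use its hash instead.
        var map = new Dictionary<int, string>(new ScrambledIntComparer());
        map[17] = "a";
        map[34] = "b";
        Console.WriteLine(map[34]); // b
    }
}
```

The comparer only changes where keys land; lookups by the original integer keys behave exactly as before.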

Jon Skeet answered Oct 03 '22 00:10


Before doing something clever I'd test the speed of it as-is, and see if it's suitable for you. If it isn't, then try the clever thing. But I would expect it's better to leave it alone; what matters most is that the hashes don't collide, and as long as they don't, life will be fine.
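A rough way to test the speed as-is is to time lookups over keys shaped like your real ones. This is a minimal sketch, not a rigorous benchmark: `TimeLookups`, the key range, and the round count are all made-up placeholders to replace with your own workload.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class LookupTiming
{
    // Fills a dictionary with the given keys, then times repeated lookups of every key.
    public static TimeSpan TimeLookups(int[] keys, int rounds)
    {
        var map = new Dictionary<int, int>();
        foreach (int key in keys)
            map[key] = key;

        long checksum = 0; // consumed below so the lookups are not dead code
        var watch = Stopwatch.StartNew();
        for (int r = 0; r < rounds; r++)
            foreach (int key in keys)
                checksum += map[key];
        watch.Stop();

        GC.KeepAlive(checksum);
        return watch.Elapsed;
    }

    public static void Main()
    {
        // Hypothetical workload: 100,000 keys clustered in one narrow range.
        var keys = new int[100_000];
        for (int i = 0; i < keys.Length; i++)
            keys[i] = 5_000_000 + i;

        Console.WriteLine(TimeLookups(keys, 50));
    }
}
```

Running the same measurement with and without a custom comparer passed to the `Dictionary` constructor gives a first indication of whether the clever thing is worth it.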

Noon Silk answered Oct 03 '22 00:10