Isn't 'int GetHashCode' a bit short-sighted?

Tags: .net, hashcode

Given that .NET can detect bitness via IntPtr (although, looking through Reflector, a good amount of it is marked unsafe, which is a shame), I've been thinking that having GetHashCode return an int is potentially short-sighted.

I know that, with a good hashing algorithm, the roughly 4.3 billion distinct values offered by Int32 are perfectly adequate. Even so, the narrower the set of possible hashes, the slower hashed key lookups become, because more keys collide and more linear searching is required.

Equally - am I the only one who finds this amusing:

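struct Int64 {
  // As decompiled by Reflector; 'this' stands for the underlying 64-bit value.
  public override int GetHashCode()
  {
    // XOR the low 32 bits with the high 32 bits (0x20 == 32).
    return (((int) this) ^ ((int) (this >> 0x20)));
  }
}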

Whilst Int32 simply returns this.

If IntPtr is out of the question because of performance concerns, perhaps an IHashCode that implements IEquatable etc. would be better?

As our platforms get larger and larger in terms of memory capacity, disk size etc., surely the days of 32-bit hashes being enough are numbered?

Or is it simply the case that the overhead involved in either abstracting out the hash via interfaces or adapting the size of the hash to the platform outweighs any potential performance benefit?

Andras Zoltan asked Jan 14 '10 14:01


1 Answer

The Int64 hash function is there to make sure that all the bits are considered - so basically it is XORing the top 32 bits with the bottom 32 bits. I can't really imagine a better general-purpose one. (Truncating to Int32 would be no good - how could you then properly hash 64-bit values which had all zeros in the lower 32 bits?)
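To make the difference concrete, here is a minimal sketch comparing plain truncation with the XOR fold. The class and method names (Int64HashDemo, Truncate, XorFold) are made up for illustration and are not part of the framework.

```csharp
using System;

// Illustrative helper; the names are hypothetical, not framework APIs.
static class Int64HashDemo
{
    // Keeps only the low 32 bits; the high 32 bits are thrown away.
    public static int Truncate(long value) => unchecked((int)value);

    // Folds the high 32 bits into the low 32 bits, as in the question's snippet.
    public static int XorFold(long value) => unchecked((int)value ^ (int)(value >> 32));
}
```

With truncation, 1L << 32 and 2L << 32 both hash to 0, whereas the XOR fold gives 1 and 2. Collisions are still unavoidable when 64 bits are squeezed into 32: for instance, XorFold(0) and XorFold(0x0000000100000001L) are both 0.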

If IntPtr were used as the hash return value, then code would have to have conditional branches (is it 32-bit? is it 64-bit? etc), which would slow down the hash functions, defeating the whole point.

I would say that if you have a hashtable which actually has 2 billion buckets, you're probably at the stage of writing an entire custom system anyway. (Possibly a database would be a better choice?) At that size, making sure the buckets were filled evenly would be a more pressing concern. (In other words, a better hash function would probably pay more dividends than a larger number of buckets).
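As a rough illustration of why distribution matters more than the raw bucket count, the sketch below counts how many buckets a set of keys actually occupies under two hash functions. Everything here is an assumption made for the example: the names (BucketDemo, BucketsUsed, Identity, Mixed), the simple hash-modulo-bucket-count mapping, and the multiply-and-shift mixer, which is one common choice rather than what any framework collection uses.

```csharp
using System;
using System.Collections.Generic;

// Illustrative only; the names and the bucket mapping are assumptions.
static class BucketDemo
{
    // Yields 0, step, 2*step, ... (count values in total).
    public static IEnumerable<int> MultiplesOf(int step, int count)
    {
        for (int i = 0; i < count; i++)
            yield return i * step;
    }

    // Counts the distinct buckets hit when each hash is mapped with hash % bucketCount.
    public static int BucketsUsed(IEnumerable<int> keys, Func<int, int> hash, int bucketCount)
    {
        var used = new HashSet<int>();
        foreach (int key in keys)
            used.Add((int)(unchecked((uint)hash(key)) % (uint)bucketCount));
        return used.Count;
    }

    // The key is its own hash, as with Int32.
    public static int Identity(int key) => key;

    // Multiplies by an odd constant, then folds the high bits into the low bits.
    public static int Mixed(int key)
    {
        unchecked
        {
            uint h = (uint)key * 0x9E3779B1u;
            return (int)(h ^ (h >> 16));
        }
    }
}
```

For 1000 keys that are all multiples of 1024, spread over 1024 buckets, Identity places every key in bucket 0, while Mixed spreads them over many buckets. Adding more buckets would not help the identity hash here; a better hash function does.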

There would be nothing to stop you implementing a base class which did have an equivalent 64-bit hash function, if you did want a multi-gigabyte map in memory. You'd have to write your own Dictionary equivalent, however.
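A minimal sketch of what such a 64-bit hash could look like follows. The interface IHashCode64 and the class CustomerKey are hypothetical names invented for this example; nothing like them exists in the base class library. The hash itself is 64-bit FNV-1a applied to the two bytes of each UTF-16 character, chosen here only because it is short.

```csharp
using System;

// Hypothetical interface; not part of the .NET base class library.
interface IHashCode64
{
    long GetHashCode64();
}

static class Fnv1a64
{
    const ulong OffsetBasis = 14695981039346656037UL;
    const ulong Prime = 1099511628211UL;

    // 64-bit FNV-1a over the low and high byte of each UTF-16 character.
    public static long Hash(string text)
    {
        ulong hash = OffsetBasis;
        foreach (char c in text)
        {
            unchecked
            {
                hash = (hash ^ (byte)c) * Prime;        // low byte
                hash = (hash ^ (byte)(c >> 8)) * Prime; // high byte
            }
        }
        return unchecked((long)hash);
    }
}

// Hypothetical key type exposing both the usual 32-bit hash and a 64-bit one.
sealed class CustomerKey : IHashCode64, IEquatable<CustomerKey>
{
    public string Name { get; }

    public CustomerKey(string name) { Name = name; }

    public long GetHashCode64() => Fnv1a64.Hash(Name);

    public bool Equals(CustomerKey other) => other != null && Name == other.Name;

    public override bool Equals(object obj) => Equals(obj as CustomerKey);

    // Folds the 64-bit hash down to 32 bits so the standard collections still work.
    public override int GetHashCode()
    {
        long h = GetHashCode64();
        return unchecked((int)h ^ (int)(h >> 32));
    }
}
```

A custom map would call GetHashCode64 and reduce it to a bucket index itself; the built-in Dictionary only ever calls GetHashCode, which is why a separate Dictionary equivalent would be needed.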

stusmith answered Oct 04 '22 16:10