
Why is String.GetHashCode() implemented differently in 32-bit and 64-bit versions of the CLR?

What are the technical reasons behind the difference between the 32-bit and 64-bit versions of string.GetHashCode()?

More importantly, why does the 64-bit version seem to terminate its algorithm when it encounters the NUL character? For example, the following expressions all return true when run under the 64-bit CLR.

"\0123456789".GetHashCode() == "\0987654321".GetHashCode() "\0AAAAAAAAA".GetHashCode() == "\0BBBBBBBBB".GetHashCode() "\0The".GetHashCode() == "\0Game".GetHashCode() 

This behavior (bug?) manifested as a performance issue when we used such strings as keys in a Dictionary.

Ilian asked Jul 25 '11 08:07



2 Answers

This looks like a known issue which Microsoft would not fix:

As you have mentioned this would be a breaking change for some programs (even though they shouldn't really be relying on this), the risk of this was deemed too high to fix this in the current release.

I agree that the rate of collisions that this will cause in the default Dictionary<String, Object> will be inflated by this. If this is adversely effecting your applications performance, I would suggest trying to work around it by using one of the Dictionary constructors that takes an IEqualityComparer so you can provide a more appropriate GetHashCode implementation. I know this isn't ideal and would like to get this fixed in a future version of the .NET Framework.

Source: Microsoft Connect - String.GetHashCode ignores any characters in the string beyond the first null byte in x64 runtime
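
The workaround suggested in the quote (passing an IEqualityComparer to the Dictionary constructor) could look roughly like the sketch below. The class name NulSafeStringComparer is made up for illustration, and the FNV-1a-style hash is just one reasonable choice, not something Microsoft recommended. The only point is that every character, including '\0', feeds into the hash.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical comparer (name and hash choice are assumptions, not from the
// Connect report): hashes every UTF-16 code unit, so characters after an
// embedded '\0' still contribute to the hash code.
public sealed class NulSafeStringComparer : IEqualityComparer<string>
{
    public bool Equals(string x, string y)
    {
        // Ordinal comparison, matching the default string equality.
        return string.Equals(x, y, StringComparison.Ordinal);
    }

    public int GetHashCode(string s)
    {
        if (s == null) throw new ArgumentNullException("s");
        unchecked
        {
            uint hash = 2166136261;      // FNV-1a 32-bit offset basis
            foreach (char c in s)
            {
                hash ^= c;               // mix in the code unit, NUL included
                hash *= 16777619;        // FNV 32-bit prime
            }
            return (int)hash;
        }
    }
}

public static class Demo
{
    public static void Main()
    {
        // Pass the comparer to the Dictionary constructor, as the quote suggests.
        var map = new Dictionary<string, int>(new NulSafeStringComparer());
        map.Add("\0123456789", 1);
        map.Add("\0987654321", 2);
        Console.WriteLine(map.Count);
    }
}
```

Lookups and inserts then go through the custom GetHashCode instead of string.GetHashCode(), so keys that differ only after a NUL no longer pile up in the same bucket because of the truncated hash.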

Kobi answered Sep 25 '22 13:09


Eric Lippert has a wonderful blog post on this: Curious property in String

Curious property Revealed

Ashley John answered Sep 22 '22 13:09