C# String.GetHashCode() returning same value for different strings [duplicate]

My application is running as a Windows service, and I'm attaching VS2013 to its process to debug. I'm getting the hash code for the content of image files to check for differences, using the following method (within a static class):

static class FileUtils
{
    public static int GetFileHash(string filePath)
    {
        int hash = 0;
        Logger.WriteLog(ToolTipIcon.Info, "Calculating hash code for {0}", filePath);
        StreamReader sr = new StreamReader(filePath, Encoding.Unicode);
        hash = sr.ReadToEnd().GetHashCode();
        sr.Close();
        return hash;
    }
}

This has been working fine in production. However, with the debugger attached, this method always returns 2074746262 for two different images. I've tried to reproduce this in a WinForms app with the same code and images, and I can't. Is there something about debugging a process in VS2013 that would cause this behavior? I've replaced one of the images with an entirely different image, but it still happens.

Phil asked Feb 12 '26 01:02


2 Answers

First of all, you should be aware that you are using GetHashCode incorrectly, for two reasons:

  1. Hash codes are not unique; they are merely very well distributed. There is a finite number of hash codes (an int has only 2^32 possible values) and an unbounded number of binary strings, so it is impossible to generate a unique hash code per string.

  2. The details of the hash code algorithm are explicitly not documented, and will change for reasons that seem irrelevant to you. In particular, this is not the first time I've seen it reported that string.GetHashCode() changes behavior when running under a debugger:

string.GetHashCode() returns different values in debug vs release, how do I avoid this?


Having said that, it seems a bit unusual that three different binary strings would hash to the same value in the same run-time environment just because a debugger is attached. Apart from not relying on GetHashCode the way you are, my next guess is that you're not hashing what you think you're hashing. I would dump the binary data itself to disk before hashing it, and confirm that you really do have different binary strings.
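One way to carry out that check is to compare the inputs directly, both as raw bytes and as the decoded strings that actually get hashed. The following is only a sketch under assumptions: FileCompareSketch, HaveSameBytes and DecodeToSameText are hypothetical names that do not appear in the question, and Encoding.Unicode is used only because the question's GetFileHash uses it.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

// Hypothetical diagnostic helper, not part of the original code.
static class FileCompareSketch
{
    // Returns true when both files contain exactly the same bytes.
    public static bool HaveSameBytes(string pathA, string pathB)
    {
        byte[] a = File.ReadAllBytes(pathA);
        byte[] b = File.ReadAllBytes(pathB);
        return a.SequenceEqual(b);
    }

    // Returns true when both files decode to the same string under
    // Encoding.Unicode, the encoding GetFileHash in the question uses.
    public static bool DecodeToSameText(string pathA, string pathB)
    {
        string a = File.ReadAllText(pathA, Encoding.Unicode);
        string b = File.ReadAllText(pathB, Encoding.Unicode);
        return string.Equals(a, b, StringComparison.Ordinal);
    }
}
```

If HaveSameBytes returns false but DecodeToSameText returns true, the difference is being lost while decoding the file as text, before GetHashCode is ever called. If both return false and the hash codes still match, the strings really are different and the hash code is the part that cannot be trusted.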

Michael Edenfield answered Feb 14 '26 14:02


The documentation explicitly calls this out. Don't rely on String.GetHashCode to return unique values; that assumption is wrong.

If two string objects are equal, the GetHashCode method returns identical values. However, there is not a unique hash code value for each unique string value. Different strings can return the same hash code.
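Since GetHashCode makes no uniqueness promise, a common alternative for detecting changed file content is a cryptographic digest of the file's raw bytes. The sketch below swaps in SHA-256 for that purpose; it is not what the question's code does, and FileHashSketch and GetContentHash are hypothetical names. A digest can still collide in principle, but it is computed from the bytes alone, so it does not depend on text decoding or on the runtime's string hashing.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical replacement for the question's GetFileHash (an assumption, not the author's code).
static class FileHashSketch
{
    // Returns the SHA-256 digest of the file's raw bytes as a 64-character hex string.
    public static string GetContentHash(string filePath)
    {
        using (FileStream stream = File.OpenRead(filePath))
        using (SHA256 sha = SHA256.Create())
        {
            byte[] digest = sha.ComputeHash(stream);
            // BitConverter.ToString yields "AB-CD-..."; strip the dashes.
            return BitConverter.ToString(digest).Replace("-", "");
        }
    }
}
```

Two files can then be compared with an ordinary string comparison of their GetContentHash results. When exact equality is all that matters and the files are small, comparing the bytes directly avoids hashing altogether.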

Sriram Sakthivel answered Feb 14 '26 16:02


