How do I create a HashCode in .NET (C#) for a string that is safe to store in a database?

Tags: c#, .net, database, hashcode, gethashcode

To quote from Guidelines and rules for GetHashCode by Eric Lippert:

Rule: Consumers of GetHashCode cannot rely upon it being stable over time or across appdomains

Suppose you have a Customer object that has a bunch of fields like Name, Address, and so on. If you make two such objects with exactly the same data in two different processes, they do not have to return the same hash code. If you make such an object on Tuesday in one process, shut it down, and run the program again on Wednesday, the hash codes can be different.

This has bitten people in the past. The documentation for System.String.GetHashCode notes specifically that two identical strings can have different hash codes in different versions of the CLR, and in fact they do. Don't store string hashes in databases and expect them to be the same forever, because they won't be.

So what is the correct way to create a HashCode of a string that I can store in a database?

(Please tell me I am not the first person to have left this bug in software I have written!)

asked Mar 01 '11 13:03

Ian Ringrose

1 Answer

It depends what properties you want that hash to have. For example, you could just write something like this:

    public int HashString(string text)
    {
        // TODO: Determine nullity policy.

        unchecked
        {
            int hash = 23;
            foreach (char c in text)
            {
                hash = hash * 31 + c;
            }
            return hash;
        }
    }

So long as you document that this is how the hash is computed, that's valid. It's in no way cryptographically secure or anything like that, but you can persist it with no problems. Two strings which are equal in the ordinal sense (i.e. with no cultural equality etc. applied, exactly the same character by character) will produce the same hash with this code.
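
To make the "ordinal" point concrete, here is a minimal, self-contained sketch of the same algorithm with a few hand-computed values. The class names `StableStringHash` and `HashDemo` are illustrative, and throwing on null is only one possible answer to the nullity TODO, not something the original code prescribes.

```csharp
using System;

public static class StableStringHash
{
    // Same algorithm as HashString above. Rejecting null is an assumed policy.
    public static int HashString(string text)
    {
        if (text == null)
        {
            throw new ArgumentNullException(nameof(text));
        }

        unchecked // let the arithmetic wrap on overflow instead of throwing
        {
            int hash = 23;
            foreach (char c in text) // iterates UTF-16 code units
            {
                hash = hash * 31 + c;
            }
            return hash;
        }
    }
}

public static class HashDemo
{
    public static void Main()
    {
        Console.WriteLine(StableStringHash.HashString(""));   // 23
        Console.WriteLine(StableStringHash.HashString("a"));  // 23 * 31 + 97 = 810
        Console.WriteLine(StableStringHash.HashString("ab")); // 810 * 31 + 98 = 25208

        // Ordinal comparison: these render the same but are different char sequences.
        string composed = "\u00E9";    // precomposed e-acute
        string decomposed = "e\u0301"; // 'e' followed by a combining acute accent
        Console.WriteLine(
            StableStringHash.HashString(composed) == StableStringHash.HashString(decomposed)); // False
    }
}
```

Because the result depends only on the sequence of `char` values, it does not change between processes or runtime versions. If visually identical strings need to hash the same, normalizing them first (for example with `string.Normalize`) is one option, and that step should be documented alongside the hash.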

The problems come when you rely on undocumented hashing, i.e. something which satisfies the GetHashCode() contract but is in no way guaranteed to remain the same from version to version, like string.GetHashCode().

Writing and documenting your own hash like this is a bit like saying, "This sensitive information is hashed with MD5 (or whatever)". So long as it's a well-defined hash, that's fine.

EDIT: Other answers have suggested using cryptographic hashes such as SHA-1 or MD5. I would say that until we know there's a requirement for cryptographic security rather than just stability, there's no point in going through the rigmarole of converting the string to a byte array and hashing that. Of course, if the hash is meant to be used for anything security-related, an industry-standard hash is exactly what you should be reaching for. But that wasn't mentioned anywhere in the question.
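
For comparison, the cryptographic route could look roughly like the sketch below. It uses SHA-256 rather than the SHA-1 or MD5 mentioned above, and the choice of UTF-8 encoding, lowercase hex output, and the names `CryptoStringHash` and `Sha256Hex` are all assumptions made for illustration. Whatever you pick, the encoding has to be documented just like the algorithm, because the same string encoded differently yields a different digest.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class CryptoStringHash
{
    // Hashes the UTF-8 bytes of the text with SHA-256 and returns lowercase hex.
    // UTF-8 and the hex format are assumed choices, not requirements.
    public static string Sha256Hex(string text)
    {
        if (text == null)
        {
            throw new ArgumentNullException(nameof(text));
        }

        byte[] bytes = Encoding.UTF8.GetBytes(text);
        using (SHA256 sha = SHA256.Create())
        {
            byte[] digest = sha.ComputeHash(bytes);
            // BitConverter yields "BA-78-..."; strip the dashes and lowercase it.
            return BitConverter.ToString(digest).Replace("-", "").ToLowerInvariant();
        }
    }
}

public static class Sha256Demo
{
    public static void Main()
    {
        // Standard SHA-256 test vector for "abc":
        // ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
        Console.WriteLine(CryptoStringHash.Sha256Hex("abc"));
    }
}
```

The result is a 64-character string rather than an `int`, so the database column needs to be sized for it.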

answered Oct 04 '22 14:10

Jon Skeet
