Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Can 2 different string have the same hash code in C#? [duplicate]

Tags:

string

c#

Possible Duplicate:
what is hashCode use for? is it unique?

I am generating a lot of strings, then my question is:

Can 2 different string have the same hash code in C# ?

By hash code I mean:

string s = "Hello";
s.GetHashCode();

My question is more about of the algorithm that C# follow to geneate the strings, maybe the collisions come when all the other hash code are generated already or maybe not. Is possible that somebody have this answer.

like image 968
Eric Javier Hernandez Saura Avatar asked Oct 26 '12 19:10

Eric Javier Hernandez Saura


People also ask

Can two strings have the same hash code?

Two same strings/value must have the same hashcode, but the converse is not true. There might be another string which can match the same hash-code, so we can't derive the key using hash-code. The reason for two different string to have the same hash-code is due to the collision.

Can two different hashes be the same?

Two files can have the same MD5 hash even if there are different. As the MD5 algorithm can take an infinity of input and give a limited number of output, it's not impossible, even if the probability of collision is very low.

Is the hash of a string always the same?

Hashing works in one direction only – for a given piece of data, you'll always get the same hash BUT you can't turn a hash back into its original data. If you need to go in two directions, you need encrypting, rather than hashing.

Can two different images have the same hash?

A image hash function maps an image to a short string called image hash, and can be used for image authentication or as a digital fingerprint. Nevertheless, it can occur that two visually different images get the same image hash, which is called a collision.


3 Answers

Yes. Hash codes are not unique. There are 2^32 (4,294,967,296) possible hash codes (one for each integer value in a 32 bit integer). There are effectively an infinite number of possible strings. Clearly it's impossible for each of an infinite number of strings to have a different number of finite numbers.

Two different strings (or any values for that matter) having the same hash code is called a "collision". A good hashing algorithm will attempt to ensure that collisions are minimized to the greatest extent possible (although they can't be eliminated). Often this will be dependent on the actual types of data in practice; in this case of strings this means that strings that are similar, or of a similar size, should (ideally) be less prone to collisions.

I assume that you're asking because you're considering using a string's hash code as a unique identifier for a string. Don't do that.

Here is a link that goes into more detail about hash codes in general, if you're interested.

like image 57
Servy Avatar answered Sep 28 '22 09:09

Servy


In general you should expect a hash collision once you have as many elements as the square root of the size of the hash space http://en.wikipedia.org/wiki/Birthday_problem

For a 32 bit hash, you should expect your first collision around the 65k element. This is of course statistical, so you can't predict exactly when it will happen, but it's useful for intuition. If you have 10 strings, you probably don't need to worry about collisions, if you have 100k you definitely do.

like image 33
BostonJohn Avatar answered Sep 28 '22 09:09

BostonJohn


It depends on the hashing functions, and which algorithm it is using.

In general, some hashing techniques can map one input (the value you want to hash) to one output (the hashed value), while others may map two different inputs to the same output, the later is called Collision http://en.wikipedia.org/wiki/Collision_(computer_science)

For example, if a hash function codes 100 users' names to the numbers 0-9, we gonna have a lot of collisions.

Back to GetHashCode();

Refer to these two articles on MSDN:

http://ericlippert.com/2011/02/28/guidelines-and-rules-for-gethashcode/

This one explains the function, here is a quote from the bottom of it, check the first bullet:

GetHashCode is designed to do only one thing: balance a hash table. Do not use it for anything else. In particular:

  • It does not provide a unique key for an object; probability of collision is extremely high.
  • It is not of cryptographic strength, so do not use it as part of a digital signature or as a password equivalent
  • It does not necessarily have the error-detection properties needed for checksums.

Here is more explanation:

http://blogs.msdn.com/b/ericlippert/archive/2010/03/22/socks-birthdays-and-hash-collisions.aspx

like image 37
Life is more than this Avatar answered Sep 28 '22 10:09

Life is more than this