Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Can hashcodes of short string be same?

I have short Strings (less than 10 characters). I will convert it into int and use it as primary key. (I can't use String primary key because of small problems.) I know the hashcodes of Strings of infinite length can collide, but can short Strings collide too?

like image 380
Cinakyn Avatar asked Sep 12 '14 01:09

Cinakyn


2 Answers

Absolutely yes. For example, Ea and FB are colliding strings, each only two characters in length! Example:

public static final void main(String[] args) {
    System.out.println("Ea".hashCode() + " " + "FB".hashCode());
}

Prints 2236 2236.


The Java String#hashCode function isn't really even close to random. It's really easy to generate collisions for short strings, and it doesn't fare much better on long strings.

In general, even if you stuck to just 6 bits per character (ASCII letters and numbers, and a couple of symbols), you'd exceed the possible values of the 32-bit hash code with only a 6-character string -- that is, you would absolutely guarantee collisions among the 2^36 6-character 6-bit strings.

like image 62
nneonneo Avatar answered Sep 27 '22 17:09

nneonneo


A hash code is 32 bits in size.

A char in Java is 16 bits in size.

So in theory, all 2-character strings could have different hash codes, although some of those hash codes would have to collide with the hash code of the empty string and single-character strings. So even taking "all strings of two characters or shorter" there will be collisions. By the time you've got 10 characters, there are way more possible strings than there are hash codes available.

Collisions are still likely to be rare, but you should always assume that they can happen.

like image 21
Jon Skeet Avatar answered Sep 27 '22 18:09

Jon Skeet