
How to use String.hashCode to generate primary keys

I understand that this has already been discussed and that the answer is yes: String.hashCode can generate equal values for different strings, although it is quite unlikely (see "Can Java's hashCode produce same value for different strings?"). However, it does happen in my application.

The following code produces the same hash code, -347019262, for both strings (Java 1.7.25):

String string1 = "/m/06qw_";
String string2 = "/m/0859_";
// Both lines print the same hash code.
System.out.println(string1 + "," + string1.hashCode());
System.out.println(string2 + "," + string2.hashCode());

I do need a hash code in this case, because I want to use it to generate a unique primary key for a string. It seems that I am not doing it right. Any suggestions, please?

Many thanks!

Ziqi asked Mar 11 '14 10:03




2 Answers

You misunderstand .hashCode().

One part of the contract is that objects that are equal according to equals() must have the same hashCode(). However, the reverse is not true: two objects that have the same hashCode() do not have to be equal.

This is a valid, albeit perfectly useless, hashCode() implementation:

@Override
public int hashCode()
{
    return 42; // universal answer
}
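To see why such an implementation is legal yet useless, here is a minimal sketch. The class name `ConstantHashKey` and its `countDistinct` helper are hypothetical and exist only for illustration. It shows that a `HashSet` still tells unequal objects apart through `equals()`, even when every object reports the same hash code.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical key class, for illustration only.
class ConstantHashKey {
    private final String name;

    ConstantHashKey(String name) {
        this.name = name;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof ConstantHashKey
                && name.equals(((ConstantHashKey) other).name);
    }

    @Override
    public int hashCode() {
        return 42; // legal: equal objects trivially share this hash code
    }

    // Adds one key per name to a HashSet and reports how many distinct keys remain.
    static int countDistinct(String... names) {
        Set<ConstantHashKey> keys = new HashSet<>();
        for (String n : names) {
            keys.add(new ConstantHashKey(n));
        }
        return keys.size();
    }

    public static void main(String[] args) {
        // Three names, two of them equal: the set keeps two keys.
        System.out.println(countDistinct("a", "b", "a"));
    }
}
```

The set is still correct, but every key lands in the same bucket, so lookups lose the speed that hashing is supposed to provide.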

You should use the string itself as the "primary key". If you want a "more efficient" key, consider what format the input string is in and, if possible, extract a significant part of it.
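As a rough sketch of that advice, the two colliding strings from the question can be used directly as map keys. The class name `StringKeyDemo`, the stored values, and the `shortKey` helper are made up for illustration, and `shortKey` assumes (without any confirmation from the question) that every key starts with "/m/".

```java
import java.util.HashMap;
import java.util.Map;

class StringKeyDemo {
    // Builds a table keyed by the strings themselves; the values are placeholder data.
    static Map<String, String> buildTable() {
        Map<String, String> table = new HashMap<>();
        table.put("/m/06qw_", "first record");
        table.put("/m/0859_", "second record");
        return table;
    }

    // Hypothetical shortening step: drops the "/m/" prefix when it is present.
    static String shortKey(String id) {
        return id.startsWith("/m/") ? id.substring(3) : id;
    }

    public static void main(String[] args) {
        Map<String, String> table = buildTable();
        // The hash codes collide, yet the map keeps both entries apart via equals().
        System.out.println("/m/06qw_".hashCode() == "/m/0859_".hashCode());
        System.out.println(table.size());
        System.out.println(table.get("/m/0859_"));
        System.out.println(shortKey("/m/06qw_"));
    }
}
```

The map resolves the collision internally by comparing the keys with `equals()`, which is why the string, not its hash code, has to be the key.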

fge answered Nov 15 '22 18:11


The sensible option is to use the string as the primary key. (Another choice would be to associate a GUID with your data record and have that as the primary key.)
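One possible way to realise the GUID idea is sketched below, using `java.util.UUID` from the standard library. The class name `GuidKeyDemo` and the in-memory map are assumptions made for the example; in a real application the association would presumably live in the database.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

class GuidKeyDemo {
    // Remembers which UUID was assigned to which string.
    private final Map<String, UUID> idsByString = new HashMap<>();

    // Returns the UUID already assigned to this string, or assigns a fresh random one.
    UUID idFor(String value) {
        return idsByString.computeIfAbsent(value, v -> UUID.randomUUID());
    }

    public static void main(String[] args) {
        GuidKeyDemo demo = new GuidKeyDemo();
        UUID first = demo.idFor("/m/06qw_");
        UUID second = demo.idFor("/m/0859_");
        // Asking again for the same string yields the same id.
        System.out.println(first.equals(demo.idFor("/m/06qw_")));
        // Different strings get different ids, despite their equal hash codes.
        System.out.println(first.equals(second));
    }
}
```

A random UUID has 122 random bits, so an accidental clash is far less likely than with a 32-bit hash code, and the lookup map is itself keyed by the full string.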

Hashing is meant to be (1) fast and (2) such that two equal strings have the same hash code. It is not meant to give unequal strings different hash codes.

I'd submit that you are likely to get hash collisions; after all, an int (the hash return type) has only about 4 billion (2^32) distinct values.
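To put a rough number on "likely", the sketch below uses the standard birthday-problem approximation. It assumes hash codes are spread uniformly over all 2^32 values, which String.hashCode does not guarantee, so treat the figures as optimistic estimates. The class and method names are invented for the example.

```java
class BirthdayEstimate {
    // Approximate probability that at least two of n uniformly distributed
    // 32-bit hash codes coincide: 1 - exp(-n(n-1) / (2 * 2^32)).
    static double collisionProbability(long n) {
        double space = Math.pow(2, 32);            // number of distinct int values
        double pairs = (double) n * (n - 1) / 2.0; // number of pairs of strings
        return 1.0 - Math.exp(-pairs / space);
    }

    public static void main(String[] args) {
        for (long n : new long[] {1_000, 10_000, 77_163, 1_000_000}) {
            System.out.printf("%,d strings: %.4f%n", n, collisionProbability(n));
        }
    }
}
```

Under this assumption the chance of at least one collision reaches about one half at roughly 77,000 strings, far below the 4 billion values an int can hold.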

Bathsheba answered Nov 15 '22 19:11