Recently, I came across a piece of code, where Map<Integer, String>
is used, where Integer
(key) is hashCode
of some string and String
value corresponding to that.
Is this right thing to do? Because now, equals
will not be called for the String
when calling get
. (get
is also done using hashCode()
method on String object.
Or, hashCode(s) are unique for unique Strings?
I checked equals
od String
class. There is logic written for that. I am confused.
As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. Right. "As much as is reaonsably practical." So, in other words, "Don't assume it's unique, as it's not required to be." The hashCode() method returns an int.
A hash code is not an id, and it doesn't return a unique value. This is kind of obvious, when you think about it: GetHashCode returns an Int32 , which has “only” about 4.2 billion possible values, and there's potentially an infinity of different objects, so some of them are bound to have the same hash code.
1) If two objects are equal (i.e. the equals() method returns true), they must have the same hashcode. 2) If the hashCode() method is called multiple times on the same object, it must return the same result every time. 3) Two different objects can have the same hash code.
Java hashCode() An object hash code value can change in multiple executions of the same application. If two objects are equal according to equals() method, then their hash code must be same. If two objects are unequal according to equals() method, their hash code are not required to be different.
A HashMap does use equals()
to compare keys. It only uses hashCode()
to find the bucket where the key is located, and thus drastically reduce the number of keys to compare with equals()
.
Obviously, hashCode()
can't produce unique values, since int is limited to 2^32 distinct values, and there are an infinity of possible String values.
In conclusion, the result of hashCode()
is not a good fit for a key of a Map
.
No, they are not unique, eg "FB" and "Ea" both have hashCode = 2236
This is not correct, as hashCode()
will have collisions. As hashCode()
is not well defined, it may have collisions for any pair of strings. So you cannot use that. If you require a unique pointer to a string then you can use a cryptographic hash as key:
The following code shows how to do this:
/**
* Immutable class that represents a unique key for a string. This unique key
* can be used as a key in a hash map, without the likelihood of a collision:
* generating the same key for a different String. Note that you should first
* check if keeping a key to a reference of a string is feasible, in that case a
* {@link Set} may suffice.
* <P>
* This class utilizes SHA-512 to generate the keys and uses
* {@link StandardCharsets#UTF_8} for the encoding of the strings. If a smaller
* output size than 512 bits (64 bytes) is required then the leftmost bytes of
* the SHA-512 hash are used. Smaller keys are therefore contained in with
* larger keys over the same String value.
* <P>
* Note that it is not impossible to create collisions for key sizes up to 8-20
* bytes.
*
* @author owlstead
*/
public final class UniqueKey implements Serializable {
public static final int MIN_DIGEST_SIZE_BYTES = 8;
public static final int MAX_DIGEST_SIZE_BYTES = 64;
/**
* Creates a unique key for a string with the maximum size of 64 bytes.
*
* @param input
* the input, not null
* @return the generated instance
*/
public static UniqueKey createUniqueKey(final CharSequence input) {
return doCreateUniqueKey(input, MAX_DIGEST_SIZE_BYTES);
}
/**
* Creates a unique key for a string with a size of 8 to 64 bytes.
*
* @param input
* the input, not null
* @param outputSizeBytes
* the output size
* @return the generated instance
*/
public static UniqueKey createUniqueKey(final CharSequence input,
final int outputSizeBytes) {
return doCreateUniqueKey(input, outputSizeBytes);
}
@Override
public boolean equals(final Object obj) {
if (!(obj instanceof UniqueKey)) {
return false;
}
final UniqueKey that = (UniqueKey) obj;
return ByteBuffer.wrap(this.key).equals(ByteBuffer.wrap(that.key));
}
@Override
public int hashCode() {
return ByteBuffer.wrap(this.key).hashCode();
}
/**
* Outputs an - in itself - unique String representation of this key.
*
* @return the string <CODE>"{key: [HEX ENCODED KEY]}"</CODE>
*/
@Override
public String toString() {
// non-optimal but readable conversion to hexadecimal
final StringBuilder sb = new StringBuilder(this.key.length * 2);
sb.append("{Key: ");
for (int i = 0; i < this.key.length; i++) {
sb.append(String.format("%02X", this.key[i]));
}
sb.append("}");
return sb.toString();
}
/**
* Makes it possible to retrieve the underlying key data (e.g. to use a
* different encoding).
*
* @return the data in a read only ByteBuffer
*/
public ByteBuffer asReadOnlyByteBuffer() {
return ByteBuffer.wrap(this.key).asReadOnlyBuffer();
}
private static final long serialVersionUID = 1L;
private static final int BUFFER_SIZE = 512;
// byte array instead of ByteBuffer to support serialization
private final byte[] key;
private static UniqueKey doCreateUniqueKey(final CharSequence input,
final int outputSizeBytes) {
// --- setup digest
final MessageDigest digestAlgorithm;
try {
// note: relatively fast on 64 bit systems (faster than SHA-256!)
digestAlgorithm = MessageDigest.getInstance("SHA-512");
} catch (final NoSuchAlgorithmException e) {
throw new IllegalStateException(
"SHA-256 should always be avialable in a Java RE");
}
// --- validate input parameters
if (outputSizeBytes < MIN_DIGEST_SIZE_BYTES
|| outputSizeBytes > MAX_DIGEST_SIZE_BYTES) {
throw new IllegalArgumentException(
"Unique key size either too small or too big");
}
// --- setup loop
final CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
final CharBuffer buffer = CharBuffer.wrap(input);
final ByteBuffer encodedBuffer = ByteBuffer.allocate(BUFFER_SIZE);
CoderResult coderResult;
// --- loop over all characters
// (instead of encoding everything to byte[] at once - peak memory!)
while (buffer.hasRemaining()) {
coderResult = encoder.encode(buffer, encodedBuffer, false);
if (coderResult.isError()) {
throw new IllegalArgumentException(
"Invalid code point in input string");
}
encodedBuffer.flip();
digestAlgorithm.update(encodedBuffer);
encodedBuffer.clear();
}
coderResult = encoder.encode(buffer, encodedBuffer, true);
if (coderResult.isError()) {
throw new IllegalArgumentException(
"Invalid code point in input string");
}
encodedBuffer.flip();
digestAlgorithm.update(encodedBuffer);
// no need to clear encodedBuffer if generated locally
// --- resize result if required
final byte[] digest = digestAlgorithm.digest();
final byte[] result;
if (outputSizeBytes == digest.length) {
result = digest;
} else {
result = Arrays.copyOf(digest, outputSizeBytes);
}
// --- and return the final, possibly resized, result
return new UniqueKey(result);
}
private UniqueKey(final byte[] key) {
this.key = key;
}
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With