Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Are hashCodes unique for Strings?

Tags:

java

hashcode

Recently, I came across a piece of code, where Map<Integer, String> is used, where Integer(key) is hashCode of some string and String value corresponding to that.

Is this right thing to do? Because now, equals will not be called for the String when calling get. (get is also done using hashCode() method on String object.

Or, hashCode(s) are unique for unique Strings?

I checked equals od String class. There is logic written for that. I am confused.

like image 569
codingenious Avatar asked Jun 01 '14 08:06

codingenious


People also ask

Do Hashcodes have to be unique?

As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. Right. "As much as is reaonsably practical." So, in other words, "Don't assume it's unique, as it's not required to be." The hashCode() method returns an int.

Are hash codes unique C#?

A hash code is not an id, and it doesn't return a unique value. This is kind of obvious, when you think about it: GetHashCode returns an Int32 , which has “only” about 4.2 billion possible values, and there's potentially an infinity of different objects, so some of them are bound to have the same hash code.

Can two objects have same hashCode?

1) If two objects are equal (i.e. the equals() method returns true), they must have the same hashcode. 2) If the hashCode() method is called multiple times on the same object, it must return the same result every time. 3) Two different objects can have the same hash code.

Can hashCode be same in Java?

Java hashCode() An object hash code value can change in multiple executions of the same application. If two objects are equal according to equals() method, then their hash code must be same. If two objects are unequal according to equals() method, their hash code are not required to be different.


3 Answers

A HashMap does use equals() to compare keys. It only uses hashCode() to find the bucket where the key is located, and thus drastically reduce the number of keys to compare with equals().

Obviously, hashCode() can't produce unique values, since int is limited to 2^32 distinct values, and there are an infinity of possible String values.

In conclusion, the result of hashCode() is not a good fit for a key of a Map.

like image 127
JB Nizet Avatar answered Sep 22 '22 12:09

JB Nizet


No, they are not unique, eg "FB" and "Ea" both have hashCode = 2236

like image 43
Evgeniy Dorofeev Avatar answered Sep 19 '22 12:09

Evgeniy Dorofeev


This is not correct, as hashCode() will have collisions. As hashCode() is not well defined, it may have collisions for any pair of strings. So you cannot use that. If you require a unique pointer to a string then you can use a cryptographic hash as key:

The following code shows how to do this:

/**
 * Immutable class that represents a unique key for a string. This unique key
 * can be used as a key in a hash map, without the likelihood of a collision:
 * generating the same key for a different String. Note that you should first
 * check if keeping a key to a reference of a string is feasible, in that case a
 * {@link Set} may suffice.
 * <P>
 * This class utilizes SHA-512 to generate the keys and uses
 * {@link StandardCharsets#UTF_8} for the encoding of the strings. If a smaller
 * output size than 512 bits (64 bytes) is required then the leftmost bytes of
 * the SHA-512 hash are used. Smaller keys are therefore contained in with
 * larger keys over the same String value.
 * <P>
 * Note that it is not impossible to create collisions for key sizes up to 8-20
 * bytes.
 * 
 * @author owlstead
 */
public final class UniqueKey implements Serializable {

    public static final int MIN_DIGEST_SIZE_BYTES = 8;
    public static final int MAX_DIGEST_SIZE_BYTES = 64;

    /**
     * Creates a unique key for a string with the maximum size of 64 bytes.
     * 
     * @param input
     *            the input, not null
     * @return the generated instance
     */
    public static UniqueKey createUniqueKey(final CharSequence input) {
        return doCreateUniqueKey(input, MAX_DIGEST_SIZE_BYTES);
    }

    /**
     * Creates a unique key for a string with a size of 8 to 64 bytes.
     * 
     * @param input
     *            the input, not null
     * @param outputSizeBytes
     *            the output size
     * @return the generated instance
     */
    public static UniqueKey createUniqueKey(final CharSequence input,
            final int outputSizeBytes) {
        return doCreateUniqueKey(input, outputSizeBytes);
    }

    @Override
    public boolean equals(final Object obj) {
        if (!(obj instanceof UniqueKey)) {
            return false;
        }
        final UniqueKey that = (UniqueKey) obj;
        return ByteBuffer.wrap(this.key).equals(ByteBuffer.wrap(that.key));
    }

    @Override
    public int hashCode() {
        return ByteBuffer.wrap(this.key).hashCode();
    }

    /**
     * Outputs an - in itself - unique String representation of this key.
     * 
     * @return the string <CODE>"{key: [HEX ENCODED KEY]}"</CODE>
     */
    @Override
    public String toString() {
        // non-optimal but readable conversion to hexadecimal
        final StringBuilder sb = new StringBuilder(this.key.length * 2);
        sb.append("{Key: ");
        for (int i = 0; i < this.key.length; i++) {
            sb.append(String.format("%02X", this.key[i]));
        }
        sb.append("}");
        return sb.toString();
    }

    /**
     * Makes it possible to retrieve the underlying key data (e.g. to use a
     * different encoding).
     * 
     * @return the data in a read only ByteBuffer
     */
    public ByteBuffer asReadOnlyByteBuffer() {
        return ByteBuffer.wrap(this.key).asReadOnlyBuffer();
    }

    private static final long serialVersionUID = 1L;

    private static final int BUFFER_SIZE = 512;

    // byte array instead of ByteBuffer to support serialization
    private final byte[] key;

    private static UniqueKey doCreateUniqueKey(final CharSequence input,
            final int outputSizeBytes) {

        // --- setup digest

        final MessageDigest digestAlgorithm;
        try {
            // note: relatively fast on 64 bit systems (faster than SHA-256!)
            digestAlgorithm = MessageDigest.getInstance("SHA-512");
        } catch (final NoSuchAlgorithmException e) {
            throw new IllegalStateException(
                    "SHA-256 should always be avialable in a Java RE");
        }

        // --- validate input parameters

        if (outputSizeBytes < MIN_DIGEST_SIZE_BYTES
                || outputSizeBytes > MAX_DIGEST_SIZE_BYTES) {
            throw new IllegalArgumentException(
                    "Unique key size either too small or too big");
        }

        // --- setup loop

        final CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
        final CharBuffer buffer = CharBuffer.wrap(input);
        final ByteBuffer encodedBuffer = ByteBuffer.allocate(BUFFER_SIZE);
        CoderResult coderResult;

        // --- loop over all characters
        // (instead of encoding everything to byte[] at once - peak memory!)

        while (buffer.hasRemaining()) {
            coderResult = encoder.encode(buffer, encodedBuffer, false);
            if (coderResult.isError()) {
                throw new IllegalArgumentException(
                        "Invalid code point in input string");
            }
            encodedBuffer.flip();
            digestAlgorithm.update(encodedBuffer);
            encodedBuffer.clear();
        }

        coderResult = encoder.encode(buffer, encodedBuffer, true);
        if (coderResult.isError()) {
            throw new IllegalArgumentException(
                    "Invalid code point in input string");
        }
        encodedBuffer.flip();
        digestAlgorithm.update(encodedBuffer);
        // no need to clear encodedBuffer if generated locally

        // --- resize result if required

        final byte[] digest = digestAlgorithm.digest();
        final byte[] result;
        if (outputSizeBytes == digest.length) {
            result = digest;
        } else {
            result = Arrays.copyOf(digest, outputSizeBytes);
        }

        // --- and return the final, possibly resized, result

        return new UniqueKey(result);
    }

    private UniqueKey(final byte[] key) {
        this.key = key;
    }
}
like image 30
Maarten Bodewes Avatar answered Sep 19 '22 12:09

Maarten Bodewes