Recently, I came across a piece of code, where <code>Map<Integer, String></code> is used, where <code>Integer</code>(key) is <code>hashCode</code> of some string and <code>String</code> value corresponding to that. Is this right thing to do? Because now, <code>equals</code> will not be called for the <code>String</code> when calling <code>get</code>. (<code>get</code> is also done using <code>hashCode()</code> method on String object. Or, hashCode(s) are unique for unique Strings? I checked <code>equals</code> od <code>String</code> class. There is logic written for that. I am confused.

A HashMap does use <code>equals()</code> to compare keys. It only uses <code>hashCode()</code> to find the bucket where the key is located, and thus drastically reduce the number of keys to compare with <code>equals()</code>. Obviously, <code>hashCode()</code> can't produce unique values, since int is limited to 2^32 distinct values, and there are an infinity of possible String values. In conclusion, the result of <code>hashCode()</code> is not a good fit for a key of a <code>Map</code>.

Are hashCodes unique for Strings?

3 Answers

A HashMap does use equals() to compare keys. It only uses hashCode() to find the bucket where the key is located, and thus drastically reduce the number of keys to compare with equals().

Obviously, hashCode() can't produce unique values, since int is limited to 2^32 distinct values, and there are an infinity of possible String values.

In conclusion, the result of hashCode() is not a good fit for a key of a Map.

127

answered Sep 22 '22 12:09

JB Nizet

No, they are not unique, eg "FB" and "Ea" both have hashCode = 2236

answered Sep 19 '22 12:09

Evgeniy Dorofeev

This is not correct, as hashCode() will have collisions. As hashCode() is not well defined, it may have collisions for any pair of strings. So you cannot use that. If you require a unique pointer to a string then you can use a cryptographic hash as key:

The following code shows how to do this:

Click to copy

/**
 * Immutable class that represents a unique key for a string. This unique key
 * can be used as a key in a hash map, without the likelihood of a collision:
 * generating the same key for a different String. Note that you should first
 * check if keeping a key to a reference of a string is feasible, in that case a
 * {@link Set} may suffice.
 * <P>
 * This class utilizes SHA-512 to generate the keys and uses
 * {@link StandardCharsets#UTF_8} for the encoding of the strings. If a smaller
 * output size than 512 bits (64 bytes) is required then the leftmost bytes of
 * the SHA-512 hash are used. Smaller keys are therefore contained in with
 * larger keys over the same String value.
 * <P>
 * Note that it is not impossible to create collisions for key sizes up to 8-20
 * bytes.
 * 
 * @author owlstead
 */
public final class UniqueKey implements Serializable {

    public static final int MIN_DIGEST_SIZE_BYTES = 8;
    public static final int MAX_DIGEST_SIZE_BYTES = 64;

    /**
     * Creates a unique key for a string with the maximum size of 64 bytes.
     * 
     * @param input
     *            the input, not null
     * @return the generated instance
     */
    public static UniqueKey createUniqueKey(final CharSequence input) {
        return doCreateUniqueKey(input, MAX_DIGEST_SIZE_BYTES);
    }

    /**
     * Creates a unique key for a string with a size of 8 to 64 bytes.
     * 
     * @param input
     *            the input, not null
     * @param outputSizeBytes
     *            the output size
     * @return the generated instance
     */
    public static UniqueKey createUniqueKey(final CharSequence input,
            final int outputSizeBytes) {
        return doCreateUniqueKey(input, outputSizeBytes);
    }

    @Override
    public boolean equals(final Object obj) {
        if (!(obj instanceof UniqueKey)) {
            return false;
        }
        final UniqueKey that = (UniqueKey) obj;
        return ByteBuffer.wrap(this.key).equals(ByteBuffer.wrap(that.key));
    }

    @Override
    public int hashCode() {
        return ByteBuffer.wrap(this.key).hashCode();
    }

    /**
     * Outputs an - in itself - unique String representation of this key.
     * 
     * @return the string <CODE>"{key: [HEX ENCODED KEY]}"</CODE>
     */
    @Override
    public String toString() {
        // non-optimal but readable conversion to hexadecimal
        final StringBuilder sb = new StringBuilder(this.key.length * 2);
        sb.append("{Key: ");
        for (int i = 0; i < this.key.length; i++) {
            sb.append(String.format("%02X", this.key[i]));
        }
        sb.append("}");
        return sb.toString();
    }

    /**
     * Makes it possible to retrieve the underlying key data (e.g. to use a
     * different encoding).
     * 
     * @return the data in a read only ByteBuffer
     */
    public ByteBuffer asReadOnlyByteBuffer() {
        return ByteBuffer.wrap(this.key).asReadOnlyBuffer();
    }

    private static final long serialVersionUID = 1L;

    private static final int BUFFER_SIZE = 512;

    // byte array instead of ByteBuffer to support serialization
    private final byte[] key;

    private static UniqueKey doCreateUniqueKey(final CharSequence input,
            final int outputSizeBytes) {

        // --- setup digest

        final MessageDigest digestAlgorithm;
        try {
            // note: relatively fast on 64 bit systems (faster than SHA-256!)
            digestAlgorithm = MessageDigest.getInstance("SHA-512");
        } catch (final NoSuchAlgorithmException e) {
            throw new IllegalStateException(
                    "SHA-256 should always be avialable in a Java RE");
        }

        // --- validate input parameters

        if (outputSizeBytes < MIN_DIGEST_SIZE_BYTES
                || outputSizeBytes > MAX_DIGEST_SIZE_BYTES) {
            throw new IllegalArgumentException(
                    "Unique key size either too small or too big");
        }

        // --- setup loop

        final CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
        final CharBuffer buffer = CharBuffer.wrap(input);
        final ByteBuffer encodedBuffer = ByteBuffer.allocate(BUFFER_SIZE);
        CoderResult coderResult;

        // --- loop over all characters
        // (instead of encoding everything to byte[] at once - peak memory!)

        while (buffer.hasRemaining()) {
            coderResult = encoder.encode(buffer, encodedBuffer, false);
            if (coderResult.isError()) {
                throw new IllegalArgumentException(
                        "Invalid code point in input string");
            }
            encodedBuffer.flip();
            digestAlgorithm.update(encodedBuffer);
            encodedBuffer.clear();
        }

        coderResult = encoder.encode(buffer, encodedBuffer, true);
        if (coderResult.isError()) {
            throw new IllegalArgumentException(
                    "Invalid code point in input string");
        }
        encodedBuffer.flip();
        digestAlgorithm.update(encodedBuffer);
        // no need to clear encodedBuffer if generated locally

        // --- resize result if required

        final byte[] digest = digestAlgorithm.digest();
        final byte[] result;
        if (outputSizeBytes == digest.length) {
            result = digest;
        } else {
            result = Arrays.copyOf(digest, outputSizeBytes);
        }

        // --- and return the final, possibly resized, result

        return new UniqueKey(result);
    }

    private UniqueKey(final byte[] key) {
        this.key = key;
    }
}

answered Sep 19 '22 12:09

Maarten Bodewes

Related questions
                            
                                Why it's possible to instantiate String and not possible for Number(Long,Double,Integer...)?
                            
                                JDBC driver cannot be loaded. Why? (Spring,Hibernate)
                            
                                What is the difference between a = a.trim() and a.trim()?
                            
                                How does the XOR (^) swap algorithm work?
                            
                                The fastest method of determining if a string is a palindrome
                            
                                Implementing Multilevel Java Interfaces in Scala
                            
                                JSTL Count the ForEach loop
                            
                                Send push notification GCM by java [closed]
                            
                                NumberFormatException while parsing large numbers?
                            
                                Why is String.Format("%1s","") not returning "" but " "?
                            
                                Why does using one "print" instead of three work?
                            
                                Singleton object- In static block or in getInstance(); which should be used
                            
                                Error - "java: cannot find symbol" in IntelliJ IDEA
                            
                                Debug jar file in eclipse
                            
                                Effect of "finally" block on return values from "try" block [duplicate]
                            
                                How do I add an image inside a rectangle or a circle in JavaFX?
                            
                                Regular Expression for a string that contains one or more letters somewhere in it
                            
                                How to create an empty array?
                            
                                Is there any difference between these two loops?
                            
                                Error setting property values; nested exception is org.springframework.beans.NotWritablePropertyException:

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Are hashCodes unique for Strings?

Tags:

java

hashcode

codingenious

People also ask

3 Answers

JB Nizet

Evgeniy Dorofeev

Maarten Bodewes

Recent Activity

Donate For Us