Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Can a non-empty string have a hashcode of zero?

By "non-empty", I mean in this question a string which contains at least one non-zero character.

For reference, here's the hashCode implementation :

1493    public int hashCode() {
1494        int h = hash;
1495        if (h == 0) {
1496            int off = offset;
1497            char val[] = value;
1498            int len = count;
1499
1500            for (int i = 0; i < len; i++) {
1501                h = 31*h + val[off++];
1502            }
1503            hash = h;
1504        }
1505        return h;
1506    }

and the algorithm is specified in the documentation.

Before an integer overflow occurs, the answer is easy: it's no. But what I'd like to know is if, due to integer overflow, it's possible for a non-empty string to have a hashcode of zero? Can you construct one?

What I'm looking for would ideally be a mathematical demonstration (or a link to one) or a construction algorithm.

like image 223
Denys Séguret Avatar asked Sep 11 '13 16:09

Denys Séguret


People also ask

Can you hash an empty string?

HashCode of empty string will be zero. In above code, 'hash' is a class level private int and 'value' is a char[] to get all characters of string. If a string does not contain any characters or string is empty, it will return 'h' as default value of integer which is Zero.

Is it possible that hashCode is not unique?

Unique, no. By nature, hash values are not guaranteed to be unique. Any system with an arbitrarily large number of possible inputs and a limited number of outputs will have collisions.

Does string override hashCode?

The String's hashCode() overrides the Object. hashCode() method. This method returns the hashcode as an integer value.

Can hashCode of two strings be same in Java?

Two same strings/value must have the same hashcode, but the converse is not true. There might be another string which can match the same hash-code, so we can't derive the key using hash-code. The reason for two different string to have the same hash-code is due to the collision.


3 Answers

Sure. The string f5a5a608 for example has a hashcode of zero.

I found that through a simple brute force search:

public static void main(String[] args){
    long i = 0;
    loop: while(true){
        String s = Long.toHexString(i);
        if(s.hashCode() == 0){
            System.out.println("Found: '"+s+"'");
            break loop;
        }
        if(i % 1000000==0){
            System.out.println("checked: "+i);              
        }
        i++;
    }       
}

Edit: Joseph Darcy, who worked on the JVM, even wrote a program that can construct a string with a given hashcode (to test the implementation of Strings in switch/case statements) by basically running the hash algorithm in reverse.

like image 110
Michael Borgwardt Avatar answered Oct 10 '22 19:10

Michael Borgwardt


just be care of that int h;. It may overflow, every string that satisfy h % 2^31 == 0 may lead to this.

public class HelloWorld {
    public static void main(String []args) {
       System.out.println("\u0001!qbygvW".hashCode());
        System.out.println("9 $Ql(0".hashCode());
        System.out.println(" #t(}lrl".hashCode());
        System.out.println(" !!#jbw}a".hashCode());
        System.out.println(" !!#jbw|||".hashCode());
        System.out.println(" !!!!Se|aaJ".hashCode());
        System.out.println(" !!!!\"xurlls".hashCode());
    }
}

A lot of strings...

like image 34
Manasseh Zhou Avatar answered Oct 10 '22 21:10

Manasseh Zhou


Here's code to find and print strings of any desired hashCode value:

public static int findIntInverse(int x) {
    // find the number y such that as an int (after overflow) x*y = 1
    // assumes x is odd, because without that it isn't possible.
    // works by computing x ** ((2 ** 32) - 1)
    int retval = 1;
    for (int i = 0; i < 31; i++) {
        retval *= retval;
        retval *= x;
    }
    return retval;
}

public static void findStrings(
        int targetHash,
        Iterable<String> firstParts,
        Iterable<String> midParts,
        Iterable<String> lastParts) {
    Map<Integer, String> firstHashes = new HashMap<>();
    for (String firstPart : firstParts) {
        firstHashes.put(firstPart.hashCode(), firstPart);
    }
    int maxlastlen = 0;
    int maxmidlen = 0;
    for (String midPart : midParts) {
        maxmidlen = Math.max(midPart.length(), maxmidlen);
    }
    for (String lastPart : lastParts) {
        maxlastlen = Math.max(lastPart.length(), maxlastlen);
    }
    List<Integer> hashmuls = new ArrayList<>();
    String baseStr = "\u0001";
    for (int i = 0; i <= maxmidlen + maxlastlen; i++) {
        hashmuls.add(baseStr.hashCode());
        baseStr += "\0";
    }
    // now change each hashmuls into its negative "reciprocal"
    for (int i = 0; i < hashmuls.size(); i++) {
        hashmuls.set(i, -findIntInverse(hashmuls.get(i)));
    }
    for (String lastPart : lastParts) {
        for (String midPart : midParts) {
            String tail = midPart + lastPart;
            Integer target = hashmuls.get(tail.length()) * (tail.hashCode() - targetHash);
            if (firstHashes.containsKey(target)) {
                System.out.print(firstHashes.get(target));
                System.out.println(tail);
            }
        }
    }
}

Some interesting finds found by using a list of common English words to seed each part:

sand nearby chair
king concentration feeling
childhood dish tight
war defensive to
ear account virus

Using just Arrays.asList(" ") as midParts and a large English word list for firstParts and lastParts, we find the well-known pollinating sandboxes as well as revolvingly admissable, laccaic dephase, toxity fizzes, etc.

Note that if you give findStrings a large list of size N for both firstParts and lastParts and a short fixed list for midParts, it runs in O(N) time.

like image 1
Daniel Martin Avatar answered Oct 10 '22 19:10

Daniel Martin