By "non-empty", I mean in this question a string which contains at least one non-zero character.
For reference, here's the hashCode
implementation :
1493 public int hashCode() {
1494 int h = hash;
1495 if (h == 0) {
1496 int off = offset;
1497 char val[] = value;
1498 int len = count;
1499
1500 for (int i = 0; i < len; i++) {
1501 h = 31*h + val[off++];
1502 }
1503 hash = h;
1504 }
1505 return h;
1506 }
and the algorithm is specified in the documentation.
Before an integer overflow occurs, the answer is easy: it's no. But what I'd like to know is if, due to integer overflow, it's possible for a non-empty string to have a hashcode of zero? Can you construct one?
What I'm looking for would ideally be a mathematical demonstration (or a link to one) or a construction algorithm.
HashCode of empty string will be zero. In above code, 'hash' is a class level private int and 'value' is a char[] to get all characters of string. If a string does not contain any characters or string is empty, it will return 'h' as default value of integer which is Zero.
Unique, no. By nature, hash values are not guaranteed to be unique. Any system with an arbitrarily large number of possible inputs and a limited number of outputs will have collisions.
The String's hashCode() overrides the Object. hashCode() method. This method returns the hashcode as an integer value.
Two same strings/value must have the same hashcode, but the converse is not true. There might be another string which can match the same hash-code, so we can't derive the key using hash-code. The reason for two different string to have the same hash-code is due to the collision.
Sure. The string f5a5a608 for example has a hashcode of zero.
I found that through a simple brute force search:
public static void main(String[] args){
long i = 0;
loop: while(true){
String s = Long.toHexString(i);
if(s.hashCode() == 0){
System.out.println("Found: '"+s+"'");
break loop;
}
if(i % 1000000==0){
System.out.println("checked: "+i);
}
i++;
}
}
Edit: Joseph Darcy, who worked on the JVM, even wrote a program that can construct a string with a given hashcode (to test the implementation of Strings in switch/case statements) by basically running the hash algorithm in reverse.
just be care of that int h;
. It may overflow, every string that satisfy h % 2^31 == 0
may lead to this.
public class HelloWorld {
public static void main(String []args) {
System.out.println("\u0001!qbygvW".hashCode());
System.out.println("9 $Ql(0".hashCode());
System.out.println(" #t(}lrl".hashCode());
System.out.println(" !!#jbw}a".hashCode());
System.out.println(" !!#jbw|||".hashCode());
System.out.println(" !!!!Se|aaJ".hashCode());
System.out.println(" !!!!\"xurlls".hashCode());
}
}
A lot of strings...
Here's code to find and print strings of any desired hashCode
value:
public static int findIntInverse(int x) {
// find the number y such that as an int (after overflow) x*y = 1
// assumes x is odd, because without that it isn't possible.
// works by computing x ** ((2 ** 32) - 1)
int retval = 1;
for (int i = 0; i < 31; i++) {
retval *= retval;
retval *= x;
}
return retval;
}
public static void findStrings(
int targetHash,
Iterable<String> firstParts,
Iterable<String> midParts,
Iterable<String> lastParts) {
Map<Integer, String> firstHashes = new HashMap<>();
for (String firstPart : firstParts) {
firstHashes.put(firstPart.hashCode(), firstPart);
}
int maxlastlen = 0;
int maxmidlen = 0;
for (String midPart : midParts) {
maxmidlen = Math.max(midPart.length(), maxmidlen);
}
for (String lastPart : lastParts) {
maxlastlen = Math.max(lastPart.length(), maxlastlen);
}
List<Integer> hashmuls = new ArrayList<>();
String baseStr = "\u0001";
for (int i = 0; i <= maxmidlen + maxlastlen; i++) {
hashmuls.add(baseStr.hashCode());
baseStr += "\0";
}
// now change each hashmuls into its negative "reciprocal"
for (int i = 0; i < hashmuls.size(); i++) {
hashmuls.set(i, -findIntInverse(hashmuls.get(i)));
}
for (String lastPart : lastParts) {
for (String midPart : midParts) {
String tail = midPart + lastPart;
Integer target = hashmuls.get(tail.length()) * (tail.hashCode() - targetHash);
if (firstHashes.containsKey(target)) {
System.out.print(firstHashes.get(target));
System.out.println(tail);
}
}
}
}
Some interesting finds found by using a list of common English words to seed each part:
sand nearby chair
king concentration feeling
childhood dish tight
war defensive to
ear account virus
Using just Arrays.asList(" ")
as midParts
and a large English word list for firstParts
and lastParts
, we find the well-known pollinating sandboxes
as well as revolvingly admissable
, laccaic dephase
, toxity fizzes
, etc.
Note that if you give findStrings
a large list of size N for both firstParts
and lastParts
and a short fixed list for midParts
, it runs in O(N) time.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With