Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Hash attack: find strings of length 2^N with same hashCode()

Reading algorithms fourth edition by Robert Sedgewick and Kevin Wayne I found the following question:

Hash attack: find 2^N strings, each of length 2^N, that have the same hashCode() value, supposing that the hashCode() implementation for String is the following:

public int hashCode() {
   int hash = 0;
   for (int i = 0; i < length(); i++)
      hash = (hash * 31) + charAt(i);
   return hash;
}

Strong hint: Aa and BB have the same value.

What comes on my mind is generating all possible strings of length 2^N and compare their hashCodes. This, however, is very expensive for large N and I doubt it's the correct solution. Can you give me hints what I miss in the whole picture?

like image 956
John Avatar asked Jan 31 '26 13:01

John


2 Answers

Andreas' and Glains' answers are both correct, but they aren't quite what you need if your goal is to produce 2N distinct strings of length 2N.

Rather, a simpler approach is to build strings consisting solely of concatenated sequences of Aa and BB. For length 2×1 you have { Aa, BB }; for length 2×2 you have { AaAa, AaBB, BBAa, BBBB }, for length 2×3 you have { AaAaAa, AAaaBB, AaBBAa, AaBBBB, BBAaAa, BBAaBB, BBBBAa, BBBBBB }; and so on.

(Note: you've quoted the text as saying the strings should have length 2N. I'm guessing that you misquoted, and it's actually asking for length 2N; but if it is indeed asking for length 2N, then you can simply drop elements as you proceed.)

like image 152
ruakh Avatar answered Feb 03 '26 01:02

ruakh


"Strong hint" explained.

Strong hint: Aa and BB have the same value.

In ASCII / Unicode, B has a value 1 higher than A. Since those are the second last characters, the value is multiplied by 31, so hash code is increased by 31 when you change xxxxAa to xxxxBa.

To offset that, you need last character to offset by -31. Since lowercase letters are 32 higher than uppercase letters, changing a to A is -32 and changing one letter up to B is then -31.

So, it get same hash code, change second-last letter to next letter (e.g. A to B), and change last letter from lowercase to next uppercase (e.g. a to B).

You can now use that hint to generate up to 26 strings with the same hash code.

like image 25
Andreas Avatar answered Feb 03 '26 02:02

Andreas