I was reviewing for my data structures final exam, and I came across a question in past year's final. Having worked on it for the past three hours, I still couldn't figure out a way to solve it, except by trial and error. Here's the question:
"Suppose that the size of your hash table is 31, the constant g is also 31, and that you use the following hash code
int hash = 0;
int n = s.length();
for (int i = 0; i < n; i++)
hash = g * hash + s.charAt(i);
and that you use separate chaining to resolve collisions. List five different names that would hash to the same location in the table."
I think there must be some kind of tricks, possibly involving the modulo operator, to solve this problem, considering the size of the hash table is 31, which is the same as the constant g. Sorry if I sound like it, but I'm not asking for code or anything, just some thoughts/hints on the question. Any comment is greatly appreciated. Thanks
For simple hash functions it is easy to reach a collision. For example, assume a hash function h(text) sums of all character codes in a text. It will produce the same hash value (collision) for texts holding the same letters in different order, i.e. h('abc') == h('cab') == h('bca') .
Definition: A collision occurs when more than one value to be hashed by a particular hash function hash to the same slot in the table or data structure (hash table) being generated by the hash function.
Ideally, the hash function will assign each key to a unique bucket, but most hash table designs employ an imperfect hash function, which might cause hash collisions where the hash function generates the same index for more than one key. Such collisions are typically accommodated in some way.
SHA256: The slowest, usually 60% slower than md5, and the longest generated hash (32 bytes). The probability of just two hashes accidentally colliding is approximately: 4.3*10-60. As you can see, the slower and longer the hash is, the more reliable it is.
java strings can contain a zero character ("\0"
), so all the following would hash to the same value:
"a"
"\0a"
"\0\0a"
"\0\0\0a"
"\0\0\0\0a"
here's proof (thanks to Eric for the ref to this being the hash used):
> cat Foo.java
class Foo {
public static void main(String[] args) {
System.out.println("a".hashCode());
System.out.println("\0a".hashCode());
System.out.println("\0a".length());
System.out.println("\0a".equals("a"));
}
}
> javac Foo.java
> java Foo
97
97
2
false
i doubt this is the expected answer, though.
also, if this were an exam, i doubt i could remember ASCII codes. so an alternative approach to using sequences of the style in the other answer would be:
"\002\000"
"\001\037"
etc (these are octal triplets - the above both hash to 62). but is it easy to generate five examples (all same hash) for this style?
According to the Wikipedia article on Java's string hashing algorithm (which is the same as the algorithm you present):
As with any general hashing function, collisions are possible. For example, the strings "FB" and "Ea" have the same hash value. The hashCode() implementation of String uses the prime number 31 and the difference between 'a' and 'B' is just 31, so the calculation is 70 × 31 + 66 = 69 × 31 + 97.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With