Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Finding collisions in hash table

I was reviewing for my data structures final exam, and I came across a question in past year's final. Having worked on it for the past three hours, I still couldn't figure out a way to solve it, except by trial and error. Here's the question:

"Suppose that the size of your hash table is 31, the constant g is also 31, and that you use the following hash code

int hash = 0;
int n = s.length();
for (int i = 0; i < n; i++)
   hash = g * hash + s.charAt(i);

and that you use separate chaining to resolve collisions. List five different names that would hash to the same location in the table."

I think there must be some kind of tricks, possibly involving the modulo operator, to solve this problem, considering the size of the hash table is 31, which is the same as the constant g. Sorry if I sound like it, but I'm not asking for code or anything, just some thoughts/hints on the question. Any comment is greatly appreciated. Thanks

like image 952
James Durbin Avatar asked May 17 '12 02:05

James Durbin


People also ask

How do you find a collision in a hash function?

For simple hash functions it is easy to reach a collision. For example, assume a hash function h(text) sums of all character codes in a text. It will produce the same hash value (collision) for texts holding the same letters in different order, i.e. h('abc') == h('cab') == h('bca') .

What are collisions in hash tables?

Definition: A collision occurs when more than one value to be hashed by a particular hash function hash to the same slot in the table or data structure (hash table) being generated by the hash function.

Do hash tables have collisions?

Ideally, the hash function will assign each key to a unique bucket, but most hash table designs employ an imperfect hash function, which might cause hash collisions where the hash function generates the same index for more than one key. Such collisions are typically accommodated in some way.

Are there collisions in SHA256?

SHA256: The slowest, usually 60% slower than md5, and the longest generated hash (32 bytes). The probability of just two hashes accidentally colliding is approximately: 4.3*10-60. As you can see, the slower and longer the hash is, the more reliable it is.


2 Answers

java strings can contain a zero character ("\0"), so all the following would hash to the same value:

"a"
"\0a"
"\0\0a"
"\0\0\0a"
"\0\0\0\0a"

here's proof (thanks to Eric for the ref to this being the hash used):

> cat Foo.java
class Foo {
    public static void main(String[] args) {                                    
        System.out.println("a".hashCode());                                     
        System.out.println("\0a".hashCode());                                   
        System.out.println("\0a".length());
        System.out.println("\0a".equals("a"));
    }                                                                           
}           
> javac Foo.java                                        
> java Foo                                                       
97
97
2
false

i doubt this is the expected answer, though.

also, if this were an exam, i doubt i could remember ASCII codes. so an alternative approach to using sequences of the style in the other answer would be:

"\002\000"
"\001\037"

etc (these are octal triplets - the above both hash to 62). but is it easy to generate five examples (all same hash) for this style?

like image 131
andrew cooke Avatar answered Oct 19 '22 09:10

andrew cooke


According to the Wikipedia article on Java's string hashing algorithm (which is the same as the algorithm you present):

As with any general hashing function, collisions are possible. For example, the strings "FB" and "Ea" have the same hash value. The hashCode() implementation of String uses the prime number 31 and the difference between 'a' and 'B' is just 31, so the calculation is 70 × 31 + 66 = 69 × 31 + 97.

like image 5
Eric J. Avatar answered Oct 19 '22 08:10

Eric J.