Knuth multiplicative hash

Tags:

c++

algorithm

hash

Is this a correct implementation of the Knuth multiplicative hash?

int hash(int v)
{
    v *= 2654435761;
    return v >> 32;
}

Does overflow in the multiplication affect the algorithm?

How can the performance of this method be improved?

419

asked Aug 08 '12 18:08

José

2 Answers

The Knuth multiplicative hash is used to compute a hash value in {0, 1, 2, ..., 2^p - 1} from an integer k.

Assuming that p is between 0 and 32, the algorithm goes like this:

1. Compute alpha as the closest integer to 2^32 (-1 + sqrt(5)) / 2. We get alpha = 2 654 435 769.
2. Compute k * alpha and reduce the result modulo 2^32:

   k * alpha = n0 * 2^32 + n1 with 0 <= n1 < 2^32

3. Keep the highest p bits of n1:

   n1 = m1 * 2^(32-p) + m2 with 0 <= m2 < 2^(32 - p)

The hash value is m1.
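The steps above can be traced with 64-bit arithmetic, so that nothing is discarded implicitly. This is only an illustrative sketch: the names `golden_alpha`, `KnuthSteps` and `knuth_steps` are made up for this example, and the fields mirror the symbols n0, n1, m1 and m2 used in the steps.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Step 1: closest integer to 2^32 * (sqrt(5) - 1) / 2.
std::uint64_t golden_alpha() {
    const double ratio = (std::sqrt(5.0) - 1.0) / 2.0;
    return static_cast<std::uint64_t>(std::llround(ratio * 4294967296.0));
}

// Hypothetical struct holding the intermediate values named in the steps.
struct KnuthSteps {
    std::uint64_t n0, n1, m1, m2;
};

KnuthSteps knuth_steps(std::uint32_t k, int p) {
    assert(p >= 0 && p <= 32);
    // The full product fits in 64 bits because both factors are below 2^32.
    const std::uint64_t product = static_cast<std::uint64_t>(k) * golden_alpha();
    KnuthSteps s;
    s.n0 = product >> 32;                     // high part, discarded
    s.n1 = product & 0xFFFFFFFFull;           // product modulo 2^32
    s.m1 = s.n1 >> (32 - p);                  // highest p bits of n1: the hash value
    s.m2 = s.n1 & ((1ull << (32 - p)) - 1);   // low 32 - p bits, discarded
    return s;
}
```

For k = 1 and p = 4, n1 is alpha itself (0x9E3779B9), so the hash value m1 is its top four bits, 9.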

So, a correct implementation of the Knuth multiplicative algorithm in C++ is:

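#include <cassert>
#include <cstdint>

std::uint32_t knuth(int x, int p) {
    assert(p >= 0 && p <= 32);
    if (p == 0) return 0;  // single slot; also avoids an undefined 32-bit shift by 32

    const std::uint32_t alpha = 2654435769u;  // closest integer to 2^32 (-1 + sqrt(5)) / 2
    const std::uint32_t y = static_cast<std::uint32_t>(x);
    return (y * alpha) >> (32 - p);  // unsigned multiplication wraps modulo 2^32
}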

Forgetting to shift the result by (32 - p) is a major mistake, because you would lose all the good properties of the hash. Keeping the low bits instead maps an even sequence of keys to an even sequence of hash values, which is very bad because all the odd slots stay unoccupied. That's like taking a good wine and mixing it with Coke.

By the way, the web is full of people misquoting Knuth and multiplying by 2 654 435 761 without taking the higher bits. I just checked Knuth's book and he never said such a thing. It looks like someone who thought he was being "smart" decided to pick a prime number close to 2 654 435 769.
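The effect on even keys can be checked directly. The sketch below is an illustration rather than code from Knuth: `low_bits_hash`, `high_bits_hash` and `count_odd_slots` are hypothetical helpers that compare keeping the low p bits of the product against keeping the high p bits.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Wrong variant: keeps the LOW p bits of the product.
std::uint32_t low_bits_hash(std::uint32_t k, int p) {
    assert(p >= 1 && p <= 31);
    const std::uint32_t alpha = 2654435769u;
    return (k * alpha) & ((1u << p) - 1u);
}

// Variant described above: keeps the HIGH p bits of the product.
std::uint32_t high_bits_hash(std::uint32_t k, int p) {
    assert(p >= 1 && p <= 32);
    const std::uint32_t alpha = 2654435769u;
    return (k * alpha) >> (32 - p);
}

// Counts how many distinct odd slots are reached by the even keys 0, 2, ..., 2 * (n - 1).
template <class Hash>
int count_odd_slots(Hash h, int n, int p) {
    std::set<std::uint32_t> odd;
    for (int i = 0; i < n; ++i) {
        const std::uint32_t slot = h(2u * static_cast<std::uint32_t>(i), p);
        if (slot & 1u) odd.insert(slot);
    }
    return static_cast<int>(odd.size());
}
```

With the low-bit variant, an even key times any constant is still even, so `count_odd_slots(low_bits_hash, n, p)` is 0 for every n. The high-bit variant does reach odd slots for the same keys.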

Bear in mind that most hash table implementations don't allow this kind of signature in their interface, as they only allow

uint32_t hash(int x);

and reduce hash(x) modulo 2^p to compute the slot for x. Such hash tables cannot use the Knuth multiplicative hash, because reducing modulo 2^p keeps the low bits rather than the high p bits. This might be one reason why so many people have ruined the algorithm by forgetting to take the higher p bits. So you can't use the Knuth multiplicative hash with std::unordered_map or std::unordered_set. However, I think those hash tables use a prime number as their size, so the Knuth multiplicative hash would not be useful there anyway. Using hash(x) = x would be a good fit for those tables.
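As a minimal sketch of the hash(x) = x suggestion, a custom hasher can be passed as the third template argument of std::unordered_map. The names `IdentityHash` and `IntMap` are made up for this example, and whether an identity hash distributes well depends on the bucket count chosen by the library, which the C++ standard does not fix.

```cpp
#include <cstddef>
#include <unordered_map>

// Hypothetical hasher implementing hash(x) = x.
struct IdentityHash {
    std::size_t operator()(int x) const noexcept {
        // Go through unsigned so that negative keys map to well-defined values.
        return static_cast<std::size_t>(static_cast<unsigned int>(x));
    }
};

// A map from int to int that uses the identity hasher instead of std::hash<int>.
using IntMap = std::unordered_map<int, int, IdentityHash>;
```

Usage is the same as for any other std::unordered_map, for example `IntMap m; m[42] = 1;`.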

Source: "Introduction to Algorithms, third edition", Cormen et al., Section 11.3.2, p. 263

Source: "The Art of Computer Programming, Volume 3, Sorting and Searching", D.E. Knuth, Section 6.4, p. 516

116

answered Oct 02 '22 04:10

InsideLoop

This might be late, but here's a Java implementation of Knuth's method.

For a hash table of size N:

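public long hash(int key) {
    long l = 2654435769L;
    // Keep the low 32 bits of the product (the fractional part of key * l / 2^32).
    long frac = (key * l) & 0xFFFFFFFFL;
    // Scale to [0, N); N is the table size, assumed to be a positive int.
    return (frac * N) >>> 32;
}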

answered Oct 02 '22 04:10

because_im_batman
