Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Convert DJB Hash To 64 Bit

Tags:

c++

c

hash

Would Dan Bernstein's hash function still function properly if I was using a 64 bit unsigned integer?

uint64
hash_djb2(register uchar *str, register size_t length) {
    register uint64 hash = 5381L;
    while (length--) {
        hash = ((hash << 5L) + hash) + *str++; /* hash * 33 + c */
    }
    return hash;
}
like image 877
Ian Avatar asked Dec 10 '22 04:12

Ian


1 Answers

The djb hash function is based on a Linear Congruential Generator, which has the form x = (a·x + c) mod m.

By inspecting the function, we realise that a = 33, c = input in the case of djb but the modulo is a bit hidden because it is stated by the type of the variable hash, unsigned long in it's original form, which contains 32 bits of data. When the value goes beyond the value of an unsigned long, it overflows and continues, thus a modulo of 2^32.

According to Knuth in his The Art of Computer Programming, Volume 2: Seminumerical Algorithms on chapter 3.2.1, m must be divisible by all prime factors of (a-1) for an Linear Congruential Generator to have a maximal period (period = modulo) (as well as other facts already taken into account by Mr. Bernstein). Since having a m = 2^64 introduces no new prime factor, since by definition 2 is a prime number of both 2^32 and 2^64, this rule is satisfied.

You would then, with this new hash algorithm, be able to get a period as long as your modulo, meaning you would cover all possible values of an 64 bit integer.

Please keep in mind though that altering any mathematical value of an algorithm can't be done lightly. A completely new statistical analysis is required to fully understand the flaws and benefits of your new algorithm, even if you only changed one variable of it. You can't simply change values and hope to get the same features as the original Linear Congruential Generator. Statistical characteristics such as entropy and collisions won't be retained from the former algorithm. You then can't claim to have implemented an djb algorithm, neither quote any statistical performance of djb to prove anything on your implementation.

like image 114
Soravux Avatar answered Dec 23 '22 20:12

Soravux