Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

String to Integer Hashing Function with Precision

Tags:

c++

hash

I want to hash a char array in to an int or a long. The resulting value has to adhere to a given precision value. The function I've been using is given below:

int GetHash(const char* zKey, int iPrecision /*= 6*/)
{
        /////FROM : http://courses.cs.vt.edu/~cs2604/spring02/Projects/4/elfhash.cpp

        unsigned long h = 0;
        long M = pow(10, iPrecision);

        while(*zKey)
        {
                h = (h << 4) + *zKey++;
                unsigned long g = h & 0xF0000000L;
                if (g) h ^= g >> 24;
                h &= ~g;
        }            

        return (int) (h % M);
}

The string to be hashed is similar to "SAEUI1210.00000010_1".

However, this produces duplicate values in some cases. Are there any good alternatives which wouldn't duplicate the same hash for different string values.

like image 451
Gayan Avatar asked Dec 03 '22 07:12

Gayan


2 Answers

The very definition of a hash is that it produces duplicate values for some values, due to hash value range being smaller than the space of the hashed data.

In theory, a 32-bit hash has enough range to hash all ~6 character strings (A-Z,a-z,0-9 only), without causing a collision. In practice, hashes are not a perfect permutation of the input. Given a 32-bit hash, you can expect to get hash collisions after hashing ~16 bit of random inputs, due to the birthday paradox.

Given a static set of data values, it's always possible to construct a hash function designed specifically for them, which will never collide with itself (of course, size of its output will be at least log(|data set|). However, it requires you to know all the possible data values ahead of time. This is called perfect hashing.

That being said, here are a few alternatives which should get you started (they are designed to minimize collisions)

like image 194
ASk Avatar answered Dec 21 '22 10:12

ASk


Every hash will have collisions. Period. That's called a Birthday Problem.

You may want to check cryptographic has functions like MD5 (relatively fast and you don't care that it's insecure) but it also will have collisions.

like image 28
sharptooth Avatar answered Dec 21 '22 12:12

sharptooth