Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Appropriate hashing function to hash random binary strings

i have an two arrays : char data1[length] where length is a multiple of 8 i.e length can be 8, 16,24 ... The array contains binary data read from a file that is open in binary mode. I will keep reading from the file and everytime i read i will store the read value in a hash table. The disterbution of this binary data has a random distribution. I would like to hash each array and store them in a hash table in order to be able to look for the char with the specific data again. What would be a good hashing function to achive this task. Thanks

Please note that i am writing this in c++ and c so any language you choose to provide a solution for would be great.

like image 820
Mike G Avatar asked Nov 05 '11 06:11

Mike G


2 Answers

If the data that you read is 8 bytes long and really distributed randomly, and your hashcode needs to be 32 bits, what about this:

uint32_t hashcode(const unsigned char *data) {
  uint32_t hash = 0;
  hash ^= get_uint32_le(data + 0);
  hash ^= get_uint32_le(data + 4);
  return hash;
}

uint32_t get_uint32_le(const unsigned char *data) {
  uint32_t value = 0;
  value |= data[0] << 0;
  value |= data[1] << 8;
  value |= data[2] << 16;
  value |= data[3] << 24;
  return value;
}

If you need more speed, this code can probably made a lot faster if you can guarantee that data is always properly aligned to be interpreted as an const uint32_t *.

like image 86
Roland Illig Avatar answered Nov 09 '22 19:11

Roland Illig


I have successfully used MurmurHash3 in one of my projects.

Pros:

  • It is fast. Very fast.
  • It supposedly has a low collision rate.

Cons:

  • It's not suitable for cryptography applications.
  • It's not standardized in any shape or form.
  • It's not portable to non-x86 platforms. However, it's small enough that you should be able to port it if you really need to - I was able to port it to Java, although that's not nearly the same thing.

It's a good possibility for use in e.g. a fast hash-table implementation...

like image 36
thkala Avatar answered Nov 09 '22 21:11

thkala