Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

A minimal hash function for C?

Tags:

c

hashtable

hash

I can't use boost:hash because I have to stick with C and can't use C++.

But, I need to hash a large number (10K to 100k) of tokens strings (5 to 40 bytes length) so that search within those are fastest.

MD5, SHA1 or any long hash function seems too heavy for a simple task, I am not doing cryptography. Plus there is the storage and computing cost.

Therefore my question:

  1. What might be the simplest hash algorithm that will ensure collision prevention in most practical cases.

  2. How many bit to use for the hash value? I am developing for 32 bit systems. Does hash algorithm in Perl/Python use 32 bit hashes too? Or do I have to jump to 64?

  3. Regarding implementation of hash tables in common scripting languages: does the implementation check for collisions or can I avoid that part altogether?

like image 322
CDR Avatar asked Apr 13 '09 13:04

CDR


People also ask

What is a minimal hash function?

A minimal perfect hash function is a perfect hash function that maps n keys to n consecutive integers – usually the numbers from 0 to n − 1 or from 1 to n. A more formal way of expressing this is: Let j and k be elements of some finite set S.

Is there a hash function in C?

A Hash Table in C/C++ (Associative array) is a data structure that maps keys to values. This uses a hash function to compute indexes for a key. Based on the Hash Table index, we can store the value at the appropriate location.

What are 2 common hash functions?

The most common hash functions used in digital forensics are Message Digest 5 (MD5), and Secure Hashing Algorithm (SHA) 1 and 2.

What is the slowest hashing function?

I believe bcrypt is the slowest hashing algorithm currently available and is why it is most commonly recommended for hashing passwords.


1 Answers

You can find a good (and fast) hash function, and an interesting read, at http://www.azillionmonkeys.com/qed/hash.html

The only time you should not check for collisions, is if you use a perfect hash -- a good old fashioned lookup table, like gperf.

like image 163
gnud Avatar answered Sep 27 '22 17:09

gnud