Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is the best 32bit hash function for short strings (tag names)?

What is the best 32bit hash function for relatively short strings?

Strings are tag names that consist of English letters, numbers, spaces and some additional characters (#, $, ., ...). For example: Unit testing, C# 2.0.

I am looking for 'best' as in 'minimal collisions', performance is not important for my goals.

like image 491
Andrey Shchekin Avatar asked Feb 28 '10 12:02

Andrey Shchekin


People also ask

What is a 32 bit hash?

The 32-bit long hash value is a hexadecimal number of 8 characters. MD4 is a Message Digest Algorithm developed by Ronald L. Rivest from RSA Data Security, Inc. Currently it's considered insecure, but it's very fast on 32-bit mashines and it's used for calculating EDonkey 2000 hashes in the EDonkey p2p network.

Which hash algorithm is best?

Probably the one most commonly used is SHA-256, which the National Institute of Standards and Technology (NIST) recommends using instead of MD5 or SHA-1. The SHA-256 algorithm returns hash value of 256-bits, or 64 hexadecimal digits.

What are some good hash functions?

A good hash function to use with integer key values is the mid-square method. The mid-square method squares the key value, and then takes out the middle r bits of the result, giving a value in the range 0 to 2r−1. This works well because most or all bits of the key value contribute to the result.

What is the most secure hash function?

Common attacks like brute force attacks can take years or even decades to crack the hash digest, so SHA-2 is considered the most secure hash algorithm.


1 Answers

I'm not sure if it's the best choice, but here is a hash function for strings:

The Practice of Programming (HASH TABLES, pg. 57)

/* hash: compute hash value of string */ unsigned int hash(char *str) {    unsigned int h;    unsigned char *p;     h = 0;    for (p = (unsigned char*)str; *p != '\0'; p++)       h = MULTIPLIER * h + *p;    return h; // or, h % ARRAY_SIZE; } 

Empirically, the values 31 and 37 have proven to be good choices for the multiplier in a hash function for ASCII strings.

like image 118
Nick Dandoulakis Avatar answered Oct 05 '22 17:10

Nick Dandoulakis