Hash function for a string

We are currently covering hash functions in my class. Our instructor asked us to find a hash function on the internet to compare with the two we have used in our code.

The first one:

```cpp
int HashTable::hash (string word)
// POST: the index of entry is returned
{
    int sum = 0;
    for (int k = 0; k < word.length(); k++)
        sum = sum + int(word[k]);
    return sum % SIZE;
}
```

Second:

```cpp
int HashTable::hash (string word)
{
    int seed = 131;
    unsigned long hash = 0;
    for (int i = 0; i < word.length(); i++)
    {
        hash = (hash * seed) + word[i];
    }
    return hash % SIZE;
}
```

Here SIZE is 501 (the size of the hash table), and the input comes from a text file of 20,000+ words.

I saw this question with a few code examples but wasn't exactly sure what to look for in a hash function. If I understand correctly, in my case, a hash takes an input (a string), does a calculation to assign the string a number, and uses that number to place the string in a table. Is this done to speed up searching the list?

If my logic is sound, does anyone have a good example or a resource showing a different hash function for strings, or even describing the process of writing my own efficient hash function?

asked Nov 29 '11 20:11

Nick

1 Answer

First, it usually does not matter that much in practice. Most hash functions are "good enough".
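One way to check whether a function is good enough for your table is to count how many words land in each bucket. A minimal sketch, assuming you pass in a function that returns the raw hash value; `bucket_counts` and `max_load` are hypothetical helper names, not from any library:

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: counts how many words fall into each of
// table_size buckets when the raw hash is reduced modulo table_size.
template <typename Hash>
std::vector<std::size_t> bucket_counts(const std::vector<std::string>& words,
                                       Hash hash,
                                       std::size_t table_size) {
    std::vector<std::size_t> counts(table_size, 0);
    for (const std::string& w : words) {
        ++counts[static_cast<std::size_t>(hash(w)) % table_size];
    }
    return counts;
}

// Hypothetical helper: size of the fullest bucket (smaller is better).
std::size_t max_load(const std::vector<std::size_t>& counts) {
    return *std::max_element(counts.begin(), counts.end());
}
```

With 20,000 words and 501 buckets, a perfectly even spread would put about 40 words in each bucket, so a `max_load` far above that points to clustering.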

But if you really care, you should know that it is a research subject in its own right. There are thousands of papers about it. You can still get a PhD today by studying and designing hashing algorithms.

Your second hash function might be slightly better, because it distinguishes the string "ab" from the string "ba"; the first one, being a plain sum of character codes, gives every anagram the same value. On the other hand, the second is probably slower than the first, since it does a multiplication per character. This may or may not matter for your application.
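The "ab" versus "ba" point can be checked directly. This is a small sketch that restates your two functions as free functions (the names `sum_hash` and `poly_hash` are mine), assuming SIZE is 501 and treating characters as unsigned, which makes no difference for plain ASCII words:

```cpp
#include <cstddef>
#include <string>

// Table size from the question.
const std::size_t SIZE = 501;

// First function: sum of the character codes, so letter order is ignored.
std::size_t sum_hash(const std::string& word) {
    unsigned long sum = 0;
    for (unsigned char c : word) {
        sum += c;
    }
    return sum % SIZE;
}

// Second function: polynomial hash with multiplier 131,
// so each character's contribution depends on its position.
std::size_t poly_hash(const std::string& word) {
    unsigned long hash = 0;
    for (unsigned char c : word) {
        hash = hash * 131 + c;
    }
    return hash % SIZE;
}
```

Here `sum_hash` gives 195 for both "ab" and "ba", while `poly_hash` gives 280 for "ab" and 410 for "ba".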

I'd guess that hash functions used for genome strings are quite different from those used to hash family names in telephone databases. Perhaps some string hash functions are even better suited to German than to English or French words.

Many software libraries give you good enough hash functions: Qt has qHash, C++11 has std::hash in <functional>, GLib has several hash functions in C, and POCO has some hash functions.
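For instance, the standard library route might look like the sketch below. `bucket_for` is a hypothetical helper name; the value `std::hash` returns is implementation-defined, so bucket numbers can differ between compilers and standard libraries:

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical helper: maps a word to a bucket index in [0, table_size)
// using the standard library's string hash.
std::size_t bucket_for(const std::string& word, std::size_t table_size) {
    return std::hash<std::string>{}(word) % table_size;
}
```

Calling `bucket_for(word, 501)` would then stand in for your `HashTable::hash`.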

I quite often write hash functions involving primes (see Bézout's identity) and xor, e.g.:

```cpp
#define A 54059   /* a prime */
#define B 76963   /* another prime */
#define C 86969   /* yet another prime */
#define FIRSTH 37 /* also prime */

unsigned hash_str(const char* s)
{
    unsigned h = FIRSTH;
    while (*s) {
        h = (h * A) ^ (s[0] * B);
        s++;
    }
    return h; // or return h % C;
}
```

But I don't claim to be a hash expert. Of course, the values of A, B, C, FIRSTH should preferably be primes, but you could choose other prime numbers.
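If you want a published function of the same xor-and-multiply-by-a-prime shape to compare against, 32-bit FNV-1a is a common choice. This is a different function from `hash_str` above, shown only as a sketch; the function name `fnv1a_32` is mine, and the two constants are the standard FNV offset basis and FNV prime for 32 bits:

```cpp
#include <cstdint>
#include <string>

// 32-bit FNV-1a: xor in each byte, then multiply by the FNV prime.
// Arithmetic wraps modulo 2^32.
std::uint32_t fnv1a_32(const std::string& s) {
    std::uint32_t h = 2166136261u;   // FNV offset basis (0x811C9DC5)
    for (unsigned char c : s) {
        h ^= c;
        h *= 16777619u;              // FNV prime (0x01000193)
    }
    return h;
}
```

For a table like yours you would take `fnv1a_32(word) % SIZE` as the bucket index.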

Look at some MD5 implementation to get a feeling of what hash functions can be.

Most good books on algorithms have at least a whole chapter dedicated to hashing. Start with the Wikipedia pages on hash functions and hash tables.

answered Oct 07 '22 10:10

Basile Starynkevitch

Tags:

c++

string

hashtable

hash
