I've been using STL's unordered_map recently and while it seems to work nicely I don't quite understand how the hashing function works given that the data type is given as a template parameter. In an effort to understand this data structure more thoroughly, I implemented my own little Hashmap class in C++ :
Hashmap interface:
#ifndef _HASHMAP_H_
#define _HASHMAP_H_
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <iostream>
#include <vector.h>
//Beginning of Hashmap class definition
template <class Key, class Value>
class Hashmap{
private:
int mappedElementCount;
public:
explicit Hashmap();
virtual ~Hashmap();
virtual void test();
virtual int hash(Key*);
int* getSize();
void putKVPair(Key*,Value*);
void clearMap();
//When we use these methods, we'll want a linear vector of keys and values to
//iterate over, so vector is good here
std::vector<Key>* getKeys();
std::vector<Value>* getValues();
}; //end hashmap class definition
#endif /*_HASHMAP_H_*/
Hashmap implementation:
#include "Hashmap.h"
template<class Key,class Value> Hashmap<Key,Value>::Hashmap(){
mappedElementCount = 0;
}
template<class Key,class Value> Hashmap<Key,Value>::~Hashmap(){
printf("\nDestroying the base Hashmap object!\n");
}
template<class Key,class Value> void Hashmap<Key,Value>::test(){
printf("The size of our Key is %i and the size of our Value is
%i\n",sizeof(Key),sizeof(Value));
}
template<class Key,class Value> int Hashmap<Key,Value>::hash(Key* k_ptr){
unsigned int hashval;
/* we start our hash out at 0 */
hashval = 0;
//TODO: How do we generate a hash signature when we don't know what data type
//we're going to be working with?
return hashval % mappedElementCount;
}
template<class Key,class Value> std::vector<Key>* Hashmap<Key,Value>::getKeys(){
//TODO: prepare a vector initialized with all Key objects and return it here
return keys;
}
template<class Key,class Value> std::vector<Value>* Hashmap<Key,Value>::getValues(){
//TODO: prepare a vector initialized with all Value objects and return it here
return values;
}
template<class Key,class Value> int* Hashmap<Key,Value>::getSize(){
return &mappedElementCount;
}
template<class Key,class Value> void Hashmap<Key,Value>::putKVPair(Key* k, Value* v){
//TODO: implement hashing of the key object k to determine
//the address of the value object v
//first step, generate a hash from our key
int tempHash = hash(k);
//TODO: store the Value at an address given by or influenced by tempHash
//If all was successfully completed, increment the mapped records counter
mappedElementCount++;
}
template<class Key,class Value> void Hashmap<Key,Value>::clearMap(){
//TODO: implement a cascading chain of deallocation of stored objects within the
//hashmap
//MAYBE-- only if we create new objects rather than just mapping reference
//associations,
//which is really the goal here... In the latter case, just empty the Hashmap
//itself
}
One possible OOP method of addressing this problem is to use Hashmap as a base class and provide derived classes that have known Key data types, such as the following Stringmap:
Stringmap interface:
#ifndef _STRINGMAP_H_
#define _STRINGMAP_H_
#include "Hashmap.h"
template <class Value>
class Stringmap:public Hashmap<std::string,Value>{
private:
public:
//Con/de 'structors
explicit Stringmap();
~Stringmap();
//Here we know our Key will be of type std::string
//so we can generate our hash sig by char values
//Override hash from the base class
int hash(std::string*);
//override test from base class
void test();
};
#endif /*_STRINGMAP_H_ def*/
Stringmap implementation:
#include "Stringmap.h"
template<class Value> Stringmap<Value>::Stringmap():Hashmap<std::string,Value>(){
}
template<class Value> Stringmap<Value>::~Stringmap(){
printf("\nDestroying the derived stringmap object!\n");
}
template<class Value> void Stringmap<Value>::test(){
printf("The size of our Value is %i\n",sizeof values[0]);
}
template<class Value> int Stringmap<Value>::hash(std::string* str_ptr){
unsigned int hashval;
/* we start our hash out at 0 */
hashval = 0;
/* for each character, we multiply the old hash by 31 and add the current
* character. Remember that shifting a number left is equivalent to
* multiplying it by 2 raised to the number of places shifted. So we
* are in effect multiplying hashval by 32 and then subtracting hashval.
* Why do we do this? Because shifting and subtraction are much more
* efficient operations than multiplication.
*/
for(int i=0;i<str_ptr->length();i++) {
hashval = (*(str_ptr))[i] + ((hashval << 5) - hashval);
}
/* we then return the hash value mod the hashmap size so that it will
* fit into the necessary range
*/
return hashval % (*(Hashmap<std::string,Value>::getSize()));
}
So the question is as follows: is it possible to create a hash signature when the data type to be hashed is currently unknown? If so, how? Looking at the std::hash docs, it appears that the C++ standard simply defines a hash function for each primitive data type and also for T* (for any type T)... What's missing is how this hashing is implemented for the given primitive data types and, more to the point, how it is implemented for the generic T*. I suppose I could just call hash(Key) and hope for the best, but it would be nice to understand what's going on behind the scenes.
thanks, CCJ
Simple Hash Map (Hash Table) Implementation in C++ Hash table (also, hash map) is a data structure that basically maps keys to values. A hash table uses a hash function to compute an index into an array of buckets or slots, from which the corresponding value can be found.
A Hash Table in C/C++ (Associative array) is a data structure that maps keys to values. This uses a hash function to compute indexes for a key. Based on the Hash Table index, we can store the value at the appropriate location.
In C programming, since there is no advanced data structure, to use hash table or hashmap, we would have to implement them by ourselves. In C++ programming, fortunately, there are standard containers or abstractions, such as std::unordered_map and std::unordered_set , that have been implemented for us.
std::unorderd_map
takes 2 explicit template parameters (Key
, and Value
) and also has a bunch of hidden template parameters, of which the Hash function is defaulted to std::hash<Key>
.
This STL hash function std::hash<Key>
takes a Key
and returns a std::size_t
. It is already specialized for all the integral types and std::string
. From this reference site
The hash template defines a function object that implements a hash function. Instances of this function object define an operator() that:
- Accepts a single parameter of type Key.
- Returns a value of type size_t that represents the hash value of the parameter.
- Does not throw exceptions when called.
- For two parameters k1 and k2 that are equal, std::hash()(k1) == std::hash()(k2).
- For two different parameters k1 and k2 that are not equal, the probability that std::hash()(k1) == std::hash()(k2) should be very small, approaching 1.0/std::numeric_limits::max().
The hash template is both CopyConstructible and Destructible. The unordered associative containers std::unordered_set, std::unordered_multiset, std::unordered_map, std::unordered_multimap use specializations of the template std::hash as the default hash function.
The reference ends with this quote:
** The actual hash functions are implementation-dependent and are not required to fulfill any other quality criteria except those specified above. **
So you can look at the implementation of your system, but that won't guarantee anything for other systems' implementation.
There's a std::hash<T>
template which is specialized for a variety of types, and which you can specialize for your own types.
By default, std::unordered_map<T>
simply delegates hashing to std::hash<T>
(or you can specify a different hash function as a template argument).
Thus std::unordered_map
does not need to know anything at all about the mechanics of hashing.
As to how std::hash
is implemented, that's not specified. However, I think it's reasonable to assume that any decent compiler would provide a good-quality implementation. One gotcha to bear in mind is that std::hash<char*>
doesn't hash the C string, it only hashes the value of the pointer (been there :))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With