
Generic Hash function for all STL-containers

Tags: c++, c++11, hash, map, stl

I'm using a std::unordered_map<key,value> in my implementation, and I will be using any of the STL containers as the key. I was wondering whether it is possible to create a generic hash function that works for whichever container is used.

This question on SO offers a generic print function for all STL containers. If that is possible, why can't there be something like a hash function that covers every container? And yes, a big concern is that it needs to be fast and efficient.

I was considering a simple hash function that converts the values of the key to a size_t and then applies a simple function like this.

Can this be done?

PS: Please don't use Boost libraries. Thanks.

0x0 asked Aug 01 '11 13:08

1 Answer

We can get an answer by mimicking Boost and combining hashes.

Warning: Combining hashes, i.e. computing a hash of many things from the individual hashes of those things, is generally not a good idea, since the resulting hash function is not "good" in the statistical sense. A proper hash of many things should be built from the entire raw data of all the constituents, not from intermediate hashes. But there currently isn't a good standard way of doing this.
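To make the contrast concrete, here is a minimal sketch of hashing the raw data directly instead of combining per-element hashes. It swaps in FNV-1a (64-bit), which is not part of the standard library and is not what this answer goes on to use; the names fnv1a_bytes, raw_vector_hash and raw_hash_demo are made up for illustration. It assumes a std::vector whose element type is trivially copyable and has no padding bytes, so it is far from generic.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical helper: FNV-1a (64-bit) over a raw byte buffer.
inline std::uint64_t fnv1a_bytes(const void* data, std::size_t len)
{
  const unsigned char* p = static_cast<const unsigned char*>(data);
  std::uint64_t h = 14695981039346656037ULL;  // FNV-1a 64-bit offset basis
  for (std::size_t i = 0; i < len; ++i)
  {
    h ^= p[i];                // fold in one byte
    h *= 1099511628211ULL;    // FNV 64-bit prime
  }
  return h;
}

// Hypothetical helper: hashes the whole element buffer of a vector in one
// pass. Assumes T is trivially copyable and contains no padding bytes.
template <typename T>
std::uint64_t raw_vector_hash(const std::vector<T>& v)
{
  static_assert(std::is_trivially_copyable<T>::value,
                "raw-byte hashing needs a trivially copyable element type");
  return fnv1a_bytes(v.data(), v.size() * sizeof(T));
}

// Equal vectors have identical bytes, hence identical hashes.
inline void raw_hash_demo()
{
  const std::vector<int> a{1, 2, 3};
  const std::vector<int> b{1, 2, 3};
  assert(raw_vector_hash(a) == raw_vector_hash(b));
}
```

This only works for contiguous storage of plain data, which is exactly why it does not generalize to arbitrary containers and why the combining approach is used in the rest of this answer.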

Anyway:

First off, we need the hash_combine function. For reasons beyond my understanding it's not been included in the standard library, but it's the centrepiece for everything else:

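#include <cstddef>     // std::size_t
#include <functional>  // std::hash

// Mixes the hash of v into seed; the constant and shifts mimic Boost's hash_combine.
template <class T>
inline void hash_combine(std::size_t & seed, const T & v)
{
  std::hash<T> hasher;
  seed ^= hasher(v) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}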

Using this, we can hash everything that's made up from hashable elements, in particular pairs and tuples (exercise for the reader).
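As a partial take on that exercise, here is one possible sketch for pairs only. PairHasher, PairMap and pair_hash_demo are hypothetical names chosen for this illustration, and hash_combine is repeated so the snippet compiles on its own. It assumes std::hash is available for both member types.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Same combine step as above, repeated to keep the snippet self-contained.
template <class T>
inline void hash_combine(std::size_t& seed, const T& v)
{
  std::hash<T> hasher;
  seed ^= hasher(v) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

// Hypothetical hasher for std::pair: combines the hashes of both members.
struct PairHasher
{
  template <typename A, typename B>
  std::size_t operator()(const std::pair<A, B>& p) const
  {
    std::size_t seed = 0;
    hash_combine(seed, p.first);
    hash_combine(seed, p.second);
    return seed;
  }
};

// Hypothetical alias showing the hasher plugged into an unordered_map.
using PairMap = std::unordered_map<std::pair<int, int>, std::string, PairHasher>;

inline void pair_hash_demo()
{
  PairMap m;
  m[std::make_pair(1, 2)] = "one-two";
  assert(m.at(std::make_pair(1, 2)) == "one-two");  // found via PairHasher
  assert(m.count(std::make_pair(2, 1)) == 0);       // a different key
}
```

A tuple version would follow the same pattern, folding each element into the seed in turn.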

However, we can also use this to hash containers by hashing their elements. This is precisely what Boost's "range hash" does, but it's straightforward to write that yourself using the combine function.
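A range hasher along those lines might look like the sketch below. The name my_range_hash is the one used further down in this answer, but the body is an assumption about what was intended, not Boost's actual implementation; range_hash_demo is a made-up helper. Note that the result depends on iteration order, so it suits ordered or sequence containers; two equal unordered containers may iterate in different orders and would then hash differently.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <set>
#include <vector>

// Same combine step as above, repeated to keep the snippet self-contained.
template <class T>
inline void hash_combine(std::size_t& seed, const T& v)
{
  std::hash<T> hasher;
  seed ^= hasher(v) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

// Assumed implementation: folds every element of [first, last) into one seed.
template <typename It>
std::size_t my_range_hash(It first, It last)
{
  std::size_t seed = 0;
  for (; first != last; ++first)
  {
    hash_combine(seed, *first);
  }
  return seed;
}

// Only the element sequence matters: a std::set and a std::vector that
// yield the same values in the same order produce the same hash.
inline void range_hash_demo()
{
  const std::vector<int> v{1, 2, 3};
  const std::set<int> s{3, 2, 1};  // iterates in sorted order: 1, 2, 3
  assert(my_range_hash(v.begin(), v.end()) == my_range_hash(s.begin(), s.end()));
}
```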

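Once you're done writing your range hasher, just specialize std::hash and you're good to go. (Strictly speaking, the standard only allows adding specializations to namespace std when they depend on a user-defined type, so a specialization for something like std::set<int> is not guaranteed to be conforming.)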

namespace std
{
  template <typename T, class Comp, class Alloc>
  struct hash<std::set<T, Comp, Alloc>>
  {
    inline std::size_t operator()(const std::set<T, Comp, Alloc> & s) const
    {
      return my_range_hash(s.begin(), s.end());
    }
  };

  /* ... ditto for other containers */
}

If you want to mimic the pretty printer, you could even do something more extreme and specialize std::hash for all containers, but I'd probably be more careful with that and make an explicit hash object for containers:

template <typename C> struct ContainerHasher
{
  typedef typename C::value_type value_type;
  std::size_t operator()(const C & c) const
  {
    std::size_t seed = 0;
    for (const value_type & elem : c)
    {
      hash_combine<value_type>(seed, elem);
    }
    return seed;
  }
};

Usage:

std::unordered_map<std::set<int>, std::string, ContainerHasher<std::set<int>>> x;
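Putting the pieces together, a small self-contained sketch of that usage could look like this. It restates hash_combine and the ContainerHasher idea so it compiles on its own; container_key_demo and the stored strings are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <set>
#include <string>
#include <unordered_map>

template <class T>
inline void hash_combine(std::size_t& seed, const T& v)
{
  std::hash<T> hasher;
  seed ^= hasher(v) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

// Explicit hash object for a container type C with hashable elements.
template <typename C> struct ContainerHasher
{
  typedef typename C::value_type value_type;
  std::size_t operator()(const C& c) const
  {
    std::size_t seed = 0;
    for (const value_type& elem : c)
    {
      hash_combine<value_type>(seed, elem);
    }
    return seed;
  }
};

// Hypothetical demo: std::set<int> keys in an unordered_map.
inline void container_key_demo()
{
  std::unordered_map<std::set<int>, std::string, ContainerHasher<std::set<int>>> x;
  x[std::set<int>{1, 2, 3}] = "one-two-three";
  x[std::set<int>{4, 5}] = "four-five";

  assert(x.size() == 2);
  // A set is ordered, so {3, 2, 1} is the same key as {1, 2, 3}.
  assert(x.at(std::set<int>{3, 2, 1}) == "one-two-three");
  assert(x.count(std::set<int>{9}) == 0);
}
```

Key equality still comes from the container's own operator==, so only the hasher has to be supplied as the third template argument.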
Kerrek SB answered Oct 17 '22 01:10
