Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Good hash function for pair of primitive types

Tags:

c++

hash

I'm trying to figure out a good hash function for a std::pair of two primitive types. This is the way I have it implemented right now:

template<typename T, typename U>
std::size_t operator()(const std::pair<T,U> &rhs) const
{
    return stdext::hash_value<T>(rhs.first) ^ stdext::hash_value<U>(rhs.second);
}

It appears to work even if I have two pairs such as (1, 2) and (2, 1) (numbers flipped). They generate the same hash value but the values are still inserted successfully into the hash map. Any thoughts?

like image 223
user1040229 Avatar asked May 09 '13 21:05

user1040229


3 Answers

Generally speaking, hashing containers always have to handle this case (hash collisions). There are a couple methods they can use like chaining and probing, any of which will potentially hurt performance.

Instead, I would suggest using boost::hash_combine to combine the hashes in such a way they swapping first and second doesn't generate the same hash.

like image 194
Mark B Avatar answered Nov 14 '22 21:11

Mark B


Here's an implementation of hash_combine based on the docs from the current version of boost: (http://www.boost.org/doc/libs/1_53_0/doc/html/hash/reference.html#boost.hash_combine)

template<typename T> void hash_combine(size_t & seed, T const& v) {
  seed ^= stdext::hash_value(v) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

You'd use it like this:

template<typename T, typename U>
std::size_t operator()(const std::pair<T,U> &rhs) const   {
  size_t retval = stdext::hash_value<T>(rhs.first);
  hash_combine(retval, rhs.second);
  return retval;
}

I can't actually vouch for the logic behind this function, but it looks more solid than most of the options mentioned here.

like image 32
Michael Anderson Avatar answered Nov 14 '22 21:11

Michael Anderson


Assuming that stdext::hash_value provides a good distribution of hash values for each of first and second, what you've done is fine if you don't expect a disproportionately high incidence of mirror image pairs (e.g. (1,2) and (2,1)). If your data set is such that you do expect a lot of those, you might consider tweaking the hash function to force them to be different. For example, inverting the first hash value:

return ~stdext::hash_value<T>(rhs.first) ^ stdext::hash_value<T>(rhs.second);

I mention this only because you expressed a concern about mirror image pairs. If your input is close to random in that regard, then ^ should be fine. Remember, the goal is to minimize collisions, not avoid them entirely.

like image 34
Mike Woolf Avatar answered Nov 14 '22 23:11

Mike Woolf