What is the default hash function used in C++ std::unordered_map?

Tags: c++, c++11, hash, stl, unordered-map

I am using

unordered_map<string, int>

and

unordered_map<int, int>

What hash function is used in each case, and what is the chance of collision in each case? I will be inserting unique strings and unique ints as keys, respectively.

I am interested in knowing the algorithm of the hash function for string and int keys, and their collision statistics.

650

asked Oct 16 '13 19:10

Medicine

2 Answers

The function object std::hash<> is used.

Standard specializations exist for all built-in types, and some other standard library types such as std::string and std::thread. See the link for the full list.
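
As a minimal sketch of what this means in practice, std::hash can be instantiated and called directly, and it is also the default hasher type of both maps from the question. The helper names hash_string and hash_int are illustrative only, and the values they return differ between standard library implementations.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <type_traits>
#include <unordered_map>

// Illustrative helpers (not part of the standard library): they call the same
// function objects that std::unordered_map uses by default.
std::size_t hash_string(const std::string& s) {
    return std::hash<std::string>{}(s);
}

std::size_t hash_int(int v) {
    return std::hash<int>{}(v);
}

// The default hasher of each map is the matching std::hash specialization.
static_assert(std::is_same<std::unordered_map<std::string, int>::hasher,
                           std::hash<std::string>>::value,
              "string-keyed map hashes with std::hash<std::string>");
static_assert(std::is_same<std::unordered_map<int, int>::hasher,
                           std::hash<int>>::value,
              "int-keyed map hashes with std::hash<int>");
```

Calling m.hash_function() on an existing map returns the hasher object it actually uses, which is handy for checking a key's hash without guessing the type.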

For other types to be used as keys in a std::unordered_map, you will have to specialize std::hash<> or create your own function object.
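
Both options can be sketched as follows. Employee, EmployeeHash and the two map aliases are hypothetical names invented for this example, and the way the two member hashes are combined is a simple illustrative mix, not something the standard prescribes.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical user-defined key type. Keys also need operator==.
struct Employee {
    std::string name;
    int id;
    bool operator==(const Employee& other) const {
        return name == other.name && id == other.id;
    }
};

// Option 1: a standalone function object, passed as the map's third argument.
struct EmployeeHash {
    std::size_t operator()(const Employee& e) const noexcept {
        const std::size_t h1 = std::hash<std::string>{}(e.name);
        const std::size_t h2 = std::hash<int>{}(e.id);
        // Simple illustrative mix; not tuned for hash quality.
        return h1 ^ (h2 + 0x9e3779b9 + (h1 << 6) + (h1 >> 2));
    }
};

// Option 2: specialize std::hash so the map's default hasher works.
namespace std {
template <>
struct hash<Employee> {
    size_t operator()(const Employee& e) const noexcept {
        return EmployeeHash{}(e);
    }
};
}  // namespace std

using MapWithFunctor = std::unordered_map<Employee, int, EmployeeHash>;
using MapWithSpecialization = std::unordered_map<Employee, int>;
```

The function object keeps the hashing policy local to one map type, while the specialization makes every unordered container of Employee work without extra template arguments.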

The chance of collision is completely implementation-dependent. That said, on typical platforms an int has no more possible values than size_t has hash values, so distinct integers can each get a distinct hash, while strings can be arbitrarily long and therefore outnumber the possible hash values. So I'd say there is a better chance of collision with strings.
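
Since the numbers depend on the implementation, one option is to measure them for your own keys. The sketch below uses the standard bucket interface (bucket_count and bucket_size); count_bucket_collisions is a made-up helper name. Note that it counts keys sharing a bucket, which can happen even when their full hash values differ, because the table reduces each hash to a bucket index.

```cpp
#include <cstddef>
#include <unordered_map>

// Counts elements that share a bucket with at least one other element that was
// counted before them, i.e. (bucket size - 1) summed over all buckets holding
// more than one element.
template <typename Map>
std::size_t count_bucket_collisions(const Map& map) {
    std::size_t collisions = 0;
    for (std::size_t b = 0; b < map.bucket_count(); ++b) {
        const std::size_t n = map.bucket_size(b);
        if (n > 1) {
            collisions += n - 1;
        }
    }
    return collisions;
}
```

Running this on the unordered_map<string, int> and unordered_map<int, int> from the question, filled with representative keys, gives collision figures for the standard library you actually build against.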

As for the implementation in GCC, the specializations for built-in integer and pointer types simply convert the value to size_t. Here's how they are defined in bits/functional_hash.h:

  /// Partial specializations for pointer types.
  template<typename _Tp>
    struct hash<_Tp*> : public __hash_base<size_t, _Tp*>
    {
      size_t
      operator()(_Tp* __p) const noexcept
      { return reinterpret_cast<size_t>(__p); }
    };

  // Explicit specializations for integer types.
#define _Cxx_hashtable_define_trivial_hash(_Tp)     \
  template<>                        \
    struct hash<_Tp> : public __hash_base<size_t, _Tp>  \
    {                                                   \
      size_t                                            \
      operator()(_Tp __val) const noexcept              \
      { return static_cast<size_t>(__val); }            \
    };

  /// Explicit specialization for bool.
  _Cxx_hashtable_define_trivial_hash(bool)

  /// Explicit specialization for char.
  _Cxx_hashtable_define_trivial_hash(char)

  /// ...
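
To make the effect of the macro concrete, here is a small sketch. trivial_hash is an illustrative stand-in for what the macro expands to, and std_hash_is_trivial_for is a made-up helper that checks whether the standard library in use behaves the same way. That is expected with libstdc++, but it is an assumption for any other implementation.

```cpp
#include <cstddef>
#include <functional>

// Illustrative stand-in for the macro expansion: the "hash" of an integral
// value is just the value converted to std::size_t.
template <typename T>
std::size_t trivial_hash(T value) noexcept {
    return static_cast<std::size_t>(value);
}

// Reports whether std::hash<int> agrees with the trivial conversion for this
// value. Expected to be true with libstdc++; other libraries may differ.
bool std_hash_is_trivial_for(int value) {
    return std::hash<int>{}(value) == trivial_hash(value);
}
```

One consequence is that negative values map to very large hashes, because converting a negative int to size_t wraps around modulo 2^N, where N is the width of size_t.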

The specialization for std::string is defined as:

#ifndef _GLIBCXX_COMPATIBILITY_CXX0X
  /// std::hash specialization for string.
  template<>
    struct hash<string>
    : public __hash_base<size_t, string>
    {
      size_t
      operator()(const string& __s) const noexcept
      { return std::_Hash_impl::hash(__s.data(), __s.length()); }
    };

Some further search leads us to:

struct _Hash_impl
{
  static size_t
  hash(const void* __ptr, size_t __clength,
       size_t __seed = static_cast<size_t>(0xc70f6907UL))
  { return _Hash_bytes(__ptr, __clength, __seed); }
  ...
};
...
// Hash function implementation for the nontrivial specialization.
// All of them are based on a primitive that hashes a pointer to a
// byte array. The actual hash algorithm is not guaranteed to stay
// the same from release to release -- it may be updated or tuned to
// improve hash quality or speed.
size_t
_Hash_bytes(const void* __ptr, size_t __len, size_t __seed);

_Hash_bytes is an external function from libstdc++. A bit more searching led me to this file, which states:

// This file defines Hash_bytes, a primitive used for defining hash
// functions. Based on public domain MurmurHashUnaligned2, by Austin
// Appleby.  http://murmurhash.googlepages.com/

So the default hashing algorithm GCC uses for strings is based on MurmurHashUnaligned2.

127

answered Oct 19 '22 01:10

Avidan Borisov

GCC C++11 uses "MurmurHashUnaligned2", by Austin Appleby

Hashing algorithms are implementation-dependent, so I'll present the one for GCC C++11. @Avidan Borisov astutely discovered that the GCC hashing algorithm used for strings is an implementation of "MurmurHashUnaligned2," by Austin Appleby (see http://murmurhash.googlepages.com/ and https://github.com/aappleby/smhasher). I did some searching and found a mirrored copy of GCC on GitHub.

The GCC C++11 hashing functions used for unordered_map (a hash table template) and unordered_set (a hash set template) appear to be as follows.

In the file "gcc/libstdc++-v3/libsupc++/hash_bytes.cc", here (https://github.com/gcc-mirror/gcc/blob/master/libstdc++-v3/libsupc++/hash_bytes.cc), I found the implementations. Here's the one for the "32-bit size_t" return value, for example (pulled 11 Aug 2017):

Code:

// Implementation of Murmur hash for 32-bit size_t.
size_t _Hash_bytes(const void* ptr, size_t len, size_t seed)
{
  const size_t m = 0x5bd1e995;
  size_t hash = seed ^ len;
  const char* buf = static_cast<const char*>(ptr);

  // Mix 4 bytes at a time into the hash.
  while (len >= 4)
  {
    size_t k = unaligned_load(buf);
    k *= m;
    k ^= k >> 24;
    k *= m;
    hash *= m;
    hash ^= k;
    buf += 4;
    len -= 4;
  }

  // Handle the last few bytes of the input array.
  switch (len)
  {
    case 3:
      hash ^= static_cast<unsigned char>(buf[2]) << 16;
      [[gnu::fallthrough]];
    case 2:
      hash ^= static_cast<unsigned char>(buf[1]) << 8;
      [[gnu::fallthrough]];
    case 1:
      hash ^= static_cast<unsigned char>(buf[0]);
      hash *= m;
  };

  // Do a few final mixes of the hash.
  hash ^= hash >> 13;
  hash *= m;
  hash ^= hash >> 15;
  return hash;
}
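
The excerpt calls unaligned_load, which is defined elsewhere in that file and not shown here, so it does not compile on its own. Below is a self-contained adaptation for experimenting. It assumes a memcpy-based 4-byte load in native byte order and uses std::uint32_t so the arithmetic wraps at 32 bits even where size_t is wider. The names load_u32 and murmur2_32 are made up for this sketch, and the trailing switch is rewritten as equivalent if statements. It is not claimed to match std::hash<std::string> on any particular platform.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Reads 4 bytes from a possibly unaligned address (native byte order).
// Assumed stand-in for the unaligned_load helper the excerpt relies on.
inline std::uint32_t load_u32(const char* p) {
    std::uint32_t result;
    std::memcpy(&result, p, sizeof(result));
    return result;
}

// Same mixing steps as the 32-bit excerpt, using fixed-width arithmetic.
std::uint32_t murmur2_32(const void* ptr, std::size_t len, std::uint32_t seed) {
    const std::uint32_t m = 0x5bd1e995;
    std::uint32_t hash = seed ^ static_cast<std::uint32_t>(len);
    const char* buf = static_cast<const char*>(ptr);

    // Mix 4 bytes at a time into the hash.
    while (len >= 4) {
        std::uint32_t k = load_u32(buf);
        k *= m;
        k ^= k >> 24;
        k *= m;
        hash *= m;
        hash ^= k;
        buf += 4;
        len -= 4;
    }

    // Handle the last 0 to 3 bytes (len is now below 4).
    if (len >= 3) {
        hash ^= static_cast<std::uint32_t>(static_cast<unsigned char>(buf[2])) << 16;
    }
    if (len >= 2) {
        hash ^= static_cast<std::uint32_t>(static_cast<unsigned char>(buf[1])) << 8;
    }
    if (len >= 1) {
        hash ^= static_cast<std::uint32_t>(static_cast<unsigned char>(buf[0]));
        hash *= m;
    }

    // Do a few final mixes of the hash.
    hash ^= hash >> 13;
    hash *= m;
    hash ^= hash >> 15;
    return hash;
}
```

A call such as murmur2_32(s.data(), s.size(), 0xc70f6907u) mirrors how _Hash_impl::hash passes its default seed in the first answer's excerpt.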

The latest version of Austin Appleby's hashing functions is "MurmurHash3", which is released into the public domain!

Austin states in his readme:

The SMHasher suite also includes MurmurHash3, which is the latest version in the series of MurmurHash functions - the new version is faster, more robust, and its variants can produce 32- and 128-bit hash values efficiently on both x86 and x64 platforms.

For MurmurHash3's source code, see here:

MurmurHash3.h
MurmurHash3.cpp

And the great thing is, it's public domain software. That's right! The tops of the files state:

// MurmurHash3 was written by Austin Appleby, and is placed in the public
// domain. The author hereby disclaims copyright to this source code.

So, if you'd like to use MurmurHash3 in your open source software, personal projects, or proprietary software, including for implementing your own hash tables in C, go for it!
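
As a rough illustration of plugging such a function into std::unordered_map, here is a condensed sketch of the 32-bit x86 variant wrapped in a hasher. Treat it as an approximation and check it against the upstream MurmurHash3.cpp before relying on exact values. The names rotl32, murmur3_32, Murmur3StringHash and MurmurMap, the fixed seed of 0, and the memcpy-based block load are all choices made for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <unordered_map>

inline std::uint32_t rotl32(std::uint32_t x, int r) {
    return (x << r) | (x >> (32 - r));
}

// Condensed sketch of the 32-bit x86 variant of MurmurHash3.
std::uint32_t murmur3_32(const void* key, std::size_t len, std::uint32_t seed) {
    const unsigned char* data = static_cast<const unsigned char*>(key);
    const std::uint32_t c1 = 0xcc9e2d51;
    const std::uint32_t c2 = 0x1b873593;
    std::uint32_t h = seed;

    // Body: mix one 4-byte block at a time (native byte order).
    const std::size_t nblocks = len / 4;
    for (std::size_t i = 0; i < nblocks; ++i) {
        std::uint32_t k;
        std::memcpy(&k, data + i * 4, sizeof(k));
        k *= c1;
        k = rotl32(k, 15);
        k *= c2;
        h ^= k;
        h = rotl32(h, 13);
        h = h * 5 + 0xe6546b64;
    }

    // Tail: fold in the remaining 0 to 3 bytes.
    const unsigned char* tail = data + nblocks * 4;
    const std::size_t rem = len & 3;
    std::uint32_t k = 0;
    if (rem >= 3) k ^= static_cast<std::uint32_t>(tail[2]) << 16;
    if (rem >= 2) k ^= static_cast<std::uint32_t>(tail[1]) << 8;
    if (rem >= 1) {
        k ^= tail[0];
        k *= c1;
        k = rotl32(k, 15);
        k *= c2;
        h ^= k;
    }

    // Finalization: force all bits of the hash to avalanche.
    h ^= static_cast<std::uint32_t>(len);
    h ^= h >> 16;
    h *= 0x85ebca6b;
    h ^= h >> 13;
    h *= 0xc2b2ae35;
    h ^= h >> 16;
    return h;
}

// Hypothetical hasher that lets std::unordered_map use the function above.
struct Murmur3StringHash {
    std::size_t operator()(const std::string& s) const noexcept {
        return murmur3_32(s.data(), s.size(), 0);
    }
};

using MurmurMap = std::unordered_map<std::string, int, Murmur3StringHash>;
```

A MurmurMap behaves like any other unordered_map<string, int>; only the hash values, and therefore the bucket placement, change.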

If you'd like build instructions to build and test his MurmurHash3 code, I've written some here: https://github.com/ElectricRCAircraftGuy/smhasher/blob/add_build_instructions/build/README.md. Hopefully this PR I've opened gets accepted and then they will end up in his main repo. But, until then, refer to the build instructions in my fork.

For additional hashing functions, including `djb2` and the 2 versions of the K&R hashing functions (one apparently terrible, one pretty good), see my other answer here: hash function for string.

Gabriel Staples
