Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Safety of std::unordered_map::merge()

While writing some code targeting C++17, I kind of hit a stumbling block determining the exception-safety of an operation merging two compatible std::unordered_maps. Per the current working draft, §26.2.7, table 91 reads, in part, regarding the conditions of a.merge( a2 ):

Requires: a.get_allocator() == a2.get_allocator().

Attempts to extract each element in a2 and insert it into a using the hash function and key equality predicate of a. In containers with unique keys, if there is an element in a with key equivalent to the key of an element from a2, then that element is not extracted from a2.

Postconditions: Pointers and references to the transferred elements of a2 refer to those same elements but as members of a. Iterators referring to the transferred elements and all iterators referring to a will be invalidated, but iterators to elements remaining in a2 will remain valid.

Throws: Nothing unless the hash function or key equality predicate throws.

It's worth noting that these conditions are strongly reminiscent of those given for the requirements of ordinary associative containers (std::map), given in §26.2.6, table 90, a.merge( a2 ):

Requires: a.get_allocator() == a2.get_allocator().

Attempts to extract each element in a2 and insert it into a using the comparison object of a. In containers with unique keys, if there is an element in a with key equivalent to the key of an element from a2, then that element is not extracted from a2.

Postconditions: Pointers and references to the transferred elements of a2 refer to those same elements but as members of a. Iterators referring to the transferred elements will continue to refer to their elements, but they now behave as iterators into a, not into a2.

Throws: Nothing unless the comparison object throws.

I needed to merge two std::unordered_maps with the same number of elements which I could ensure would be unique across both containers, meaning that the map containing the merged result would have double the number of elements it previously had, and the container merged-from would be empty. This should be perfectly safe thanks to C++17, right?

This is a huge win in terms of performance… except, I had this nagging doubt. Hilariously, the postcondition statement says nothing about whether or not the previous max load factor in the merged map would be honored, and while that seems like a safe implicit assumption, it seemed to naïvely conflict with the statement about the unordered_map's exception-safety. If you're using a hash table design wherein the buckets are contiguously allocated buffers, maintaining your load factor would seem to imply rehashing, which would seem to imply reallocation of the bucket buffer.

Already, this seemed like an exercise in extreme navel-gazing and there was good cause to leave well enough alone: a more complicated hash table could conceivably be made as a completely node-based structure, similar to the red-black trees usually underpinning std::map, and in such a case, the spec seemed reasonable as a rehash wouldn't imply allocation.

Perhaps to my benefit, I succumbed to doubt and dug into gcc-7.1's implementation of merge. It's incredibly complicated, but to summarize my findings, I found that the buckets are, indeed, contiguously allocated buffers, and rehashing does imply reallocation. Maybe, I thought, there was some deeper magic I was missing (I stared at the source for nearly an entire day and still felt I had a poor understanding of it) which just disabled rehashing during a merge, meaning that all of the specified conditions would be upheld, but you could get a nasty performance regression after a suitably large merge, since your map would likely be overloaded.

I moved for a practical assessment mirroring my use-case (which I would have presented if it were possible, I'm sorry), instead of just questioning my interpretation of libstdc++:

#include <memory>        // for std::shared_ptr<> #include <new>           // for std::bad_alloc #include <utility>       // for std::move(), std::pair<> #include <type_traits>   // for std::true_type #include <unordered_map> // for std::unordered_map<> #include <functional>    // for std::hash<>, std::equal_to<> #include <string>        // for std::string #include <iostream>      // for std::cout #include <cstddef>       // for std::size_t  template<typename T> class PrimedFailureAlloc { public:   using value_type = T;   using propagate_on_container_copy_assignment = std::true_type;   using propagate_on_container_move_assignment = std::true_type;   using propagate_on_container_swap = std::true_type;    PrimedFailureAlloc() = default;    template<typename U>   PrimedFailureAlloc( const PrimedFailureAlloc<U>& source ) noexcept     : m_triggered{ source.m_triggered }   { }    template<typename U>   PrimedFailureAlloc( PrimedFailureAlloc<U>&& source ) noexcept     : m_triggered{ std::move( source.m_triggered ) }   { }    T* allocate( std::size_t n )   {     if ( *m_triggered ) throw std::bad_alloc{};     return static_cast<T*>( ::operator new( sizeof( T ) * n ) );   }    void deallocate( T* ptr, std::size_t n ) noexcept   {     ::operator delete( ptr );   }    bool operator==( const PrimedFailureAlloc& rhs ) noexcept   {     return m_triggered == rhs.m_triggered;   }    void trigger() noexcept { *m_triggered = true; }  private:   template<typename U>   friend class PrimedFailureAlloc;    std::shared_ptr<bool> m_triggered{ new bool{ false } }; };  template<typename T> bool operator!=( const PrimedFailureAlloc<T>& lhs,                  const PrimedFailureAlloc<T>& rhs ) noexcept {   return !(lhs == rhs); }  template< typename Key         , typename T         , typename Hash = std::hash<Key>         , typename KeyEqual = std::equal_to<Key>         > using FailingMap = std::unordered_map<   Key,   T,   Hash,   KeyEqual,   PrimedFailureAlloc<std::pair<const Key, T>> >;  template<typename Key, typename T> void printMap( const FailingMap<Key, T>& map ) {   std::cout << "{\n";   for ( const auto& [str, index] : map )     std::cout << "  { " << str << ", " << index << " }\n";   std::cout << "}\n"; }  int main() {   PrimedFailureAlloc<std::pair<const std::string, int>> a;   FailingMap<std::string, int> m1{ a };   FailingMap<std::string, int> m2{ a };    m1.insert( { { "Purple", 0 }, { "Green", 3 }, { "Indigo", 16 } } );   m2.insert( { { "Blue", 12 }, { "Red", 2 }, { "Violet", 5 } } );    // m1.reserve( m1.size() + m2.size() );   a.trigger();   m1.merge( m2 );    std::cout << "map :=\n";   printMap( m1 );    return 0; } 

Sure enough, after compiling this code under GCC-7.1, I get:

terminate called after throwing an instance of 'std::bad_alloc'   what():  std::bad_alloc [1]    10944 abort      ./a.out 

Whereas, uncommenting line 95 (m1.reserve( m1.size() + m2.size() );), results in the expected output:

map := {   { Red, 2 }   { Violet, 5 }   { Purple, 0 }   { Green, 3 }   { Blue, 12 }   { Indigo, 16 } } 

Understanding that C++17 is still a draft standard which hasn't yet been finalized, and that gcc's implementation is experimental, I suppose my question would be:

  1. Have I gravely misconstrued the safety of the merge operation as set out in the standard? Are there best-practices to using std::unordered_map::merge() that I've missed? Was I supposed to be implicitly responsible for ensuring the allocation of buckets? Will using std::unordered_map::reserve() actually be portable when clang, MSVC, and Intel finally support C++17? I mean, my program aborting when an exception-free merge was impossible does, after some notion, adhere to the listed postconditions…
  2. Is this actually a defect in the standard? The similarity of the wording between unordered associative containers and ordinary associative containers might have allowed unintended guarantees to slip in if the text were, say, copy-pasted.
  3. Is this just a gcc bug?

It's worth noting that I did check gcc's bug tracker prior to this write-up and seemed to find no open bugs matching my description, and furthermore, I checked the C++ standard defect report and similarly seemed to have come up empty (admittedly, doing a text-search of that site is aggravating and I may have been less than thorough). The latter is unsurprising, since standard defects and their workarounds or consequences are usually noted in gcc's source code and I found no such notations during my examination. I tried compiling my example code in my most recent checkout of clang (over a week old), but the compiler segfaulted so I took my examination there no further, and have not consulted libc++.

like image 451
Capital C Avatar asked Jun 13 '17 19:06

Capital C


People also ask

Which is more efficient map or unordered_map?

Insertion of spread keys in std::map tends to outperform std::unordered_map when map size is under 10000 elements. Insertion of dense keys in std::map doesn't present performance difference with std::unordered_map under 1000 elements. In all other situations std::unordered_map tends to perform faster.

Does unordered map allow duplicate keys?

Because unordered_map containers do not allow for duplicate keys, this means that the function actually returns 1 if an element with that key exists in the container, and zero otherwise.

Does unordered_map maintain insertion order?

No. It's called "unordered" for a reason. If you need to maintain an order of insertion, you've chosen an unsuitable data structure.

Is unordered map sorted?

Unordered map is an associative container that contains key-value pairs with unique keys. Search, insertion, and removal of elements have average constant-time complexity. Internally, the elements are not sorted in any particular order, but organized into buckets.


1 Answers

This is just a defect in the standard, i.e., your possibility 2.

LWG has just moved LWG issue 2977 to "Ready" status, which will strike the erroneous Throws clause.

like image 118
T.C. Avatar answered Oct 09 '22 00:10

T.C.