While writing some code targeting C++17, I kind of hit a stumbling block determining the exception-safety of an operation merging two compatible std::unordered_maps. Per the current working draft, §26.2.7, table 91 reads, in part, regarding the conditions of a.merge( a2 )
:
Requires:
a.get_allocator() == a2.get_allocator()
.Attempts to extract each element in
a2
and insert it intoa
using the hash function and key equality predicate ofa
. In containers with unique keys, if there is an element ina
with key equivalent to the key of an element froma2
, then that element is not extracted froma2
.Postconditions: Pointers and references to the transferred elements of
a2
refer to those same elements but as members ofa
. Iterators referring to the transferred elements and all iterators referring toa
will be invalidated, but iterators to elements remaining ina2
will remain valid.Throws: Nothing unless the hash function or key equality predicate throws.
It's worth noting that these conditions are strongly reminiscent of those given for the requirements of ordinary associative containers (std::map), given in §26.2.6, table 90, a.merge( a2 )
:
Requires:
a.get_allocator() == a2.get_allocator()
.Attempts to extract each element in
a2
and insert it intoa
using the comparison object ofa
. In containers with unique keys, if there is an element ina
with key equivalent to the key of an element froma2
, then that element is not extracted froma2
.Postconditions: Pointers and references to the transferred elements of
a2
refer to those same elements but as members ofa
. Iterators referring to the transferred elements will continue to refer to their elements, but they now behave as iterators intoa
, not intoa2
.Throws: Nothing unless the comparison object throws.
I needed to merge two std::unordered_maps with the same number of elements which I could ensure would be unique across both containers, meaning that the map containing the merged result would have double the number of elements it previously had, and the container merged-from would be empty. This should be perfectly safe thanks to C++17, right?
This is a huge win in terms of performance… except, I had this nagging doubt. Hilariously, the postcondition statement says nothing about whether or not the previous max load factor in the merged map would be honored, and while that seems like a safe implicit assumption, it seemed to naïvely conflict with the statement about the unordered_map's exception-safety. If you're using a hash table design wherein the buckets are contiguously allocated buffers, maintaining your load factor would seem to imply rehashing, which would seem to imply reallocation of the bucket buffer.
Already, this seemed like an exercise in extreme navel-gazing and there was good cause to leave well enough alone: a more complicated hash table could conceivably be made as a completely node-based structure, similar to the red-black trees usually underpinning std::map, and in such a case, the spec seemed reasonable as a rehash wouldn't imply allocation.
Perhaps to my benefit, I succumbed to doubt and dug into gcc-7.1's implementation of merge. It's incredibly complicated, but to summarize my findings, I found that the buckets are, indeed, contiguously allocated buffers, and rehashing does imply reallocation. Maybe, I thought, there was some deeper magic I was missing (I stared at the source for nearly an entire day and still felt I had a poor understanding of it) which just disabled rehashing during a merge, meaning that all of the specified conditions would be upheld, but you could get a nasty performance regression after a suitably large merge, since your map would likely be overloaded.
I moved for a practical assessment mirroring my use-case (which I would have presented if it were possible, I'm sorry), instead of just questioning my interpretation of libstdc++:
#include <memory> // for std::shared_ptr<> #include <new> // for std::bad_alloc #include <utility> // for std::move(), std::pair<> #include <type_traits> // for std::true_type #include <unordered_map> // for std::unordered_map<> #include <functional> // for std::hash<>, std::equal_to<> #include <string> // for std::string #include <iostream> // for std::cout #include <cstddef> // for std::size_t template<typename T> class PrimedFailureAlloc { public: using value_type = T; using propagate_on_container_copy_assignment = std::true_type; using propagate_on_container_move_assignment = std::true_type; using propagate_on_container_swap = std::true_type; PrimedFailureAlloc() = default; template<typename U> PrimedFailureAlloc( const PrimedFailureAlloc<U>& source ) noexcept : m_triggered{ source.m_triggered } { } template<typename U> PrimedFailureAlloc( PrimedFailureAlloc<U>&& source ) noexcept : m_triggered{ std::move( source.m_triggered ) } { } T* allocate( std::size_t n ) { if ( *m_triggered ) throw std::bad_alloc{}; return static_cast<T*>( ::operator new( sizeof( T ) * n ) ); } void deallocate( T* ptr, std::size_t n ) noexcept { ::operator delete( ptr ); } bool operator==( const PrimedFailureAlloc& rhs ) noexcept { return m_triggered == rhs.m_triggered; } void trigger() noexcept { *m_triggered = true; } private: template<typename U> friend class PrimedFailureAlloc; std::shared_ptr<bool> m_triggered{ new bool{ false } }; }; template<typename T> bool operator!=( const PrimedFailureAlloc<T>& lhs, const PrimedFailureAlloc<T>& rhs ) noexcept { return !(lhs == rhs); } template< typename Key , typename T , typename Hash = std::hash<Key> , typename KeyEqual = std::equal_to<Key> > using FailingMap = std::unordered_map< Key, T, Hash, KeyEqual, PrimedFailureAlloc<std::pair<const Key, T>> >; template<typename Key, typename T> void printMap( const FailingMap<Key, T>& map ) { std::cout << "{\n"; for ( const auto& [str, index] : map ) std::cout << " { " << str << ", " << index << " }\n"; std::cout << "}\n"; } int main() { PrimedFailureAlloc<std::pair<const std::string, int>> a; FailingMap<std::string, int> m1{ a }; FailingMap<std::string, int> m2{ a }; m1.insert( { { "Purple", 0 }, { "Green", 3 }, { "Indigo", 16 } } ); m2.insert( { { "Blue", 12 }, { "Red", 2 }, { "Violet", 5 } } ); // m1.reserve( m1.size() + m2.size() ); a.trigger(); m1.merge( m2 ); std::cout << "map :=\n"; printMap( m1 ); return 0; }
Sure enough, after compiling this code under GCC-7.1, I get:
terminate called after throwing an instance of 'std::bad_alloc' what(): std::bad_alloc [1] 10944 abort ./a.out
Whereas, uncommenting line 95 (m1.reserve( m1.size() + m2.size() );
), results in the expected output:
map := { { Red, 2 } { Violet, 5 } { Purple, 0 } { Green, 3 } { Blue, 12 } { Indigo, 16 } }
Understanding that C++17 is still a draft standard which hasn't yet been finalized, and that gcc's implementation is experimental, I suppose my question would be:
std::unordered_map::merge()
that I've missed? Was I supposed to be implicitly responsible for ensuring the allocation of buckets? Will using std::unordered_map::reserve()
actually be portable when clang, MSVC, and Intel finally support C++17? I mean, my program aborting when an exception-free merge was impossible does, after some notion, adhere to the listed postconditions…It's worth noting that I did check gcc's bug tracker prior to this write-up and seemed to find no open bugs matching my description, and furthermore, I checked the C++ standard defect report and similarly seemed to have come up empty (admittedly, doing a text-search of that site is aggravating and I may have been less than thorough). The latter is unsurprising, since standard defects and their workarounds or consequences are usually noted in gcc's source code and I found no such notations during my examination. I tried compiling my example code in my most recent checkout of clang (over a week old), but the compiler segfaulted so I took my examination there no further, and have not consulted libc++.
Insertion of spread keys in std::map tends to outperform std::unordered_map when map size is under 10000 elements. Insertion of dense keys in std::map doesn't present performance difference with std::unordered_map under 1000 elements. In all other situations std::unordered_map tends to perform faster.
Because unordered_map containers do not allow for duplicate keys, this means that the function actually returns 1 if an element with that key exists in the container, and zero otherwise.
No. It's called "unordered" for a reason. If you need to maintain an order of insertion, you've chosen an unsuitable data structure.
Unordered map is an associative container that contains key-value pairs with unique keys. Search, insertion, and removal of elements have average constant-time complexity. Internally, the elements are not sorted in any particular order, but organized into buckets.
This is just a defect in the standard, i.e., your possibility 2.
LWG has just moved LWG issue 2977 to "Ready" status, which will strike the erroneous Throws clause.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With