I'm wondering why both the C++11 and the Boost unordered map do not resize (shrink) while erasing elements through iteration. Even if that is not technically a memory leak, I think it could be a serious issue in applications (it was a hidden issue for me, and I had a hard time tracking it down), and it could actually affect many applications. Is this a "design flaw" in the container?
I benchmarked it, and the behavior seems to affect several compiler releases (including VS, Clang, and GCC).
The code to reproduce the issue is:
#include <unordered_map>

struct data_type { char payload[64]; };              // placeholder element type
typedef std::unordered_map<int, data_type*> map_type;

map_type m;
for (int i = 0; i < 5000000; i++)
    m.insert(std::make_pair(i, new data_type));
// Erase (and free) every element; the bucket array is never shrunk.
for (map_type::iterator it = m.begin(); it != m.end();) {
    delete it->second;
    it = m.erase(it);
}
I created a self-contained test file that uses a custom allocator to track memory usage.
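A minimal sketch of such a tracking allocator (illustrative; the real test file may differ) could look like this:

#include <cstddef>
#include <new>
#include <unordered_map>

// Hypothetical byte counter; the real test file may track more detail.
static std::size_t g_allocated_bytes = 0;

template <class T>
struct TrackingAllocator {
    typedef T value_type;
    TrackingAllocator() {}
    template <class U>
    TrackingAllocator(const TrackingAllocator<U>&) {}
    T* allocate(std::size_t n) {
        g_allocated_bytes += n * sizeof(T);
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t n) {
        g_allocated_bytes -= n * sizeof(T);
        ::operator delete(p);
    }
};

template <class T, class U>
bool operator==(const TrackingAllocator<T>&, const TrackingAllocator<U>&) { return true; }
template <class T, class U>
bool operator!=(const TrackingAllocator<T>&, const TrackingAllocator<U>&) { return false; }

// The answer below refers to getAllocatedBytes(); a plausible definition:
std::size_t getAllocatedBytes() { return g_allocated_bytes; }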
As far as I understand, the reason behind this is to allow erasing elements during iteration while keeping iterators to the non-erased elements valid. That seems like an odd requirement, since inserting elements can cause a rehash that invalidates iterators anyway.
But you can destroy the map directly, which is how I fixed it: I wrapped the map inside a smart pointer, and when it is empty I simply recreate a new empty map (this turned out to be faster than rehashing, though I don't know why).
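A sketch of that workaround (the types and the function name are illustrative) looks roughly like this:

#include <memory>
#include <unordered_map>

typedef std::unordered_map<int, int*> map_type;   // illustrative key/value types

std::unique_ptr<map_type> cache(new map_type);

void reset_cache() {
    // Dropping the whole map releases its bucket array at once;
    // recreating an empty map is cheap.
    cache.reset(new map_type);
}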
In general, any application that uses unordered_map as a container for caching elements could suffer from this issue: you may want to remove elements from the cache, but usually no one does a full "cache reset".
As far as I can tell, that behavior is not so much a result of the requirement not to invalidate iterators to non-erased elements (an insert that triggers a rehash invalidates them anyway) as of the complexity requirement for std::unordered_map::erase, which must take constant time on average.
I can't tell you why it was specified like this, but I can tell you why it is the right default behavior for me:
in my typical use cases, the growth of the table is already under my control (via max_load_factor), and when I really need to return memory, std::unordered_map lets me do that explicitly. Again, those points are true for my typical use cases; I don't claim that they are universally true for other people's software, or that they were the motivation behind the specification of unordered_map.
Interestingly, VS2015 and libstdc++ seem to implement rehash(0) differently*.
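One way to observe the difference (this probe is a reconstruction, not the original test file) is to print bucket_count() around the rehash(0) call:

#include <cstdio>
#include <unordered_map>

int main() {
    std::unordered_map<int, int> m;
    for (int i = 0; i < 5000000; i++)
        m.insert(std::make_pair(i, i));
    for (std::unordered_map<int, int>::iterator it = m.begin(); it != m.end();)
        it = m.erase(it);
    std::printf("buckets after erasing all: %zu\n", m.bucket_count());
    m.rehash(0);  // request a shrink; how far this goes varies by implementation
    std::printf("buckets after rehash(0):   %zu\n", m.bucket_count());
    return 0;
}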
Apparently, the only portable way to minimize the memory footprint is to copy and swap.
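A minimal sketch of that copy-and-swap idiom (the helper name is hypothetical, not part of the standard library):

#include <unordered_map>

template <class Map>
void shrink_to_fit(Map& m) {
    // A temporary built from the live elements allocates a bucket array
    // sized for the current element count; swap steals that storage.
    Map(m.begin(), m.end()).swap(m);
}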
Concerning the documentation, I agree that this should probably be mentioned explicitly somewhere, but on the other hand it is consistent with, e.g., the documentation of std::vector::erase(). Also, I'm not 100% sure whether it is really impossible to write an implementation that rehashes on erase at least sometimes without violating the requirements.
*) I inferred this from the results of bucket_count() and getAllocatedBytes() from your allocator, not by actually looking at the source code.