Why does C++11/Boost `unordered_map` not rehash when erasing?

Tags:

I'm wondering why both C++11 and Boost's hashmap does not resize while erasing elements through iteration. Even if that is not technically a memory leak I think it could be a serious issue in applications (it was a hidden issue for me, had hard time to track it back) and it could actually affecting many applications. Is this a "design flaw" with the container?

I benchmarked it and seems to be affecting several compiler releases (including VS, Clang, GCC)

The code to reproduce the issue is:

std::unordered_map<T1,T2> m;

for (int i = 0; i < 5000000; i++)
        m.insert(std::make_pair(i, new data_type));


for (map_type::iterator it = m.begin(); it != m.end();) {
        delete it->second;
        it = m.erase(it);
}

I created a self-contained test file that use a custom allocator to track memory usage.

As long as I understand, the reason behind that is allowing erasing elements through iteration and keep valid iterators to not erased elements.. That seems a little weird requirement since inserting elements could cause a re-hash that invalidate iterators anyway.

But you could destroy the map directly..

Wich is how I fixed that (I wrapped the map inside a smart pointer and when it is empty I simply recreate a new empty map, resulted to be faster than rehashing, don't know why.).

In general any application that use unordered_map as container for caching elements could suffer from that issue (you may want to remove elements from cache but usually no one do a "cache reset")

511

asked Aug 09 '15 19:08

CoffeDeveloper

1 Answers

As far as I can tell, that behavior is not so much a result of the requirement to not invalidate iterators (std::unordered_map::rehash also doesn't invalidate them) than a result of the complexity requirement for std::unordered_map::erase, which should take constant time on average.

I can't tell you, why it was specified like this, but I can tell you, why it is the right default behavior for me:

In many applications, the content of my hash table is virtually constant after initialization anyway - so here I don't care.
Where this is not the case, at least the average number of elements stays more or less the same (within an order of magnitude). So even if a lot of objects are deleted at some point in time, new elements will probably be added soon afterwards. In that case, it wouldn't really reduce the memory footprint and the overhead of rehashing two times (once after deletion and once after adding new elements) would usually outweigh any performance improvement I might get through a more compact table.
Erasing a larger number of elements (e.g. by a filter function) would be severely slowed down by intermediate rehashes, if you could not control the heuristic (as you can when inserting elements by modifying max_load_factor).
So finally, even in those cases where it is actually beneficial to rehash, I can usually make a better decision, about when to do it (e.g. via rehash or copy and swap) than a generic heuristic in std::unordere_map could.

Again, those points are true for my typical use cases, I don't claim that they are universally true for other people's software or that they were the motivation behind the specification of unordered_map

Interestingly, VS2015 and libstc++ seem to implement rehash(0) differently *:

libstc++ will actually shrink (reallocate) the memory where the table is stored
VS2015 will decrease the table size (a.k.a. bucket number) but not reallocate the table. So even after rehashing an empty hash map, the surplus memory for the table will not be returned.

Apparently, the only portable way to minimize the memory footprint is to copy and swap.

Concerning the documentation, I agree that this should probably be mentioned explicitly somewhere, but on the other hand it is e.g. consistent with the documentation of std::vector::erase(). Also I'm not 100% sure, if it is really impossible to write an implementation that rehashes on erase at least sometimes, without violating the requirements.

*) I inferred this from the results of bucket_count and getAllocatedBytes() from your allocator, not by actually looking at the source code.

answered Oct 24 '22 09:10

MikeMB

Related questions
                            
                                How does an unspecified pointer conversion behave in C++14?
                            
                                twisted logic: a global variable in one file refers to an extern variable but is also referred by that extern variable
                            
                                Does crashlytics have a public API?
                            
                                Is this a correct C++11 double-checked locking version with shared_ptr?
                            
                                Is any kind of keepalive necessary on localhost socket?
                            
                                Two strings as XYZ and XZY
                            
                                error in cmake opencv: Parse error in command line argument: -D
                            
                                How to use another class as a class template specialization
                            
                                C++ idioms when declaring template class members and constructors
                            
                                Combining a C# and C++ project works for x86 and x64 but not for ARM
                            
                                Googletest Eclipse C++ : How to have both test and production executable?
                            
                                How can I check that a file is a valid HDF5 file?
                            
                                QMessageBox is not displaying whole title
                            
                                Active member of an union, uniform initialization and constructors
                            
                                Is friend function getting inherited here?
                            
                                Thread Building Blocks: Deadlocks because all threads used up
                            
                                uniqueness of struct names
                            
                                Smoothing algorithm using integer arithmetic
                            
                                How do non-type template parameters get compiled?
                            
                                How does binding parameters work in SQLite3 (with minimal example)?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Why does C++11/Boost `unordered_map` not rehash when erasing?

Tags:

c++

c++11

boost

unordered-map

CoffeDeveloper

People also ask

1 Answers

MikeMB

Recent Activity

Donate For Us