Consider the following code:
    unordered_set<T> S = ...;
    for (const auto& x : S)
        if (...)
            S.insert(...);
This is broken, correct? If we insert something into S, the iterators may be invalidated (due to a rehash), which will break the range-for, because under the hood it is using S.begin() ... S.end().
Is there some pattern to deal with this?
One way is:
    unordered_set<T> S = ...;
    vector<T> S2;
    for (const auto& x : S)
        if (...)
            S2.emplace_back(...);
    for (auto& x : S2)
        S.insert(move(x));
This seems clunky. Is there a better way I'm missing?
(Specifically if I was using a hand-rolled hash table and I could block it from rehashing until the end of the loop, it would be safe to use the first version.)
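(Worth noting: if you can bound the total number of insertions up front, the standard containers already expose something close to that blocking behaviour. reserve(n) sizes the bucket array so that no rehash happens until the element count exceeds n. A sketch; the assumption that the loop at most doubles the set is mine, not something the container checks:)

    #include <cassert>
    #include <unordered_set>

    int main()
    {
        std::unordered_set<int> S{0, 1, 2, 3};

        // Assumed bound: each element triggers at most one insertion,
        // so the set can at most double in size during the loop.
        S.reserve(2 * S.size());
        const auto bucketsBefore = S.bucket_count();

        for (const auto& x : S)
            if (x % 2 == 0)
                S.insert(x + 101); // inserted keys are odd: no cascade

        // No rehash happened, so the loop's iterators stayed valid.
        assert(S.bucket_count() == bucketsBefore);
    }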
Update:
From http://en.cppreference.com/w/cpp/container/unordered_map/insert
If rehashing occurs due to the insertion, all iterators are invalidated. Otherwise iterators are not affected. References are not invalidated. Rehashing occurs only if the new number of elements is higher than max_load_factor() * bucket_count().
Could you mess with max_load_factor somehow to prevent rehashing?
Could you mess with max_load_factor somehow to prevent rehashing?
Yes, you can set the max_load_factor() to infinity to ensure no rehashing occurs:
    #include <iostream>
    #include <limits>
    #include <unordered_set>

    int main()
    {
        // initialize
        std::unordered_set<int> S;
        for (int i = 0; i < 8; ++i)
            S.insert(i);

        std::cout << "buckets: " << S.bucket_count() << std::endl;

        // infinite max load factor => never need to rehash
        const auto oldLoadFactor = S.max_load_factor();
        S.max_load_factor(std::numeric_limits<float>::infinity());

        for (const auto& x : S)
        {
            if (x > 2)
                S.insert(x * 2);
        }

        // restore load factor, verify same bucket count
        S.max_load_factor(oldLoadFactor);
        std::cout << "buckets: " << S.bucket_count() << std::endl;

        // now force rehash
        S.rehash(0);
        std::cout << "buckets: " << S.bucket_count() << std::endl;
    }
Note that simply setting a new load factor does no rehashing, so those are cheap operations.
The rehash(0) bit works because it's a request that: 1) I get at least n buckets, and 2) I have enough buckets to satisfy my max_load_factor(). We just use zero to indicate we don't care about the minimum amount; we just want to rehash to satisfy our "new" factor, as if it had never been changed to infinity.
Of course, this isn't exception-safe; if anything throws between the calls to max_load_factor(), our old factor is lost forever. Easily fixed with your favorite scope-guard utility or a utility class.
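For illustration, here is a minimal sketch of such a guard (the name LoadFactorGuard and its exact shape are my own, not a library facility): it saves the current max_load_factor(), installs a new one, and restores the old factor plus a rehash(0) in its destructor, so the cleanup runs even if an insertion throws.

    #include <limits>
    #include <unordered_set>

    // Hypothetical RAII guard: saves the container's max_load_factor,
    // installs a new one, and restores the old factor (plus rehash(0))
    // on scope exit, even when an exception propagates.
    template <typename Container>
    class LoadFactorGuard
    {
        Container& c_;
        float old_;

    public:
        LoadFactorGuard(Container& c, float newFactor)
            : c_(c), old_(c.max_load_factor())
        {
            c_.max_load_factor(newFactor);
        }

        ~LoadFactorGuard()
        {
            c_.max_load_factor(old_);
            c_.rehash(0); // satisfy the restored factor again
        }

        LoadFactorGuard(const LoadFactorGuard&) = delete;
        LoadFactorGuard& operator=(const LoadFactorGuard&) = delete;
    };

    int main()
    {
        std::unordered_set<int> S{0, 1, 2, 3};
        {
            LoadFactorGuard<std::unordered_set<int>> guard(
                S, std::numeric_limits<float>::infinity());
            for (const auto& x : S)
                if (x > 0)
                    S.insert(-x); // new keys are negative, never re-trigger
        } // old factor restored and rehash requested here
    }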
Note that you get no guarantee that you'll iterate over the new elements. You will iterate over the existing elements, but you may or may not visit the newly inserted ones. If that is okay (which, per our chat, it should be), then this will work.
For example, suppose you iterate over an unordered set of integers and, for each even integer x, insert x * 2. If those always get inserted just after your current position (by chance of implementation detail and container state), you will never terminate the loop except through exceptions.
If you do need some guarantees, you need to use an alternate storage solution.
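One such alternative (a sketch of my own, not part of the answer's code): take a snapshot of the current keys first, iterate over the snapshot, and insert into the set directly. Any rehashing is then harmless, and you deterministically visit exactly the original elements, never the new ones.

    #include <iostream>
    #include <unordered_set>
    #include <vector>

    int main()
    {
        std::unordered_set<int> S{1, 2, 3, 4};

        // Snapshot the current keys; the loop no longer uses S's iterators.
        const std::vector<int> snapshot(S.begin(), S.end());

        for (const auto& x : snapshot)
            if (x % 2 == 0)
                S.insert(x * 2); // any rehash here is harmless

        for (int v : S)
            std::cout << v << ' ';
        std::cout << '\n';
    }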