 

Simultaneously iterating over and modifying an unordered_set?

Tags:

c++

c++11

Consider the following code:

unordered_set<T> S = ...;

for (const auto& x : S)
    if (...)
        S.insert(...);

This is broken, correct? If we insert something into S, the iterators may be invalidated (due to a rehash), which will break the range-for because under the hood it is using S.begin() ... S.end().

Is there some pattern to deal with this?

One way is:

unordered_set<T> S = ...;
vector<T> S2;

for (const auto& x : S)
    if (...)
        S2.emplace_back(...);

for (auto& x : S2)
    S.insert(move(x));

This seems clunky. Is there a better way I'm missing?

(Specifically if I was using a hand-rolled hash table and I could block it from rehashing until the end of the loop, it would be safe to use the first version.)
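(For instance, if I knew an upper bound on how many elements S could end up holding, I imagine a reserve() call up front would keep the loop from ever rehashing. A rough sketch of that idea, with the bound invented purely for illustration:)

#include <cstddef>
#include <unordered_set>

int main()
{
    std::unordered_set<int> S = {0, 1, 2, 3, 4, 5, 6, 7};

    // Hypothetical upper bound on the final size. reserve() makes
    // bucket_count() large enough for that many elements, so insertions
    // that stay under the bound should never trigger a rehash.
    const std::size_t upper_bound = 2 * S.size();
    S.reserve(upper_bound);

    for (const auto& x : S)
        if (x > 2)
            S.insert(x * 2);   // safe as long as size() stays <= upper_bound
}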

Update:

From http://en.cppreference.com/w/cpp/container/unordered_map/insert

If rehashing occurs due to the insertion, all iterators are invalidated. Otherwise iterators are not affected. References are not invalidated. Rehashing occurs only if the new number of elements is higher than max_load_factor() * bucket_count().
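If I read that right, whether a given insertion can rehash is predictable in advance; something like this check (my own sketch, name invented) would tell me whether the next insert is safe for the iterators:

#include <unordered_set>

// Sketch: true if inserting one more element could trigger a rehash,
// per the rule quoted above (new size > max_load_factor() * bucket_count()).
template <typename Set>
bool next_insert_may_rehash(const Set& s)
{
    return s.size() + 1 > s.max_load_factor() * s.bucket_count();
}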

Could you mess with max_load_factor somehow to prevent rehashing?

Asked Dec 20 '12 by Andrew Tomazos


People also ask

What is the difference between a set and an unordered_set?

Set is an ordered sequence of unique keys, whereas unordered_set is a set in which keys can be stored in any order, hence "unordered". Set is implemented as a balanced tree structure, which is why it is possible to maintain order between the elements (by a specific tree traversal).

Can unordered_set have duplicate values?

Keys are immutable; therefore, the elements in an unordered_set cannot be modified once they are in the container, although they can be inserted and removed. Unordered sets do not allow duplicates and are initialized using comma-delimited values enclosed in curly braces.

What does unordered_set mean?

Unordered sets are containers that store unique elements in no particular order and allow for fast retrieval of individual elements based on their value. In an unordered_set, the value of an element is at the same time its key, which identifies it uniquely.

What is the time complexity of unordered_set operations?

Average case: constant; worst case: linear in the number of elements.
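A quick illustration of the difference (my own example, not from the original page): iterating a std::set visits keys in sorted order, while a std::unordered_set visits them in an unspecified order; both silently drop duplicates.

#include <iostream>
#include <set>
#include <unordered_set>

int main()
{
    std::set<int> ordered = {3, 1, 2, 3};             // duplicate 3 ignored, kept sorted
    std::unordered_set<int> unordered = {3, 1, 2, 3}; // duplicate 3 ignored, order unspecified

    for (int x : ordered)
        std::cout << x << ' ';   // always prints: 1 2 3
    std::cout << '\n';

    for (int x : unordered)
        std::cout << x << ' ';   // prints 1, 2, 3 in some unspecified order
    std::cout << '\n';
}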


1 Answer

Could you mess with max_load_factor somehow to prevent rehashing?

Yes, you can set the max_load_factor() to infinity to ensure no rehashing occurs:

#include <iostream>
#include <limits>
#include <unordered_set>

int main()
{
    // initialize
    std::unordered_set<int> S;

    for (int i = 0; i < 8; ++i)
        S.insert(i);

    std::cout << "buckets: " << S.bucket_count() << std::endl;

    // infinite max load factor => never need to rehash
    const auto oldLoadFactor = S.max_load_factor();
    S.max_load_factor(std::numeric_limits<float>::infinity());

    for (const auto& x : S)
    {
        if (x > 2)
            S.insert(x * 2);
    }

    // restore load factor, verify same bucket count
    S.max_load_factor(oldLoadFactor);
    std::cout << "buckets: " << S.bucket_count() << std::endl;

    // now force rehash
    S.rehash(0);
    std::cout << "buckets: " << S.bucket_count() << std::endl;
}

Note that simply setting a new load factor does no rehashing, so those are cheap operations.

The rehash(0) bit works because it's a request that: 1) I get at least n buckets, and 2) I have enough buckets to satisfy my max_load_factor(). We just use zero to indicate we don't care for a minimum amount, we just want to rehash to satisfy our "new" factor, as if it was never changed to infinity.

Of course, this isn't exception-safe; if anything throws between the calls to max_load_factor(), our old factor is lost forever. Easily fixed with your favorite scope-guard utility or a utility class.
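For instance, a minimal hand-rolled guard along these lines (a sketch of the idea, not part of the original answer) restores the old factor even if an insertion throws; the caller then requests the rehash once the guard has gone out of scope:

#include <limits>
#include <unordered_set>

// Sketch of a scope guard: remembers the current max_load_factor, bumps it to
// infinity, and restores the saved value when the guard is destroyed.
template <typename Set>
struct load_factor_guard
{
    Set& s;
    float old_factor;

    explicit load_factor_guard(Set& set)
        : s(set), old_factor(set.max_load_factor())
    {
        s.max_load_factor(std::numeric_limits<float>::infinity());
    }

    ~load_factor_guard()
    {
        s.max_load_factor(old_factor);  // restored even if the loop throws
    }
};

int main()
{
    std::unordered_set<int> S = {0, 1, 2, 3, 4, 5, 6, 7};

    {
        load_factor_guard<std::unordered_set<int>> guard(S);
        for (const auto& x : S)
            if (x > 2)
                S.insert(x * 2);
    }   // old max_load_factor restored here

    S.rehash(0);  // let the table adjust to honor the restored factor
}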

Note that you get no guarantee about whether you'll iterate over the new elements. You will iterate over the existing elements, but you may or may not iterate over the new ones. If that is okay (which per our chat it should be), then this will work.

For example, suppose you iterate over an unordered set of integers and, for each even integer x, insert x * 2. If those always get inserted just after your current position (by chance of implementation detail and container state), you will never terminate the loop except through exceptions.

If you do need some guarantees, you need to go with an alternate storage solution.

Answered Oct 07 '22 by GManNickG