Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Complexity of std::unordered_set iterator traversal

I recently played around with a std::unordered_set. I'm suspecting my version of the STL keeps track of non-empty buckets in some FILO data-structure (looks like a list). I suppose this is done in order to provide O(n) time traversal of the complete std::unordered_set (where n denotes the number of elements in a unordered_set with m buckets and m much larger than n). This improves a naive traversal of all buckets in O(m) time.

I've tested that indeed traversal of large and very sparse unordered_sets (with begin - end) is much faster than a naive traversal of all buckets.

Question: Is this traversal runtime guaranteed by the standard? Or is this just a feature of my particular standard library?


Here is my test code to play around with:

#include <iostream>
#include <vector>
#include <numeric>
#include <unordered_set>
using namespace std;

void test(vector<int> data, int alloc_size) {
   unordered_set<int> set(alloc_size);
   for (auto i: data) {
      set.insert(i);
   }

   for (size_t bidx = 0; bidx < set.bucket_count(); ++bidx) {
      cout << "[B" << bidx << ":";
      for (auto bit = set.begin(bidx); bit != set.end(bidx); ++bit) {
         cout << " " << *bit;
      }
      cout << "] ";
   }

   cout << "  {";
   for (auto const & d: set) {
      cout << d << " ";
   }
   cout << "}" << endl;
}

int main() {
   test({1, 2, 0}, 3);
   test({1, 2, 0, 7}, 3);
   test({18, 6, 11, 3, 13, 4}, 20);
   test({18, 6, 11, 3, 13, 4, 34}, 20);
}

Which prints:

[B0: 0] [B1: 1] [B2: 2] [B3:] [B4:]   {0 2 1 }
[B0: 0] [B1: 1] [B2: 7 2] [B3:] [B4:]   {0 7 2 1 }
[B0:] [B1:] [B2:] [B3: 3] [B4: 4] [B5:] [B6: 6] [B7:] [B8:] [B9:] [B10:] [B11: 11] [B12:] [B13: 13] [B14:] [B15:] [B16:] [B17:] [B18: 18] [B19:] [B20:] [B21:] [B22:]   {4 13 3 11 6 18 }
[B0:] [B1:] [B2:] [B3: 3] [B4: 4] [B5:] [B6: 6] [B7:] [B8:] [B9:] [B10:] [B11: 34 11] [B12:] [B13: 13] [B14:] [B15:] [B16:] [B17:] [B18: 18] [B19:] [B20:] [B21:] [B22:]   {4 13 3 34 11 6 18 }

It appears the begin - end traversal reports buckets in the reverse order in which they became non-empty (cf. first and third line). Inserting into an already non-empty bucket does not change this ordering (cf. second and fourth line).

like image 429
m8mble Avatar asked Apr 13 '17 08:04

m8mble


Video Answer


1 Answers

In short : yes, this is guaranteed by the standard.

Explanation

All iterators are required to have an O(n) traversal time complexity (where n is the amount of items traversed). This is because every single operation on an iterator has a constant time complexity (O(1)), including advancing the iterator one position.

From the standard (section 24.2.1 §8) :

All the categories of iterators require only those functions that are realizable for a given category in constant time (amortized). Therefore, requirement tables for the iterators do not have a complexity column.

So, when iterating over the items of a std::unordered_set, the time complexity is O(n) (with n the amount of items in the set).

Not convinced ?

A literal reading of the above quote only guarantees that constant time operations are realizable. This doesn't prevent a specific implementation from having worse time complexity than what's realizable. This is probably down to a bad choice of words, and hopefully no serious implementations actually do this.

The only other place in the standard that can help resolve this ambiguity, is in section 24.4.4 §1, where the standard has this to say about std::advance and std::distance :

These function templates use + and - for random access iterators (and are, therefore, constant time for them); for input, forward and bidirectional iterators they use ++ to provide linear time implementations.

So, the ++ operation on a forward iterator (as used for std::unordered_set) is implied to be a constant time operation.

In summary, while the wording of the first quote is ambiguous, the second quote confirms the intent.

like image 96
Sander De Dycker Avatar answered Sep 20 '22 13:09

Sander De Dycker