Complexity of std::unordered_multiset insert

Why is the worst-case complexity of std::unordered_multiset insert linear? I understand why that is the case for std::unordered_set (you have to check that the inserted value is not already in the set), but for a multiset I don't get it. Am I missing something obvious?

asked Apr 07 '14 by JamesJohnson


1 Answer

The worst case complexity for std::unordered_multiset::insert() is linear because:

  • Unordered associative containers that support non-unique keys are said to support equivalent keys. When iterating these containers, elements with equivalent keys are adjacent to one another in iteration, forming equivalent-key groups (illustrated in the sketch after this list).
  • Iterator functions require constant amortized time.
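
To see the equivalent-key grouping in action, here is a minimal sketch. The overall iteration order is unspecified, but all copies of an equivalent key are guaranteed to appear next to each other:

    #include <iostream>
    #include <unordered_set>

    int main() {
        std::unordered_multiset<int> ms{7, 3, 7, 11, 3, 7};

        // The overall iteration order is unspecified, but equal keys
        // always form one contiguous run, e.g. "11 3 3 7 7 7".
        for (int x : ms)
            std::cout << x << ' ';
        std::cout << '\n';
    }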

For example, consider the case where 5, 13, and 13 are inserted into an unordered_multiset that has 4 buckets, and unordered_multiset::key_eq()(5, 13) returns false. The container's hash function, obtainable via unordered_multiset::hash_function(), returns different hash codes for 5 and 13. Despite having different hash codes, these elements may still be placed in the same bucket. If the hash function for an integer returns the integer itself, and the bucket index is the hash code modulo the number of buckets, then (see the sketch after this list):

  • Element 5 hashes to 5, and 5 mod 4 = 1, so it is placed in bucket 1.
  • Element 13 hashes to 13, and 13 mod 4 = 1, so it is placed in bucket 1 as well.
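
This placement can be inspected through the container's bucket interface. A minimal sketch follows; note that neither the identity hash nor the modulo policy is mandated by the standard, so the exact indices printed are implementation-defined (libstdc++, for instance, rounds the requested bucket count up to a prime):

    #include <iostream>
    #include <unordered_set>

    int main() {
        std::unordered_multiset<int> ms;
        ms.rehash(4); // request at least 4 buckets

        ms.insert(5);
        ms.insert(13);
        ms.insert(13);

        std::cout << "bucket_count: " << ms.bucket_count() << '\n';
        std::cout << "bucket(5):    " << ms.bucket(5) << '\n';
        std::cout << "bucket(13):   " << ms.bucket(13) << '\n';
        std::cout << "bucket_size:  " << ms.bucket_size(ms.bucket(13)) << '\n';
    }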

While unordered_set::insert() scans the bucket to prevent duplicates, unordered_multiset::insert() scans the bucket to find where the new element belongs so that equivalent keys stay grouped. In the worst case, every element hashes to the same bucket: when the final 13 is inserted, the bucket already contains [5, 13], and insert() must traverse it to locate the existing 13, after which the bucket contains [5, 13, 13] with the equal keys adjacent. Since that traversal may visit every element already in the container, the complexity is linear in size().
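
This worst case can be provoked deliberately with a hash function that sends every key to the same bucket. A hypothetical sketch (ConstantHash is made up for illustration; the superlinear growth of total insertion time reflects the linear per-insert bucket scan, though the exact behavior is implementation-dependent):

    #include <chrono>
    #include <cstddef>
    #include <iostream>
    #include <unordered_set>

    // Hypothetical worst-case hasher: every key gets the same hash
    // code, so every element lands in one bucket.
    struct ConstantHash {
        std::size_t operator()(int) const { return 0; }
    };

    int main() {
        for (int n : {1000, 2000, 4000}) {
            std::unordered_multiset<int, ConstantHash> s;
            s.max_load_factor(1.0e9f); // keep rehashing out of the picture

            auto t0 = std::chrono::steady_clock::now();
            for (int i = 0; i < n; ++i)
                s.insert(i); // each insert scans the single bucket: O(size())
            auto t1 = std::chrono::steady_clock::now();

            std::cout << n << " inserts took "
                      << std::chrono::duration<double, std::milli>(t1 - t0).count()
                      << " ms\n";
        }
    }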

It is worth noting that a rehash may occur during unordered_multiset::insert(), and unordered_multiset::rehash() is specified as having average-case complexity linear in size() and worst-case complexity quadratic. During a rehash, every element in the original hash table is iterated over and inserted into a new hash table. As iteration is linear in size(), and, as identified above, each insertion has a worst case linear in size(), the resulting worst case is O(size() * size()).
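
A sketch of how to observe these rehashes, and how reserve() can size the table up front so that a known number of inserts triggers none (bucket counts and rehash points vary by implementation):

    #include <cstddef>
    #include <iostream>
    #include <unordered_set>

    int main() {
        std::unordered_multiset<int> ms;
        std::size_t buckets = ms.bucket_count();
        std::cout << "initial bucket_count: " << buckets << '\n';

        // Each insert that pushes the load factor past max_load_factor()
        // forces a rehash, which re-inserts every existing element.
        for (int i = 0; i < 100; ++i) {
            ms.insert(i);
            if (ms.bucket_count() != buckets) {
                buckets = ms.bucket_count();
                std::cout << "rehash after insert #" << (i + 1)
                          << ", bucket_count now " << buckets << '\n';
            }
        }

        // reserve(n) makes room for n elements ahead of time, so the
        // 100 inserts above would cause no rehash at all.
        std::unordered_multiset<int> reserved;
        reserved.reserve(100);
        std::cout << "reserved bucket_count: " << reserved.bucket_count() << '\n';
    }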

answered Sep 28 '22 by Tanner Sansbury