Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

std::set erase complexity anomality?

I am trying to figure out the complexity of erasing multiple elements from std::set. I am using this page as source.

It claims that the complexity for erasing a single item using an iterator is amortized O(1), but erasing multiple items using the range form is log(c.size()) + std::distance(first, last) (i.e. - log of the set's size + the number of elements deleted).

Taken at face value, if the number of elements to be erased (n) is much smaller than the number of elements in the set (m), this means that looping over the elements to be erased and erasing them one at a time is quicker (O(n)) than erasing them with one call (O(log m) assuming n<<m).

Obviously, had that really been the case, the internal implementation of the second form would just do the above loop.

Is this an error at the site? A bug in the specs? Am I just missing something?

Thanks, Shachar

like image 207
Shachar Shemesh Avatar asked Mar 03 '14 09:03

Shachar Shemesh


People also ask

What is the time complexity of erase in set?

The single item erase has O complexity of log(c. size()), but amortized complexity of O(1). Performing multiple single erases in a loop will thus cost log(c. size()) + number of erases, which is exactly what the range form's complexity is.

How do you delete an element from a set in C++?

set::erase() erase() function is used to remove elements from a container from the specified position or range.

What is the complexity of different std :: set operations?

std::set is commonly implemented as a red-black binary search tree. Insertion on this data structure has a worst-case of O(log(n)) complexity, as the tree is kept balanced.

How do you delete an element from a set?

The remove() method removes the specified element from the set. This method is different from the discard() method, because the remove() method will raise an error if the specified item does not exist, and the discard() method will not.


2 Answers

Internally elements of a set are stored in a balanced binary tree. Balanced tree is a tree where maximal height difference between any left and right subtree of any node is 1.

Maintaining balanced structure is important to guarantee that a search of any element in the tree (in the set) takes in worst case O(log(n)) steps.

Removal of an element may destroy balance. To restore balance rotations must be performed. In some cases a single removal causes several rotations so that the operation takes O(log(n)) steps, but in average a removal takes O(1) steps.

So, when several elements scattered over the set must be deleted, one by one, the amortized costs with high probability will be O(1) per removal.

Removing several elements in the range (first, last, where one element follows the next) will almost certainly destroy the balance, what causes the log factor in the complexity: log(n) + std::distance(first, last)

like image 131
Andrushenko Alexander Avatar answered Sep 18 '22 05:09

Andrushenko Alexander


It seems the problem is hiding behind the (somewhat weasel) word "amortized". The single item erase has O complexity of log(c.size()), but amortized complexity of O(1).

Performing multiple single erases in a loop will thus cost log(c.size()) + number of erases, which is exactly what the range form's complexity is.

Shachar

like image 44
Shachar Shemesh Avatar answered Sep 21 '22 05:09

Shachar Shemesh