I am trying to figure out the complexity of erasing multiple elements from std::set. I am using this page as source.
It claims that the complexity for erasing a single item using an iterator is amortized O(1), but erasing multiple items using the range form is log(c.size()) + std::distance(first, last) (i.e. - log of the set's size + the number of elements deleted).
Taken at face value, if the number of elements to be erased (n) is much smaller than the number of elements in the set (m), this means that looping over the elements to be erased and erasing them one at a time is quicker (O(n)) than erasing them with one call (O(log m) assuming n<<m).
Obviously, had that really been the case, the internal implementation of the second form would just do the above loop.
Is this an error at the site? A bug in the specs? Am I just missing something?
Thanks, Shachar
The single item erase has O complexity of log(c. size()), but amortized complexity of O(1). Performing multiple single erases in a loop will thus cost log(c. size()) + number of erases, which is exactly what the range form's complexity is.
set::erase() erase() function is used to remove elements from a container from the specified position or range.
std::set is commonly implemented as a red-black binary search tree. Insertion on this data structure has a worst-case of O(log(n)) complexity, as the tree is kept balanced.
The remove() method removes the specified element from the set. This method is different from the discard() method, because the remove() method will raise an error if the specified item does not exist, and the discard() method will not.
Internally elements of a set are stored in a balanced binary tree. Balanced tree is a tree where maximal height difference between any left and right subtree of any node is 1.
Maintaining balanced structure is important to guarantee that a search of any element in the tree (in the set) takes in worst case O(log(n))
steps.
Removal of an element may destroy balance. To restore balance rotations must be performed. In some cases a single removal causes several rotations so that the operation takes O(log(n))
steps, but in average a removal takes O(1)
steps.
So, when several elements scattered over the set must be deleted, one by one, the amortized costs with high probability will be O(1)
per removal.
Removing several elements in the range (first, last
, where one element follows the next) will almost certainly destroy the balance, what causes the log factor in the complexity: log(n) + std::distance(first, last)
It seems the problem is hiding behind the (somewhat weasel) word "amortized". The single item erase has O complexity of log(c.size()), but amortized complexity of O(1).
Performing multiple single erases in a loop will thus cost log(c.size()) + number of erases, which is exactly what the range form's complexity is.
Shachar
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With