Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Where is the performance gain of the erase-remove idiom coming from

I need to erase all elements from a vector which fulfill a certain criteria.

My first approach would be to loop through the vector and call vector::erase on all elements which fulfill the criteria.

As far as I understand, vector::erase has a bad performance for this use case, because it removes the item from the underlying array, and moves the rest of the vector forward by one element (or more if you erase a range of elements). When you remove multiple elements, the rear elements will be shifted at each removal.

The remove algorithm takes all the elements to be removed, and moves them to the end of the vector, so you only have to remove that rear part of the vector, which involves no shifting.

But why is this faster than erasing? (is it even faster?)

Doesn't moving an element to the end imply moving all the following elements forward like in vector::erase?

How comes, that remove only has a complexity of O(n)?

like image 229
Jounathaen Avatar asked Dec 10 '22 10:12

Jounathaen


1 Answers

The performance issue here is not about erasing the elements that are to be removed, or moving them to the end (which does not happen actually), but about moving the elements that are to be kept.

If you use erase on each element you want to remove, you need to move all the elements after these... for each call to erase. Typically, if you want to remove k elements, you will move the elements after the latest one (in the vector) k times instead of only one.

But if you call remove, you will only move these once (see example below).

A small example to better understand how these two methods work:

Let's say you have a vector of size 1000 and the elements you want to remove are at position 17 and 37.

With erase acting on the two elements to be removed:

  • when you call erase() for the 17th element, you need to move elements 18 to 999, 982 elements.
  • when you call erase() for the 36th element (it's the 36th one now!), you need to move elements 37 to 998, 962 elements.

In total, you have moved 962 + 982 = 1944 elements, 962 of these have been moved twice for nothing.

With remove, what happens is as follows:

element 0 does not change;
element 1 does not change;
...
element 17 is "discarded";
element 18 is moved at position 17;
element 19 is moved at position 18;
...
element 36 is moved at position 35;
element 37 is "discarded";
element 38 is moved at position 36;
...
element 999 is moved at position 997.

In total, you have moved 998 elements (1000 minus the two you removed), which is much better than the 1943 elements of the previous methods. This is even better if you have more than 2 elements to remove.

You can have a look at the possible implementation on en.cppreference.com to better understand how std::remove works.

like image 89
Holt Avatar answered Jan 18 '23 10:01

Holt