Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is remove_if and then erase efficient on vector?

Although there are tens of questions about remove_if + erase for vector. I couldn't find what is the performance of such action. When I write:

    myVector.erase(remove_if(myVector.begin(),
                         myVector.end(),
                         some_predicate), myVector.end());

The remove if will return an iterator to the last relevant item + 1 (let's call it X). I believe that this will happen in O(n).

But how the erase will work ?

  • If the erase will try to delete from the X to myVector.end() it will be O(n^2) because it will cause copying the vector to a new location, and there will be O(n) new allocations from the heap.
  • But if it will delete from myVector.end() to X it can do it in O(n) and more importantly will not allocate new memory (something I'm trying to avoid).

Thanks.

like image 564
OopsUser Avatar asked Feb 08 '17 13:02

OopsUser


2 Answers

Consider this vector:

|0|1|2|3|4|5|6|7|8|9|

We use remove_if to remove all elements that are multiples of 4:

std::remove_if(v.begin(), v.end(), [](auto i){ return i != 0 && !(i%4); });

This starts iterating through the vector with an iterator X until it finds an element where the predicate returns true:

|0|1|2|3|4|5|6|7|8|9|
         X

This is the first element we want to remove.

Next it creates another iterator pointing to the next element, Y = X+1, and checks the predicate for *Y:

|0|1|2|3|4|5|6|7|8|9|
         X Y

The predicate is false, so we want to keep that element, so it assigns the next element to the element we want to remove, by doing *X = std::move(*Y):

|0|1|2|3|5|5|6|7|8|9|
         X Y            *X = std::move(*Y)

So we have two iterators, X and Y, where X points to the next element in the "output" (i.e. the elements we're not removing) and Y is the next element to consider removing.

We move both iterators to the next position, check the predicate for Y (which is false again), and do another assignment:

|0|1|2|3|5|6|6|7|8|9|
           X Y          *X = std::move(*Y)

Then it does the same again at the next position:

|0|1|2|3|5|6|7|7|8|9|
             X Y       *X = std::move(*Y)

And then it moves on, but finds that the predicate is true for Y

|0|1|2|3|5|6|7|7|8|9|
               X Y

So it just increments Y, which skips that element and so doesn't copy it into the "output" position at X:

|0|1|2|3|5|6|7|7|8|9|
               X   Y 

The predicate is not true for Y, so it assigns it to X:

|0|1|2|3|5|6|7|9|8|9|
               X   Y     *X = std::move(*Y)

Then it increments X and Y again

|0|1|2|3|5|6|7|9|8|9|
                 X   Y

Now Y is past-the-end so we return X (which points past-the-end of the output sequence, i.e. the elements we want to keep).

After the remove_if returns X we call v.erase(X, v.end()), so the vector invokes the destructors for each element from X to the end:

|0|1|2|3|5|6|7|9|~|~|
                 X   end

And then sets the size so the vector ends at X:

|0|1|2|3|5|6|7|9|
                 end

After this v.capacity() >= v.size()+2 because the memory that was used by the two final elements is still present, but is not in use.

like image 57
Jonathan Wakely Avatar answered Oct 10 '22 05:10

Jonathan Wakely


The complexity of vector::erase is well-defined:

Linear: the number of calls to the destructor of T is the same as the number of elements erased, the assignment operator of T is called the number of times equal to the number of elements in the vector after the erased elements

The way it is going to work internally (i.e. how exactly is it going to remove your elements) is kind of irrelevant.

The complexity of remove_if is also defined and is

Exactly std::distance(first, last) applications of the predicate.

So your code has linear complexity.

like image 39
SingerOfTheFall Avatar answered Oct 10 '22 06:10

SingerOfTheFall