Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is it safe to traverse a container during std::remove_if execution?

Suppose I want to remove the unique elements from an std::vector (not get rid of the duplicates, but retain only the elements that occur at least 2 times) and I want to achieve that in a pretty inefficient way - by calling std::count while std::remove_ifing. Consider the following code:

#include <algorithm>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> vec = {1, 2, 6, 3, 6, 2, 7, 4, 4, 5, 6};

    auto to_remove = std::remove_if(vec.begin(), vec.end(), [&vec](int n) {
        return std::count(vec.begin(), vec.end(), n) == 1;
    });

    vec.erase(to_remove, vec.end());

    for (int i : vec) std::cout << i << ' ';
}

From reference on std::remove_if we know that the elements beginning from to_remove have unspecified values, but I wonder how unspecified they can really be.

To explain my concern a little further - we can see that the elements that should be removed are 1, 3, 5 and 7 - the only unique values. std::remove_if will move the 1 to the end but there is no guarantee that there will be a value 1 at the end after said operation. Can this be (due to that value being unspecified) that it will turn into 3 and make the std::count call return a count of (for example) 2 for the later encountered value 3?

Essentially my question is - is this guaranteed to work, and by work I mean to inefficiently erase unique elements from an std::vector?

I am interested in both language-lawyer answer (which could be "the standard says that this situation is possible, you should avoid it") and in-practice answer (which could be "the standard says that this situation is possible, but realistically there is no way of this value ending up as a completely differeny one, for example 3").

like image 226
Fureeish Avatar asked Jul 16 '19 10:07

Fureeish


People also ask

What does Remove_if return C++?

The remove_if() function returns an iterator that points to past the last element that is not removed.

What STD remove returns?

std :: remove It removes value from range. Transforms the range [first,last) into a range with all the elements that compare equal to val removed, and returns an iterator to the new end of that range.


2 Answers

After the predicate returns true the first time, there will be one unspecified value in the range. That means any subsequent calls of the predicate will count an unspecified value. The count is therefore potentially incorrect, and you may either leave values unaffected that you intend to be discarded, or discard values that should be retained.

You could modify the predicate so it keeps a count of how many times it has returned true, and reduce the range accordingly. For example;

std::size_t count = 0;
auto to_remove = std::remove_if(vec.begin(), vec.end(), [&vec, &count](int n)
{
    bool once = (std::count(vec.begin(), vec.end() - count, n) == 1);
    if (once) ++count;
    return once;
 });

Subtracting an integral value from a vector's end iterator is safe, but that isn't necessarily true for other containers.

like image 120
Peter Avatar answered Sep 20 '22 03:09

Peter


You misunderstood how std::remove_if works. The to-be-removed values are not necessarily shifted to the end. See:

Removing is done by shifting (by means of move assignment) the elements in the range in such a way that the elements that are not to be removed appear in the beginning of the range. cppreference

This is the only guarantee for the state of the range. According to my knowledge, it's not forbidden to shift all values around and it would still satisfy the complexity. So it might be possible that some compilers shift the unwanted values to the end but that would be just extra unnecessary work.

An example of possible implementation of removing odd numbers from 1 2 3 4 8 5:

   v               - read position
   1 2 3 4 8 5     - X will denotes shifted from value = unspecified
   ^               - write position
     v          
   1 2 3 4 8 5     1 is odd, ++read
   ^
       v
   2 X 3 4 8 5     2 is even, *write=move(*read), ++both
     ^   
         v
   2 X 3 4 8 5     3 is odd, ++read
     ^
           v
   2 4 3 X 8 5     4 is even, *write=move(*read), ++both
       ^
             v
   2 4 8 X X 5     8 is even, *write=move(*read), ++both
         ^

   2 4 8 X X 5     5 is odd, ++read
         ^         - this points to the new end.

So, in general, you cannot rely on count returning any meaningful values. Since in the case that move==copy (as is for ints) the resulting array is 2 4 8|4 8 5. Which has incorrect count both for the odd and even numbers. In case of std::unique_ptr the X==nullptr and thus the count for nullptr and removed values might be wrong. Other remaining values should not be left in the end part of the array as there were no copies done.

Note that the values are not unspecified as in you cannot know them. They are exactly the results of move assignments which might leave the value in unspecified state. If it specified the state of the moved-from variables ( asstd::unique_ptr does) then they would be known. E.g. if move==swap then the range will be permuted only.

like image 20
Quimby Avatar answered Sep 22 '22 03:09

Quimby