Suppose I want to remove the unique elements from an std::vector
(not get rid of the duplicates, but retain only the elements that occur at least 2 times) and I want to achieve that in a pretty inefficient way - by calling std::count
while std::remove_if
ing. Consider the following code:
#include <algorithm>
#include <iostream>
#include <vector>
int main() {
std::vector<int> vec = {1, 2, 6, 3, 6, 2, 7, 4, 4, 5, 6};
auto to_remove = std::remove_if(vec.begin(), vec.end(), [&vec](int n) {
return std::count(vec.begin(), vec.end(), n) == 1;
});
vec.erase(to_remove, vec.end());
for (int i : vec) std::cout << i << ' ';
}
From reference on std::remove_if
we know that the elements beginning from to_remove
have unspecified values, but I wonder how unspecified they can really be.
To explain my concern a little further - we can see that the elements that should be removed are 1
, 3
, 5
and 7
- the only unique values. std::remove_if
will move the 1
to the end but there is no guarantee that there will be a value 1
at the end after said operation. Can this be (due to that value being unspecified) that it will turn into 3
and make the std::count
call return a count of (for example) 2 for the later encountered value 3
?
Essentially my question is - is this guaranteed to work, and by work I mean to inefficiently erase unique elements from an std::vector
?
I am interested in both language-lawyer answer (which could be "the standard says that this situation is possible, you should avoid it") and in-practice answer (which could be "the standard says that this situation is possible, but realistically there is no way of this value ending up as a completely differeny one, for example 3
").
The remove_if() function returns an iterator that points to past the last element that is not removed.
std :: remove It removes value from range. Transforms the range [first,last) into a range with all the elements that compare equal to val removed, and returns an iterator to the new end of that range.
After the predicate returns true
the first time, there will be one unspecified value in the range. That means any subsequent calls of the predicate will count an unspecified value. The count is therefore potentially incorrect, and you may either leave values unaffected that you intend to be discarded, or discard values that should be retained.
You could modify the predicate so it keeps a count of how many times it has returned true, and reduce the range accordingly. For example;
std::size_t count = 0;
auto to_remove = std::remove_if(vec.begin(), vec.end(), [&vec, &count](int n)
{
bool once = (std::count(vec.begin(), vec.end() - count, n) == 1);
if (once) ++count;
return once;
});
Subtracting an integral value from a vector's end iterator is safe, but that isn't necessarily true for other containers.
You misunderstood how std::remove_if
works. The to-be-removed values are not necessarily shifted to the end. See:
Removing is done by shifting (by means of move assignment) the elements in the range in such a way that the elements that are not to be removed appear in the beginning of the range. cppreference
This is the only guarantee for the state of the range. According to my knowledge, it's not forbidden to shift all values around and it would still satisfy the complexity. So it might be possible that some compilers shift the unwanted values to the end but that would be just extra unnecessary work.
An example of possible implementation of removing odd numbers from 1 2 3 4 8 5
:
v - read position
1 2 3 4 8 5 - X will denotes shifted from value = unspecified
^ - write position
v
1 2 3 4 8 5 1 is odd, ++read
^
v
2 X 3 4 8 5 2 is even, *write=move(*read), ++both
^
v
2 X 3 4 8 5 3 is odd, ++read
^
v
2 4 3 X 8 5 4 is even, *write=move(*read), ++both
^
v
2 4 8 X X 5 8 is even, *write=move(*read), ++both
^
2 4 8 X X 5 5 is odd, ++read
^ - this points to the new end.
So, in general, you cannot rely on count
returning any meaningful values. Since in the case that move==copy (as is for ints
) the resulting array is 2 4 8|4 8 5
. Which has incorrect count both for the odd and even numbers. In case of std::unique_ptr
the X==nullptr
and thus the count for nullptr
and removed values might be wrong. Other remaining values should not be left in the end part of the array as there were no copies done.
Note that the values are not unspecified as in you cannot know them. They are exactly the results of move assignments which might leave the value in unspecified state. If it specified the state of the moved-from variables ( asstd::unique_ptr
does) then they would be known. E.g. if move==swap
then the range will be permuted only.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With