Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is stability of std::remove and std::remove_if design fail?

Recently (from one SO comment) I learned that std::remove and std:remove_if are stable. Am I wrong to think this is a terrible design choice since it prevents certain optimizations?

Imagine removing the first and fifth elements of a 1M std::vector. Because of stability, we can't implement remove with swap. Instead we must shift every remaining element. :(

If we weren't limited by stability we could (for RA and BD iter) practically have 2 iters, one from front, second from behind, and then use swap to bring to-be-removed items to end. I'm sure smart people could maybe do even better. My question is in general, not about specific optimization I'm talking about.

EDIT: please note that C++ advertizes the zero overhead principle, and also there are std::sort and std::stable_sort sort algorithms.

EDIT2: optimization would be something like the following:

For remove_if:

  • bad_iter looks from the beginning for those elements for which the predicate returns true.
  • good_iter looks from the end for those elements for which the predicate returns false.

when both have found what is expected they swap their elements. Termination is at good_iter <= bad_iter.

If it helps, think of it like one iter in quick sort algorithm, but we don't compare them to a special element, but instead we use the above predicate.

EDIT3: I played around and tried to find worst case (worst case for remove_if - notice how rarely the predicate would be true) and I got this:

#include <vector>
#include <string>
#include <iostream>
#include <map>
#include <algorithm>
#include <cassert>
#include <chrono>
#include <memory>
using namespace std;
int main()
{  
    vector<string> vsp;
    int n;
    cin >> n;
    for (int i =0; i < n; ++i)
    {   string s = "123456";
        s.push_back('a' + (rand() %26));
        vsp.push_back(s);
    }
    auto vsp2 = vsp;
    auto remove_start = std::chrono::high_resolution_clock::now();
    auto it=remove_if(begin(vsp),end(vsp), [](const string& s){ return s < "123456b";});
    vsp.erase(it,vsp.end());
    cout << vsp.size() << endl;
    auto remove_end = std::chrono::high_resolution_clock::now();
    cout << "erase-remove: " << chrono::duration_cast<std::chrono::milliseconds>(remove_end-remove_start).count() << " milliseconds\n";

    auto partition_start = std::chrono::high_resolution_clock::now();
    auto it2=partition(begin(vsp2),end(vsp2), [](const string& s){ return s >= "123456b";});
    vsp2.erase(it2,vsp2.end());
    cout << vsp2.size() << endl;
    auto partition_end = std::chrono::high_resolution_clock::now();
    cout << "partition-remove: " << chrono::duration_cast<std::chrono::milliseconds>(partition_end-partition_start).count() << " milliseconds\n";
}



C:\STL\MinGW>g++ test_int.cpp -O2 && a.exe
12345678
11870995
erase-remove: 1426 milliseconds
11870995
partition-remove: 658 milliseconds

For other usages, partition is bit faster, same or slower. Color me puzzled. :D

like image 500
NoSenseEtAl Avatar asked Dec 11 '12 10:12

NoSenseEtAl


People also ask

How std:: remove works?

std::remove, std::remove_if. Removes all elements satisfying specific criteria from the range [first, last) and returns a past-the-end iterator for the new end of the range. 1) Removes all elements that are equal to value , using operator== to compare them. 3) Removes all elements for which predicate p returns true.

What STD remove returns?

std :: remove Transforms the range [first,last) into a range with all the elements that compare equal to val removed, and returns an iterator to the new end of that range. The function cannot alter the properties of the object containing the range of elements (i.e., it cannot alter the size of an array or a container).

How does remove if work C++?

C++ Algorithm remove_if() function is used to eliminate all the elements that satisfy a predicate from a given range [first, last) without disturbing the order of the remaining elements. This function cannot alter the size of the container. It returns an iterator to the new end of the range.

What you have to do to actually remove the specified elements from the container?

First, you use remove_if/remove to move all elements which don't fit the remove criteria to the front of the range, keeping the relative order of the elements. So after calling remove_if/remove , a single call of erase deletes all remaining elements at the end of the range.


2 Answers

I assume you're asking about a hypothetical definition of stable_remove to be what remove currently is, and remove to be implemented however the implementer thinks is best to give the correct values in any order. With an expectation that implementers will be able to improve on just doing exactly the same as stable_remove.

In practice, the library can't easily do this optimization. It depends on the data, but you don't want to spend too long to work out how many elements will be removed before deciding on how to remove each one. For example you could do an extra pass to count them, but there are plenty of cases where that extra pass is inefficient. Just because an unstable remove is faster than stable for certain cases doesn't necessarily mean that an adaptive algorithm to choose between the two is a good bet.

I think the difference between remove and sort is that sorting is known to be a complicated problem with a lot of different solutions and trade-offs and tweaks. All "simple" sort algorithms are slow on average. Most standard algorithms are pretty simple, and remove is one of them but sort is not. I don't think it makes a lot of sense therefore to define stable_remove and remove as separate standard functions.

Edit: your edit with my tweak (similar to std::partition but no need to keep the values on the right) seems pretty reasonable to me. It requires a bidirectional iterator, but there is precedent in the standard for algorithms that behave differently on different iterator categories, such as std::distance. So it would be possible for the standard to define unstable_remove that only requires a forward iterator, but does your thing if it gets a bidi iterator. The standard probably wouldn't lay out the algorithm, but it could have a phrase like "if the iterator is bidirectional, does at most min(k, n-k) moves where k is the number of elements removed", which would in effect force it. But note that the standard doesn't currently say how many moves remove_if does, so I reckon that pinning this down simply wasn't a priority.

There is of course nothing stopping you from implementing your own unstable_remove.

If we accept that the standard didn't need to specify an unstable remove, the question then comes down to whether the function it does define should have been called stable_remove, anticipating a future remove that behaves differently for bidi iterators, and might behave differently for forward iterators if some clever heuristic for doing an unstable remove ever becomes well enough known to be worth a standard function. I'd say not: it is not a disaster if the names of standard functions aren't completely regular. It could have been pretty disruptive to remove the guarantee of stability from the STL's remove_if. Then the question becomes, "why didn't the STL call it stable_remove_if", to which I can only answer that in addition to all the points made in all the answers, the STL design process was a sight quicker than the standardization process.

stable_remove would also open a can of worms regarding other standard functions that could in theory have unstable versions. For a particularly silly example should copy be called stable_copy, just in case some implementation exists on which its demonstrably faster to reverse the order of elements while copying? Should copy be called copy_forward, so that the implementation can choose which of copy_backward and copy_forward is called by copy according to which is faster? Part of the committee's job is to draw a line somewhere.

I think realistically the current standard is sensible, and it would be sensible to separately define a stable_remove and a remove_with_some_other_constraints, but remove_in_some_unspecified_way just doesn't give the same opportunity for optimization that sort_in_some_unspecified_way does. Introsort was invented in 1997, just as C++ was being standardized, but I don't imagine the research effort around remove is quite what it was and is around sort. I may be wrong, optimizing remove might be the next big thing, and if so then the committee has missed a trick.

like image 181
12 revs Avatar answered Nov 15 '22 02:11

12 revs


std::remove is specified to work with forward iterators.

The approach with working with a pair of iterators, from beginning and from the end, would either increase the requirements for the iterators and thus decrease the utility of the function or violate/worsen asymptotic complexity guarantees.

like image 36
chill Avatar answered Nov 15 '22 02:11

chill