Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Can the back() iterator of a vector be safely assumed to be the end() iterator after a pop_back()?

My problem is the following :

std::vector<struct pollfd> vec = { ... }; // Actually a member variable on a Server object

for (auto iter = vec.begin(); iter != vec.end(); ) {
    if (client_closed_connection(*iter)) {
        *iter = std::move(*vec.back()); // Move item to delete to the end
        vec.pop_back(); // perform deletion
        // Dont increment cause we dont want to skip the newly moved element
    } else {
        process_client_request(*iter);
        iter++;
    }
}

I'm not using "remove and erase" idiom or similar, because the order of the elements doesn't matter, so I dont want to waste resources trying to preserve it (its code for a list of epoll() fds of clients connected to a server, so deletions while iterating the vector is expected and will happen quite often as clients get their response and disconnect).

While this makes sense to me, and seems to work in practice, I can't help but wonder if i'm potentially relying on UB.

The only issue, is if client_closed_connection() returns false for the last valid element (for when iter == back()). In this case, *iter = std::move(*vec.back()) will essentially do nothing (self assignment), but then we pop_back() which according to cpprefernce "invalidates iterator to last element, as well as end() iterator". In our case, iter would thus be invalidated, but is then compared to vec.end() to end the loop.

My questions are as follow :

  • Does "invalidated" just means I can't dereference the iterator ? (which would be fine since i'm only comparing it to end())
  • Am I guarenteed that iter == end() since vector uses LegacyRandomAccessIterator, contiguous_iterator as its iterator type and due to the contiguous memory requirement, end() will always become equal to iter ?

And also, why isn't there a standard way to remove elements without preserving the order ? I dont think there are that many usecases where the ordering would matter, and at least having the choice in routines would be nice (eg: erase and quick_erase)

like image 655
Azyrod Avatar asked Sep 21 '25 12:09

Azyrod


2 Answers

To answer the specific question:

Can the back() iterator of a vector be safely assumed to be the end() iterator after a pop_back()?

According to the standard: no.

From the standard:

From: n4981
24.3.11.5 Modifiers [vector.modifiers]
constexpr void pop_back();
Effects: Invalidates iterators and references at or after the point of the erase.

The pop_back removes the last element of the vector. Thus, any iterators that refer to the last element or end are "invalid".

I could not find a definition for "Invalidates" in the standard so we can not assume anything, which includes that we can not assume that an iterator that was previously on end is valid.

So I would say this is likely an invalid program according to the standard.


Alternative 1.

Rather than using pop_back() you could use erase(--end()).

Note this question: https://stackoverflow.com/a/263958/14065

Basically: In C++11 erase() was modified to return the iterator of the next item (because otherwise it is not intuitive how to erase and continue iterating a vector).

for (auto loop = std::begin(cont); loop != std::end(cond);)
{
    if (shouldIeraseElement(*loop)) {
        loop = cont.erase(loop);
    }
    else {
        ++loop;
    }
}

Not the exact problem you have, but a similar pattern.


Alternative 2.

You can test if you are about to remove the last element and take explicit action to break out of the loop.


Alternative 3.

I would change how you write this:

In C++, there is a standard idiom called erase/remove.

  1. You do a remove which moves all the bad items to end of the container.
  2. You erase (delete from the container) all the bad items.

This is how I might write it:

template<typename I, typename A>
I remove_non_stable_if(I begin, I end, A&& action)
{
    for(auto loop = begin; loop != end;)
    {
        if (action(*loop)) {
            std::iter_swap(loop, --end);
        }
        else {
            ++loop;
        }
    }
    return end;
}
.....

auto bad = remove_non_stable_if(std::begin(vec), std::end(vec), [](pollfd const& fd)
    {
        if (client_closed_connection(fd)) {
            return true;
        }
        process_client_request(fd);
        return false;
    }
);
vec.erase(bad, std::end(vec));

The advantage:

  • You don't have an issue with wondering about bad iterators.
  • You resize the container only once.

Note: std::remove_if is stable, thus it will do more moves than your initial algorithm. BUT It should be simple to write a non-stable version, but still uses the erase/remove idiom.

like image 83
Martin York Avatar answered Sep 23 '25 03:09

Martin York


Lacking a specific reason to do otherwise, I'd start by partitioning the array, then remove the elements for which the predicate returned false:

auto pos = std::partition(vec.begin(), vec.end(),
    [](auto fd) { return !client_closed(fd); });
vec.erase(pos, vec.end());
for (auto fd: vec) {
    process_client_request(fd);
}

std::partition doesn't try to maintain the order of the retained elements, so it takes advantage of the lack of ordering as you want.

Multiple Iterations

Given that these are struct pollfds: 1) we can expect there are no more than a few thousand of them. 10K would be quite a large number of simultaneous connections, and 2) they're only 64 bits apiece.

During the second iteration, we can normally expect that most (if not all) of the array will be in the L1 cache (and any that isn't will almost certainly be in L2). This means iterating through the array twice vs. once is unlikely to make any meaningful difference in speed. In fact, it may improve branch prediction enough to reduce the overall time.

like image 31
Jerry Coffin Avatar answered Sep 23 '25 01:09

Jerry Coffin