Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is vector::insert allowed to reserve only once and avoid further capacity checks?

vector::insert(dst_iterator, src_begin, src_end) (insert a range) can be optimized for random-access iterators to reserve the required capacity src_end - src_begin first, then perform the copy.

The main question I have: Does the Standard also allow vector::insert to avoid a capacity check for each copied element? (I.e. not using push_back or similar on every element to be inserted)

I'll refer to avoiding this capacity check as "optimization of insert".


What could go wrong: I can imagine an iterator with side effects when dereferenced:

Note: the Standard guarantees the iterators passed to insert will be dereferenced exactly once (see end of question).

#include <vector>
#include <iterator>
#include <iostream>

template < typename T >
struct evil_iterator : std::iterator < std::random_access_iterator_tag, T >
{
    using base = std::iterator < std::random_access_iterator_tag, T >;

    std::vector<T>* evil_feedback;
    typename std::vector<T>::iterator innocent_iterator;

    evil_iterator( std::vector<T>* c,
                   typename std::vector<T>::iterator i )
        : evil_feedback{c}
        , innocent_iterator{i}
    {}

    void do_evil()
    {
        std::cout << "trying to do evil; ";
        std::cout << "cap: " << evil_feedback->capacity() << ", ";
        std::cout << "size: " << evil_feedback->size() << ", ";

        // better not invalidate the iterators of `*evil_feedback`
        // passed to the `insert` call (see example below)
        if( evil_feedback->capacity() > evil_feedback->size() )
        {
            evil_feedback->push_back( T{} );
            // capacity() might be == size() now
            std::cout << "successful >:]" << std::endl;
        }else
        {
            std::cout << "failed >:[" << std::endl;
        }
    }

    T& operator*()
    {
        do_evil();  // <----------------------------------------
        return *innocent_iterator;
    }


    // non-evil iterator member functions-----------------------

    evil_iterator& operator++()
    {
        ++innocent_iterator;
        return *this;
    }
    evil_iterator& operator++(int)
    {
        evil_iterator temp(*this);
        ++(*this);
        return temp;
    }


    evil_iterator& operator+=(typename base::difference_type p)
    {
        innocent_iterator += p;
        return *this;
    }
    evil_iterator& operator-=(typename base::difference_type p)
    {
        innocent_iterator -= p;
        return *this;
    }

    evil_iterator& operator=(evil_iterator const& other)
    {
        evil_feedback = other.evil_feedback;
        innocent_iterator = other.innocent_iterator;
        return *this;
    }

    evil_iterator operator+(typename base::difference_type p)
    {
        evil_iterator temp(*this);
        temp += p;
        return temp;
    }
    evil_iterator operator-(typename base::difference_type p)
    {
        evil_iterator temp(*this);
        temp -= p;
        return temp;
    }

    typename base::difference_type operator-(evil_iterator const& p)
    {
        return this->innocent_iterator - p.innocent_iterator;
    }

    bool operator!=(evil_iterator const& other) const
    {  return innocent_iterator != other.innocent_iterator;  }
};

Example:

int main()
{
    std::vector<int> src = {3, 4, 5, 6};
    std::vector<int> dst = {1, 2};

    evil_iterator<int> beg = {&dst, src.begin()};
    evil_iterator<int> end = {&dst, src.end()};

    // explicit call to reserve, see below
    dst.reserve( dst.size() + src.size() );
    // using dst.end()-1, which stays valid during `push_back`,
    //   thanks to Ben Voigt pointing this out
    dst.insert(dst.end()-1, beg, end);  // <--------------- doing evil?

    std::copy(dst.begin(), dst.end(), 
              std::ostream_iterator<int>{std::cout, ", "});
}

Questions:

  1. Can vector::insert be optimized to avoid a capacity check for each inserted element?
  2. Is evil_iterator still a valid iterator?
  3. If so, is evil_iterator evil, i.e. can it result in UB / non-complying behaviour if insert is optimized as described above?

Maybe my do_evil is not evil enough.. have no problems on clang++ 3.2 (using libstdc++):

Edit 2: Added the call to reserve. Now, I'm doing evil :)

trying to do evil; cap: 6, size: 2, successful >:]
trying to do evil; cap: 6, size: 3, successful >:]
trying to do evil; cap: 6, size: 4, successful >:]
trying to do evil; cap: 6, size: 9, failed >:[
1, 3, 4, 5, 6, 0, 0, 135097, 2,

Edit: Why I think the optimization could break this:

  1. Consider dst.size() == dst.capacity() == 2 at the beginning.
  2. The call to insert requires a new capacity of 6.
  3. The optimization enlarges the capacity to exactly 6, then begins to insert the elements by copying from the src iterators (beg, end).
  4. This copying is done within a loop where no capacity checks occur. (That is the optimization.)
  5. During the process of copying, further elements are added to the vector (w/o invalidating the iterators), in do_evil. The capacity now is not sufficient any more to hold the rest of the elements to be copied.

Maybe you had to use reserve in the example explicitly to force updating the observable capacity before using do_evil. Currently, insert could reserve some capacity but change what capacity returns (i.e. observable capacity) only after the copying is done.


What I've found in the Standard so far seems to allow the optimization of insert:

[sequence.reqmts]/3

a.insert(p,i,j) [...]

Requires: T shall be EmplaceConstructible into X from *i.

For vector, if the iterator does not meet the forward iterator requirements (24.2.5), T shall also be MoveInsertable into X and MoveAssignable. Each iterator in the range [i,j) shall be dereferenced exactly once.

pre: i and j are not iterators into a. Inserts copies of elements in [i, j) before p

[vector.modifiers] on insert

1 Remarks: Causes reallocation if the new size is greater than the old capacity. If no reallocation happens, all the iterators and references before the insertion point remain valid. If an exception is thrown other than by the copy constructor, move constructor, assignment operator, or move assignment operator of T or by any InputIterator operation there are no effects. If an exception is thrown by the move constructor of a non-CopyInsertable T, the effects are unspecified.

2 Complexity: The complexity is linear in the number of elements inserted plus the distance to the end of the vector.

like image 703
dyp Avatar asked May 17 '13 19:05

dyp


People also ask

What does std::vector Reserve do?

std::vector class provides a useful function reserve which helps user specify the minimum size of the vector.It indicates that the vector is created such that it can store at least the number of the specified elements without having to reallocate memory.

How does std::vector allocate memory How does its capacity grow?

As mentioned above, std::vector is a templated class that represents dynamic arrays. std::vector typically allocates memory on the heap (unless you override this behavior with your own allocator). The std::vector class abstracts memory management, as it grows and shrinks automatically if elements are added or removed.

How do you reserve a memory vector?

An std::vector manages its own memory. You can use the reserve() and resize() methods to have it allocate enough memory to fit a given amount of items: std::vector<int> vec1; vec1. reserve(30); // Allocate space for 30 items, but vec1 is still empty.

Does vector erase change capacity?

No. That's implied by the fact that iterators, pointers and references prior to the point of erase remain valid. Reducing the capacity would require a reallocation.


1 Answers

Looking again, I think this rule (section 17.6.4.9) is a clearer prohibition on what you tried to do:

Each of the following applies to all arguments to functions defined in the C++ standard library, unless explicitly stated otherwise.

  • If an argument to a function has an invalid value (such as a value outside the domain of the function or a pointer invalid for its intended use), the behavior is undefined.

I think this rule applies during the entire duration of the function call, and not only at function entry.

Furthermore, push_back() guarantees that (23.3.7.5):

If no reallocation happens, all the iterators and references before the insertion point remain valid.

Your position passed to insert, which is dst.end() as evaluated before the insert call, is not before the insertion point of the first evil_feedback->push_back() call, so it does not remain valid (the fact that you carefully avoided reallocation here does not save you, as you only met half the condition). Which means the argument you passed to std::vector::insert, a function defined in the C++ Standard Library, is invalid during the duration of that call, landing you squarely in the realm of undefined behavior.


Previous answer:

I think you violated this precondition that you quoted:

pre: i and j are not iterators into a.

like image 195
Ben Voigt Avatar answered Sep 29 '22 05:09

Ben Voigt