Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

std::vector inconsistent crash between push_back and insert(end(),x)

Tags:

Put this code into MS Visual C++ 2010, compile (debug or release), and it will crash for the insert() loop, but not the push_back loop:

#include <vector>
#include <string>

using std::vector;
using std::string;

int main()
{
   vector<string> vec1;
   vec1.push_back("hello");

   for (int i = 0; i != 10; ++i)
      vec1.push_back( vec1[0] );

   vector<string> vec2;
   vec2.push_back("hello");

   for (int i = 0; i != 10; ++i)
      vec2.insert( vec2.end(), vec2[0] );

   return 0;
}

The problem is that both push_back() and insert() take the new item by reference, and when the vector is realloced for more space, the new item becomes invalidated BEFORE it is inserted.

GCC should also have this problem. I have not checked Clang, but it depends which STD library it is using.

MSVC2010 has some extra code in push_back() that detects if the new item is actually an item within the vector. If so, it records the item's index and uses that to insert the item after memory has been alloced (instead of using the now-invalidated reference) -- uses _Inside(_STD addressof(_Val))

Is MSVC's extra code non-standard?

My concern is that I am not sure in what code I may have done something like vec.push_back(vec[1]); or vec.insert(it, vec[2]); I'd have to look through hundreds if not thousands of lines of code that use push_back and insert, and that is just my own code... 3rd party libraries could also be affected.

I assume that GCC could be made to die in horrible ways using this technique (I see no extra code to handle this case, but valgrind didn't detect it in my simple example so will be harder to test),

How best to detect and avoid making this mistake?

Is MSVC2010's extra push_back() code non-standard? Should MSVC instead detect and assert when it finds vectors used in this way? (ie the secure-computing initiative)

I am thinking of hacking MSVC2010 and GCC's headers to detect these cases.

any other ideas?

Thanks, Paul

PS: note also that this usage is perfectly fine (and efficient) IF you can guarantee that the vector does not need to be resized

like image 574
elegant dice Avatar asked Jul 25 '12 15:07

elegant dice


People also ask

Is Emplace_back faster than Push_back?

With the simple benchmark here, we notice that emplace_back is 7.62% faster than push_back when we insert 1,000,000 object (MyClass) into an vector.

Is vector Push_back slow?

The main reason why push_back is slow is because of multiple reallocation of memory. Every vector has vector::size and vector::capacity. vector::size gives the number of elements in the vector and vector::capacity gives the number of elements vector can store.

Where are elements placed within a vector when using Push_back ()?

push_back() function is used to push elements into a vector from the back. The new value is inserted into the vector at the end, after the current last element and the container size is increased by 1.

What is the time complexity of Push_back in vector?

push_back(): Inserts a new element at the end of the vector. Its time complexity is O(1).


1 Answers

Ok, I installed Win8 + MSVC2012 on virtualbox to try it out. Geez Windows 8 is annoying with the mouse, no buttons to push just hovering which is hard to do with a screen-in-a-window.

The results are interesting and still inconsistent IMHO.

MSVC 2010: the bug comes from the move-semantics, as ecatmur suggested.

The problem is that v.insert(v.end(),v[0]); will select the insert(it, T && val) method, which is wrong on two fronts: 1) it could lead to the destruction of v[0]. It does not seem to, which suggests to me that the const& reference is preserved and the new version is created via copy rather than move. and 2) the code path does not make a copy of val before resizing the vector.

Note that the problem was not noticed sooner due to extra code (hacks?) in push_back(&&) - see further commentary at the bottom in relation to MSVC2012.

(note that insert(it,const&) will correctly copy the new item first before resizing the vector, so if the right method was selected there would have been no problem at all).

In MSVC 2012, this is fixed via correctly selecting the insert(it, const T & val) method, HOWEVER you can still see that push_back() has some extra code to "fix" incorrect usage.

Consider this test:

#include <vector>
#include <string>

using std::vector;
using std::string;

int main()
{
   vector<string> vec1;
   vec1.push_back("hello");

   for (int i = 0; i != 1000; ++i)
   {
       string temp = vec1[0];
      vec1.push_back( std::move(vec1[0]) );
   }

   vector<string> vec2;
   vec2.push_back("hello");

   for (int i = 0; i != 1000; ++i)
   {
       string temp = vec2[0];
      vec2.insert( vec2.end(), std::move(vec2[0]) );
   }

   return 0;
}

In both cases, std::move() is used to force the && move methods to be chosen. In both cases, the code should cause catastrophe and hopefully crash.

However, in MSVC 2012, the push_back() loop works fine, because there is some extra code in push_back(&&) that detects if _Val is in the same address space as the vector, and if so will make a copy rather than move. But, what if the new item was not strictly in the same memory space but still part of the original vector (eg pimpl pointer) ? I can imagine ways to make push_back(&&) die like it should.

Surely this is not actually necessary, if the programmer says std::move() then that's what should happen, right? The extra check is surely using some unnecessary CPU cycles.

The insert() loop does not have this hack, which also means that using std::move() incorrectly will only cause corruption SOMETIMES. Personally I'd prefer fast-fail over fail-only-when-you-are-demonstrating-to-the-client.

So... solutions...

  1. Do not use v.insert(v.end(), v[0]) or similar. This is an unreasonable requirement, as 3rd party code (eg Boost, VTK, QT, tbb, xml libraries, etc etc) may be using that somewhere in their millions of lines of code. All the 3rd party libraries I use, I recompile, so whatever my code suffers with, they suffer too.

  2. Upgrade to MSVC 2012 RC. I'll have to wait till it goes Gold, then it will work as expected (with new and exciting bugs in other parts).

  3. Hack the headers to detect usage. I've done that, but the only time the detection works is when the code is actually run.

  4. Hack the headers to fix insert(&&). (and recompile ALL of the libraries/projects - sigh). The easiest approach would be to simply comment out the insert(&&) variant (and then we are back to pre-C++11 performance). Another approach is to use the same push_back(&&) hack, although I don't see that as a reliable approach. Perhaps push_back(&&) should also be commented out.

Further update: I fixed the headers. Turned out to be simple...

MSVC2010's insert(&&) declaration looks like this:

template<class _Valty>
iterator insert(const_iterator _Where, _Valty&& _Val)

MSVC2012's insert(&&) removed the template part and now looks like this:

iterator insert(const_iterator _Where, _Ty&& _Val)

So I simply removed the templated _Valty from MSVC2010's insert() and now the correct method is chosen. It now also matches how push_back(&&) is declared (ie no template on the parameter). There are still templated parameters for emplace*(&&) methods, but there is no const& confusion there.

like image 190
elegant dice Avatar answered Oct 01 '22 06:10

elegant dice