I just read this blog http://lemire.me/blog/archives/2012/06/20/do-not-waste-time-with-stl-vectors/ comparing performance of <code>operator[]</code> assignment and <code>push_back</code> on memory pre-reserved <code>std::vector</code> and I decided to try it myself. The operation is simple: <pre class="prettyprint"><code>// for vector bigarray.reserve(N); // START TIME TRACK for(int k = 0; k < N; ++k) // for operator[]: // bigarray[k] = k; // for push_back bigarray.push_back(k); // END TIME TRACK // do some dummy operations to prevent compiler optimize long sum = accumulate(begin(bigarray), end(array),0 0); </code></pre> And here is the result: <pre class="prettyprint"><code> ~/t/benchmark> icc 1.cpp -O3 -std=c++11 ~/t/benchmark> ./a.out [ 1.cpp: 52] 0.789123s --> C++ new [ 1.cpp: 52] 0.774049s --> C++ new [ 1.cpp: 66] 0.351176s --> vector [ 1.cpp: 80] 1.801294s --> reserve + push_back [ 1.cpp: 94] 1.753786s --> reserve + emplace_back [ 1.cpp: 107] 2.815756s --> no reserve + push_back ~/t/benchmark> clang++ 1.cpp -std=c++11 -O3 ~/t/benchmark> ./a.out [ 1.cpp: 52] 0.592318s --> C++ new [ 1.cpp: 52] 0.566979s --> C++ new [ 1.cpp: 66] 0.270363s --> vector [ 1.cpp: 80] 1.763784s --> reserve + push_back [ 1.cpp: 94] 1.761879s --> reserve + emplace_back [ 1.cpp: 107] 2.815596s --> no reserve + push_back ~/t/benchmark> g++ 1.cpp -O3 -std=c++11 ~/t/benchmark> ./a.out [ 1.cpp: 52] 0.617995s --> C++ new [ 1.cpp: 52] 0.601746s --> C++ new [ 1.cpp: 66] 0.270533s --> vector [ 1.cpp: 80] 1.766538s --> reserve + push_back [ 1.cpp: 94] 1.998792s --> reserve + emplace_back [ 1.cpp: 107] 2.815617s --> no reserve + push_back </code></pre> For all the compilers, <code>vector</code> with <code>operator[]</code> is much faster than raw pointer with <code>operator[]</code>. This led to the first question: why? The second question is, I have already "reserved" the memory, so why <code>opeator[]</code> is faster? The next question is, since the memory is already allocated, why <code>push_back</code> is times slower than <code>operator[]</code>? The test code is attached below: <pre class="prettyprint"><code>#include <iostream> #include <iomanip> #include <vector> #include <numeric> #include <chrono> #include <string> #include <cstring> #define PROFILE(BLOCK, ROUTNAME) ProfilerRun([&](){do {BLOCK;} while(0);}, \ ROUTNAME, __FILE__, __LINE__); template <typename T> void ProfilerRun (T&& func, const std::string& routine_name = "unknown", const char* file = "unknown", unsigned line = 0) { using std::chrono::duration_cast; using std::chrono::microseconds; using std::chrono::steady_clock; using std::cerr; using std::endl; steady_clock::time_point t_begin = steady_clock::now(); // Call the function func(); steady_clock::time_point t_end = steady_clock::now(); cerr << "[" << std::setw (20) << (std::strrchr (file, '/') ? std::strrchr (file, '/') + 1 : file) << ":" << std::setw (5) << line << "] " << std::setw (10) << std::setprecision (6) << std::fixed << static_cast<float> (duration_cast<microseconds> (t_end - t_begin).count()) / 1e6 << "s --> " << routine_name << endl; cerr.unsetf (std::ios_base::floatfield); } using namespace std; const int N = (1 << 29); int routine1() { int sum; int* bigarray = new int[N]; PROFILE ( { for (unsigned int k = 0; k < N; ++k) bigarray[k] = k; }, "C++ new"); sum = std::accumulate (bigarray, bigarray + N, 0); delete [] bigarray; return sum; } int routine2() { int sum; vector<int> bigarray (N); PROFILE ( { for (unsigned int k = 0; k < N; ++k) bigarray[k] = k; }, "vector"); sum = std::accumulate (begin (bigarray), end (bigarray), 0); return sum; } int routine3() { int sum; vector<int> bigarray; bigarray.reserve (N); PROFILE ( { for (unsigned int k = 0; k < N; ++k) bigarray.push_back (k); }, "reserve + push_back"); sum = std::accumulate (begin (bigarray), end (bigarray), 0); return sum; } int routine4() { int sum; vector<int> bigarray; bigarray.reserve (N); PROFILE ( { for (unsigned int k = 0; k < N; ++k) bigarray.emplace_back(k); }, "reserve + emplace_back"); sum = std::accumulate (begin (bigarray), end (bigarray), 0); return sum; } int routine5() { int sum; vector<int> bigarray; PROFILE ( { for (unsigned int k = 0; k < N; ++k) bigarray.push_back (k); }, "no reserve + push_back"); sum = std::accumulate (begin (bigarray), end (bigarray), 0); return sum; } int main() { long s0 = routine1(); long s1 = routine1(); long s2 = routine2(); long s3 = routine3(); long s4 = routine4(); long s5 = routine5(); return int (s1 + s2); } </code></pre>

<code>push_back</code> does a bounds check. <code>operator[]</code> does not. So even if you've reserved the space, <code>push_back</code> is going to have an extra conditional check that <code>operator[]</code> will not have. Additionally, it will increase the <code>size</code> value (reserve only sets the <code>capacity</code>), so it will update that every time. In short, <code>push_back</code> is doing more than what <code>operator[]</code> is doing - which is why it is slower (and more accurate).

Why push_back is slower than operator[] for a previously allocated vector

Tags:

c++

c++11

stl

vector

I just read this blog http://lemire.me/blog/archives/2012/06/20/do-not-waste-time-with-stl-vectors/ comparing performance of operator[] assignment and push_back on memory pre-reserved std::vector and I decided to try it myself. The operation is simple:

// for vector bigarray.reserve(N);  // START TIME TRACK for(int k = 0; k < N; ++k)    // for operator[]:    // bigarray[k] = k;    // for push_back    bigarray.push_back(k); // END TIME TRACK  // do some dummy operations to prevent compiler optimize long sum = accumulate(begin(bigarray), end(array),0 0);

And here is the result:

 ~/t/benchmark> icc 1.cpp -O3 -std=c++11  ~/t/benchmark> ./a.out [               1.cpp:   52]     0.789123s  --> C++ new [               1.cpp:   52]     0.774049s  --> C++ new [               1.cpp:   66]     0.351176s  --> vector [               1.cpp:   80]     1.801294s  --> reserve + push_back [               1.cpp:   94]     1.753786s  --> reserve + emplace_back [               1.cpp:  107]     2.815756s  --> no reserve + push_back  ~/t/benchmark> clang++ 1.cpp -std=c++11 -O3  ~/t/benchmark> ./a.out [               1.cpp:   52]     0.592318s  --> C++ new [               1.cpp:   52]     0.566979s  --> C++ new [               1.cpp:   66]     0.270363s  --> vector [               1.cpp:   80]     1.763784s  --> reserve + push_back [               1.cpp:   94]     1.761879s  --> reserve + emplace_back [               1.cpp:  107]     2.815596s  --> no reserve + push_back  ~/t/benchmark> g++ 1.cpp -O3 -std=c++11  ~/t/benchmark> ./a.out [               1.cpp:   52]     0.617995s  --> C++ new [               1.cpp:   52]     0.601746s  --> C++ new [               1.cpp:   66]     0.270533s  --> vector [               1.cpp:   80]     1.766538s  --> reserve + push_back [               1.cpp:   94]     1.998792s  --> reserve + emplace_back [               1.cpp:  107]     2.815617s  --> no reserve + push_back

For all the compilers, vector with operator[] is much faster than raw pointer with operator[]. This led to the first question: why? The second question is, I have already "reserved" the memory, so why opeator[] is faster?

The next question is, since the memory is already allocated, why push_back is times slower than operator[]?

The test code is attached below:

#include <iostream> #include <iomanip> #include <vector> #include <numeric> #include <chrono> #include <string> #include <cstring>  #define PROFILE(BLOCK, ROUTNAME) ProfilerRun([&](){do {BLOCK;} while(0);}, \         ROUTNAME, __FILE__, __LINE__);  template <typename T> void ProfilerRun (T&&  func, const std::string& routine_name = "unknown",                   const char* file = "unknown", unsigned line = 0) {     using std::chrono::duration_cast;     using std::chrono::microseconds;     using std::chrono::steady_clock;     using std::cerr;     using std::endl;      steady_clock::time_point t_begin = steady_clock::now();      // Call the function     func();      steady_clock::time_point t_end = steady_clock::now();     cerr << "[" << std::setw (20)          << (std::strrchr (file, '/') ?              std::strrchr (file, '/') + 1 : file)          << ":" << std::setw (5) << line << "]   "          << std::setw (10) << std::setprecision (6) << std::fixed          << static_cast<float> (duration_cast<microseconds>                                 (t_end - t_begin).count()) / 1e6          << "s  --> " << routine_name << endl;      cerr.unsetf (std::ios_base::floatfield); }  using namespace std;  const int N = (1 << 29);  int routine1() {     int sum;     int* bigarray = new int[N];     PROFILE (     {         for (unsigned int k = 0; k < N; ++k)             bigarray[k] = k;     }, "C++ new");     sum = std::accumulate (bigarray, bigarray + N, 0);     delete [] bigarray;     return sum; }  int routine2() {     int sum;     vector<int> bigarray (N);     PROFILE (     {         for (unsigned int k = 0; k < N; ++k)             bigarray[k] = k;     }, "vector");     sum = std::accumulate (begin (bigarray), end (bigarray), 0);     return sum; }  int routine3() {     int sum;     vector<int> bigarray;     bigarray.reserve (N);     PROFILE (     {         for (unsigned int k = 0; k < N; ++k)             bigarray.push_back (k);     }, "reserve + push_back");     sum = std::accumulate (begin (bigarray), end (bigarray), 0);     return sum; }  int routine4() {     int sum;     vector<int> bigarray;     bigarray.reserve (N);     PROFILE (     {         for (unsigned int k = 0; k < N; ++k)             bigarray.emplace_back(k);     }, "reserve + emplace_back");     sum = std::accumulate (begin (bigarray), end (bigarray), 0);     return sum; }  int routine5() {     int sum;     vector<int> bigarray;     PROFILE (     {         for (unsigned int k = 0; k < N; ++k)             bigarray.push_back (k);     }, "no reserve + push_back");     sum = std::accumulate (begin (bigarray), end (bigarray), 0);     return sum; }   int main() {     long s0 = routine1();     long s1 = routine1();     long s2 = routine2();     long s3 = routine3();     long s4 = routine4();     long s5 = routine5();     return int (s1 + s2); }

550

asked Nov 23 '13 21:11

xis

2 Answers

push_back does a bounds check. operator[] does not. So even if you've reserved the space, push_back is going to have an extra conditional check that operator[] will not have. Additionally, it will increase the size value (reserve only sets the capacity), so it will update that every time.

In short, push_back is doing more than what operator[] is doing - which is why it is slower (and more accurate).

answered Oct 04 '22 10:10

Zac Howland

As Yakk and me have found out, there might be another interesting factor that contributes to the apparent slowness of push_back.

The first interesting observation is that in the original test, using new and operating on a raw array is slower than using vector<int> bigarray(N); and operator[] -- more than a factor 2. Even more interesting is that you can get the same performance for both by inserting an additional memset for the raw array variant:

int routine1_modified() {     int sum;     int* bigarray = new int[N];      memset(bigarray, 0, sizeof(int)*N);      PROFILE (     {         for (unsigned int k = 0; k < N; ++k)             bigarray[k] = k;     }, "C++ new");     sum = std::accumulate (bigarray, bigarray + N, 0);     delete [] bigarray;     return sum; }

The conclusion of course is, that PROFILE measures something different than expected. Yakk and me guess it has something to do with memory management; from Yakk's comment to the OP:

resize will touch the entire block of memory. reserve will allocate without touching. If you have a lazy allocator that doesn't fetch or assign physical memory pages until accessed, reserve on an empty vector can be nearly free (doesn't even have to find physical memory for the pages!) until you write to the pages (at which point, they have to be found).

I thought of something similar, so tried a small test for this hypothesis by touching certain pages with a "strided memset" (a profiling tool might get more reliable results):

int routine1_modified2() {     int sum;     int* bigarray = new int[N];      for(int k = 0; k < N; k += PAGESIZE*2/sizeof(int))         bigarray[k] = 0;      PROFILE (     {         for (unsigned int k = 0; k < N; ++k)             bigarray[k] = k;     }, "C++ new");     sum = std::accumulate (bigarray, bigarray + N, 0);     delete [] bigarray;     return sum; }

By changing the stride from every page half to every 4th page to leaving it out completely, we get a nice transition of the timings from the vector<int> bigarray(N); case to the new int[N] case where no memset has been used.

In my opinion, that's a strong hint that memory management is a major contributor to the measurement results.

Another issue is the branching in push_back. It is claimed in many answers that this is a / the major reason why push_back is much slower as compared to using operator[]. Indeed, comparing the raw-pointer w/o memset to using reserve + push_back, the former is two times faster.

Similarly, if we add a bit of UB (but check the results later):

int routine3_modified() {     int sum;     vector<int> bigarray;     bigarray.reserve (N);      memset(bigarray.data(), 0, sizeof(int)*N); // technically, it's UB      PROFILE (     {         for (unsigned int k = 0; k < N; ++k)             bigarray.push_back (k);     }, "reserve + push_back");     sum = std::accumulate (begin (bigarray), end (bigarray), 0);     return sum; }

this modified version is about 2 times slower than using new + a full memset. So it seems whatever the invocation of push_back does, it results in a factor 2 slowdown when compared to just setting the element (via operator[] in both the vector and raw array case).

But is it the branching required in push_back, or the additional operation?

// pseudo-code void push_back(T const& p) {     if(size() == capacity())     {         resize( size() < 10 ? 10 : size()*2 );     }      (*this)[size()] = p; // actually using the allocator     ++m_end; }

It is indeed that simple, see e.g. libstdc++'s implementation.

I've tested it by using the vector<int> bigarray(N); + operator[] variant, and inserting a function call that mimics the behaviour of push_back:

unsigned x = 0; void silly_branch(int k) {     if(k == x)     {         x = x < 10 ? 10 : x*2;     } }  int routine2_modified() {     int sum;     vector<int> bigarray (N);     PROFILE (     {         for (unsigned int k = 0; k < N; ++k)         {             silly_branch(k);             bigarray[k] = k;         }     }, "vector");     sum = std::accumulate (begin (bigarray), end (bigarray), 0);     return sum; }

Even when declaring x as volatile, this only has 1 % influence on the measurement. Of course, you had to verify that the branch is actually in the opcode, but my assembler knowledge doesn't allow me to verify that (at -O3).

The interesting point now is what happens when I add an increment to silly_branch:

unsigned x = 0; void silly_branch(int k) {     if(k == x)     {         x = x < 10 ? 10 : x*2;     }     ++x; }

Now, the modified routine2_modified runs 2 times slower than the original routine2, being on par with the proposed routine3_modified above that includes UB to commit the memory pages. I don't find that particularly surprising, as it adds another write to every write in the loop, so we have two times the work and two times the duration.

Conclusion

Well you had to carefully look at the assembly and profiling tools to verify the hypotheses of memory management and the additional write is a good hypothesis ("correct"). But I think the hints are strong enough to claim that there's something more complicated going on than just a branch which makes push_back slower.

Here's the full test code:

#include <iostream> #include <iomanip> #include <vector> #include <numeric> #include <chrono> #include <string> #include <cstring>  #define PROFILE(BLOCK, ROUTNAME) ProfilerRun([&](){do {BLOCK;} while(0);}, \         ROUTNAME, __FILE__, __LINE__); //#define PROFILE(BLOCK, ROUTNAME) BLOCK  template <typename T> void ProfilerRun (T&&  func, const std::string& routine_name = "unknown",                   const char* file = "unknown", unsigned line = 0) {     using std::chrono::duration_cast;     using std::chrono::microseconds;     using std::chrono::steady_clock;     using std::cerr;     using std::endl;      steady_clock::time_point t_begin = steady_clock::now();      // Call the function     func();      steady_clock::time_point t_end = steady_clock::now();     cerr << "[" << std::setw (20)          << (std::strrchr (file, '/') ?              std::strrchr (file, '/') + 1 : file)          << ":" << std::setw (5) << line << "]   "          << std::setw (10) << std::setprecision (6) << std::fixed          << static_cast<float> (duration_cast<microseconds>                                 (t_end - t_begin).count()) / 1e6          << "s  --> " << routine_name << endl;      cerr.unsetf (std::ios_base::floatfield); }  using namespace std;  constexpr int N = (1 << 28); constexpr int PAGESIZE = 4096;  uint64_t __attribute__((noinline)) routine1() {     uint64_t sum;     int* bigarray = new int[N];     PROFILE (     {         for (int k = 0, *p = bigarray; p != bigarray+N; ++p, ++k)             *p = k;     }, "new (routine1)");     sum = std::accumulate (bigarray, bigarray + N, 0ULL);     delete [] bigarray;     return sum; }  uint64_t __attribute__((noinline)) routine2() {     uint64_t sum;     int* bigarray = new int[N];      memset(bigarray, 0, sizeof(int)*N);      PROFILE (     {         for (int k = 0, *p = bigarray; p != bigarray+N; ++p, ++k)             *p = k;     }, "new + full memset (routine2)");     sum = std::accumulate (bigarray, bigarray + N, 0ULL);     delete [] bigarray;     return sum; }  uint64_t __attribute__((noinline)) routine3() {     uint64_t sum;     int* bigarray = new int[N];      for(int k = 0; k < N; k += PAGESIZE/2/sizeof(int))         bigarray[k] = 0;      PROFILE (     {         for (int k = 0, *p = bigarray; p != bigarray+N; ++p, ++k)             *p = k;     }, "new + strided memset (every page half) (routine3)");     sum = std::accumulate (bigarray, bigarray + N, 0ULL);     delete [] bigarray;     return sum; }  uint64_t __attribute__((noinline)) routine4() {     uint64_t sum;     int* bigarray = new int[N];      for(int k = 0; k < N; k += PAGESIZE/1/sizeof(int))         bigarray[k] = 0;      PROFILE (     {         for (int k = 0, *p = bigarray; p != bigarray+N; ++p, ++k)             *p = k;     }, "new + strided memset (every page) (routine4)");     sum = std::accumulate (bigarray, bigarray + N, 0ULL);     delete [] bigarray;     return sum; }  uint64_t __attribute__((noinline)) routine5() {     uint64_t sum;     int* bigarray = new int[N];      for(int k = 0; k < N; k += PAGESIZE*2/sizeof(int))         bigarray[k] = 0;      PROFILE (     {         for (int k = 0, *p = bigarray; p != bigarray+N; ++p, ++k)             *p = k;     }, "new + strided memset (every other page) (routine5)");     sum = std::accumulate (bigarray, bigarray + N, 0ULL);     delete [] bigarray;     return sum; }  uint64_t __attribute__((noinline)) routine6() {     uint64_t sum;     int* bigarray = new int[N];      for(int k = 0; k < N; k += PAGESIZE*4/sizeof(int))         bigarray[k] = 0;      PROFILE (     {         for (int k = 0, *p = bigarray; p != bigarray+N; ++p, ++k)             *p = k;     }, "new + strided memset (every 4th page) (routine6)");     sum = std::accumulate (bigarray, bigarray + N, 0ULL);     delete [] bigarray;     return sum; }  uint64_t __attribute__((noinline)) routine7() {     uint64_t sum;     vector<int> bigarray (N);     PROFILE (     {         for (int k = 0; k < N; ++k)             bigarray[k] = k;     }, "vector, using ctor to initialize (routine7)");     sum = std::accumulate (begin (bigarray), end (bigarray), 0ULL);     return sum; }  uint64_t __attribute__((noinline)) routine8() {     uint64_t sum;     vector<int> bigarray;     PROFILE (     {         for (int k = 0; k < N; ++k)             bigarray.push_back (k);     }, "vector (+ no reserve) + push_back (routine8)");     sum = std::accumulate (begin (bigarray), end (bigarray), 0ULL);     return sum; }  uint64_t __attribute__((noinline)) routine9() {     uint64_t sum;     vector<int> bigarray;     bigarray.reserve (N);     PROFILE (     {         for (int k = 0; k < N; ++k)             bigarray.push_back (k);     }, "vector + reserve + push_back (routine9)");     sum = std::accumulate (begin (bigarray), end (bigarray), 0ULL);     return sum; }  uint64_t __attribute__((noinline)) routine10() {     uint64_t sum;     vector<int> bigarray;     bigarray.reserve (N);     memset(bigarray.data(), 0, sizeof(int)*N);     PROFILE (     {         for (int k = 0; k < N; ++k)             bigarray.push_back (k);     }, "vector + reserve + memset (UB) + push_back (routine10)");     sum = std::accumulate (begin (bigarray), end (bigarray), 0ULL);     return sum; }  template<class T> void __attribute__((noinline)) adjust_size(std::vector<T>& v, int k, double factor) {     if(k >= v.size())     {         v.resize(v.size() < 10 ? 10 : k*factor);     } }  uint64_t __attribute__((noinline)) routine11() {     uint64_t sum;     vector<int> bigarray;     PROFILE (     {         for (int k = 0; k < N; ++k)         {             adjust_size(bigarray, k, 1.5);             bigarray[k] = k;         }     }, "vector + custom emplace_back @ factor 1.5 (routine11)");     sum = std::accumulate (begin (bigarray), end (bigarray), 0ULL);     return sum; }  uint64_t __attribute__((noinline)) routine12() {     uint64_t sum;     vector<int> bigarray;     PROFILE (     {         for (int k = 0; k < N; ++k)         {             adjust_size(bigarray, k, 2);             bigarray[k] = k;         }     }, "vector + custom emplace_back @ factor 2 (routine12)");     sum = std::accumulate (begin (bigarray), end (bigarray), 0ULL);     return sum; }  uint64_t __attribute__((noinline)) routine13() {     uint64_t sum;     vector<int> bigarray;     PROFILE (     {         for (int k = 0; k < N; ++k)         {             adjust_size(bigarray, k, 3);             bigarray[k] = k;         }     }, "vector + custom emplace_back @ factor 3 (routine13)");     sum = std::accumulate (begin (bigarray), end (bigarray), 0ULL);     return sum; }  uint64_t __attribute__((noinline)) routine14() {     uint64_t sum;     vector<int> bigarray;     PROFILE (     {         for (int k = 0; k < N; ++k)             bigarray.emplace_back (k);     }, "vector (+ no reserve) + emplace_back (routine14)");     sum = std::accumulate (begin (bigarray), end (bigarray), 0ULL);     return sum; }  uint64_t __attribute__((noinline)) routine15() {     uint64_t sum;     vector<int> bigarray;     bigarray.reserve (N);     PROFILE (     {         for (int k = 0; k < N; ++k)             bigarray.emplace_back (k);     }, "vector + reserve + emplace_back (routine15)");     sum = std::accumulate (begin (bigarray), end (bigarray), 0ULL);     return sum; }  uint64_t __attribute__((noinline)) routine16() {     uint64_t sum;     vector<int> bigarray;     bigarray.reserve (N);     memset(bigarray.data(), 0, sizeof(bigarray[0])*N);     PROFILE (     {         for (int k = 0; k < N; ++k)             bigarray.emplace_back (k);     }, "vector + reserve + memset (UB) + emplace_back (routine16)");     sum = std::accumulate (begin (bigarray), end (bigarray), 0ULL);     return sum; }  unsigned x = 0; template<class T> void /*__attribute__((noinline))*/ silly_branch(std::vector<T>& v, int k) {     if(k == x)     {         x = x < 10 ? 10 : x*2;     }     //++x; }  uint64_t __attribute__((noinline)) routine17() {     uint64_t sum;     vector<int> bigarray(N);     PROFILE (     {         for (int k = 0; k < N; ++k)         {             silly_branch(bigarray, k);             bigarray[k] = k;         }     }, "vector, using ctor to initialize + silly branch (routine17)");     sum = std::accumulate (begin (bigarray), end (bigarray), 0ULL);     return sum; }  template<class T, int N> constexpr int get_extent(T(&)[N]) {  return N;  }  int main() {     uint64_t results[] = {routine2(),     routine1(),     routine2(),     routine3(),     routine4(),     routine5(),     routine6(),     routine7(),     routine8(),     routine9(),     routine10(),     routine11(),     routine12(),     routine13(),     routine14(),     routine15(),     routine16(),     routine17()};      std::cout << std::boolalpha;     for(int i = 1; i < get_extent(results); ++i)     {         std::cout << i << ": " << (results[0] == results[i]) << "\n";     }     std::cout << x << "\n"; }

A sample run, on an old & slow computer; note:

N == 2<<28, not 2<<29 as in the OP
compiled with g++4.9 20131022 with -std=c++11 -O3 -march=native

 [            temp.cpp:   71]     0.654927s  --> new + full memset (routine2) [            temp.cpp:   54]     1.042405s  --> new (routine1) [            temp.cpp:   71]     0.605061s  --> new + full memset (routine2) [            temp.cpp:   89]     0.597487s  --> new + strided memset (every page half) (routine3) [            temp.cpp:  107]     0.601271s  --> new + strided memset (every page) (routine4) [            temp.cpp:  125]     0.783610s  --> new + strided memset (every other page) (routine5) [            temp.cpp:  143]     0.903038s  --> new + strided memset (every 4th page) (routine6) [            temp.cpp:  157]     0.602401s  --> vector, using ctor to initialize (routine7) [            temp.cpp:  170]     3.811291s  --> vector (+ no reserve) + push_back (routine8) [            temp.cpp:  184]     2.091391s  --> vector + reserve + push_back (routine9) [            temp.cpp:  199]     1.375837s  --> vector + reserve + memset (UB) + push_back (routine10) [            temp.cpp:  224]     8.738293s  --> vector + custom emplace_back @ factor 1.5 (routine11) [            temp.cpp:  240]     5.513803s  --> vector + custom emplace_back @ factor 2 (routine12) [            temp.cpp:  256]     5.150388s  --> vector + custom emplace_back @ factor 3 (routine13) [            temp.cpp:  269]     3.789820s  --> vector (+ no reserve) + emplace_back (routine14) [            temp.cpp:  283]     2.090259s  --> vector + reserve + emplace_back (routine15) [            temp.cpp:  298]     1.288740s  --> vector + reserve + memset (UB) + emplace_back (routine16) [            temp.cpp:  325]     0.611168s  --> vector, using ctor to initialize + silly branch (routine17) 1: true 2: true 3: true 4: true 5: true 6: true 7: true 8: true 9: true 10: true 11: true 12: true 13: true 14: true 15: true 16: true 17: true 335544320

answered Oct 04 '22 12:10

dyp

Related questions
                            
                                strptime() equivalent on Windows?
                            
                                Why does C++ need the scope resolution operator?
                            
                                Value and size of an uninitialized std::string variable in c++
                            
                                Why isn't it legal to convert "pointer to pointer to non-const" to a "pointer to pointer to const"
                            
                                C++, can I statically initialize a std::map at compile time?
                            
                                Why is 'char' signed by default in C++?
                            
                                Is there a sorted container in the STL?
                            
                                Using a variable with the same name in different spaces
                            
                                C++: Should I use 'typedef' or 'using namespace'? [closed]
                            
                                Preincrement faster than postincrement in C++ - true? If yes, why is it? [duplicate]
                            
                                How to use boost::crc?
                            
                                Evil samples of subtly broken C++ code
                            
                                Embed resources (eg, shader code; images) into executable/library with CMake
                            
                                Can I easily iterate over the values of a map using a range-based for loop?
                            
                                Where is a std::string allocated in memory?
                            
                                How to set the stacksize with C++11 std::thread
                            
                                Acquire/Release versus Sequentially Consistent memory order
                            
                                Compile program for 32bit on 64bit Linux OS causes fatal error
                            
                                Try catch with locks in C++
                            
                                Memory Allocation/Deallocation? [closed]

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With