I was benchmarking some STL algorithms and was surprised by the time taken by the following code. (I measured the g++-compiled binary, with no optimizations, using the time command.)
#include <vector>

struct vec2 {
    int x, y;
    vec2() : x(0), y(0) {}
};

int main(int argc, char* argv[]) {
    const int size = 200000000;
    std::vector<vec2> tab(size);      // 2.26s
    // vec2* tab = new vec2[size];    // 1.29s
    // tab[0].x = 0;
    // delete[] tab;
    return 0;
}
The time taken by the vector initialization is 2.26s, while new (and delete) takes 1.29s. What is the vector ctor doing that would take so much longer? new[] calls the constructor on every element, just as the vector ctor would, right?
I then compiled with -O3; everything got faster, but there was still a gap between the two versions (0.83s and 0.75s, respectively).
Any ideas?
The speed will depend on the implementation, but the most likely reason the vector is slower is that this vector constructor cannot default-construct its elements; the elements are always copy-constructed. For example,
std::vector<vec2> tab(size);
is in reality interpreted as
std::vector<vec2> tab(size, vec2());
i.e. the second argument gets its value from a default argument. The vector then allocates raw memory and copies this default-constructed element, passed in from the outside, into every element of the new vector (using the copy constructor). That can be slower than default-constructing each element directly, which is what new[] does.
To illustrate the difference with a code sketch: new vec2[size] is roughly equivalent to
vec2 *v = (vec2 *) malloc(size * sizeof(vec2));
for (size_t i = 0; i < size; ++i)
    // Default-construct `v[i]` in place
    new (&v[i]) vec2();
return v;
while vector<vec2>(size) is roughly equivalent to
vec2 source; // Default-constructed "original" element
vec2 *v = (vec2 *) malloc(size * sizeof(vec2));
for (size_t i = 0; i < size; ++i)
    // Copy-construct `v[i]` in place
    new (&v[i]) vec2(source);
return v;
Depending on the implementation, the second approach might turn out slower.
A two-times difference in speed is hard to justify, though; then again, benchmarking unoptimized code makes little sense anyway. The much smaller difference you observed with optimized code is about what one might reasonably expect in this case.
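One way to see which of the two strategies your standard library actually uses is to count constructor calls with a small instrumented type; the probe struct below is purely illustrative:

#include <cstdio>
#include <vector>

// Illustrative type that records which constructor runs.
struct probe {
    static long defaults, copies;
    probe() { ++defaults; }
    probe(const probe&) { ++copies; }
};
long probe::defaults = 0;
long probe::copies = 0;

int main() {
    std::vector<probe> v(5);
    std::printf("defaults: %ld, copies: %ld\n", probe::defaults, probe::copies);
    return 0;
}

A copy-constructing implementation, as described above, would report 1 default construction and 5 copies; an implementation that default-constructs the elements in place would report 5 defaults and no copies.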
Both versions initialize the memory.
As several people have pointed out, the vector uses copy construction while the array uses the default constructor. Your compiler appears to optimize the latter better than the former.
Note that in Real Life, you rarely want to initialize such a huge array in one fell swoop. (What use are a bunch of zeroes? Obviously you intend to put something else in there eventually... And initializing hundreds of megabytes is very cache-unfriendly.)
Instead, you would write something like:
const int size = 200000000;
std::vector<vec2> v;
v.reserve(size);
Then, when you are ready to put a real element into the vector, you use v.push_back(element). The reserve() allocates memory without initializing it; the push_back() copy-constructs into the reserved space.
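As a sketch of that pattern (reusing the vec2 struct and size constant from the question; the values written here are arbitrary placeholders):

std::vector<vec2> v;
v.reserve(size);                 // allocate, but do not construct
for (int i = 0; i < size; ++i) {
    vec2 e;
    e.x = i;                     // some "real", non-identical data
    e.y = -i;
    v.push_back(e);              // copy-construct into the reserved storage
}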
Alternatively, when you want to put a new element into the vector, you can use v.resize(v.size()+1) and then modify the element via v.back(). (This is how a "pool allocator" might work.) Although this sequence initializes the element and then overwrites it, it all happens in the L1 cache, which is almost as fast as not initializing it at all.
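A sketch of that alternative, again assuming the vec2 and size from the question:

std::vector<vec2> v;
v.reserve(size);
for (int i = 0; i < size; ++i) {
    v.resize(v.size() + 1);      // default-constructs exactly one new element...
    v.back().x = i;              // ...which is overwritten immediately, while hot in cache
    v.back().y = -i;
}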
So for a fair comparison, try a large vector (with reserve) vs. an array for creating a sequence of non-identical items. You should find the vector is faster.
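A minimal, self-contained benchmark along those lines might look like the following (it times with <chrono> instead of the external time command, and the loop bodies are just placeholders for "non-identical items"):

#include <chrono>
#include <cstdio>
#include <vector>

struct vec2 { int x, y; vec2() : x(0), y(0) {} };

int main() {
    const int size = 200000000;
    using clock = std::chrono::steady_clock;

    // Vector built with reserve() + push_back()
    auto t0 = clock::now();
    std::vector<vec2> v;
    v.reserve(size);
    for (int i = 0; i < size; ++i) {
        vec2 e; e.x = i; e.y = i;
        v.push_back(e);
    }
    auto t1 = clock::now();

    // Raw array: new[] default-constructs, then every element is overwritten
    vec2* a = new vec2[size];
    for (int i = 0; i < size; ++i) {
        a[i].x = i;
        a[i].y = i;
    }
    auto t2 = clock::now();

    // Use the data so the compiler cannot discard the loops entirely
    std::printf("check: %d %d\n", v.back().x, a[size - 1].x);
    delete[] a;

    auto ms = [](clock::duration d) {
        return std::chrono::duration_cast<std::chrono::milliseconds>(d).count();
    };
    std::printf("vector: %lld ms, array: %lld ms\n",
                (long long)ms(t1 - t0), (long long)ms(t2 - t1));
    return 0;
}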