I've run into a strange situation.
In my program I have a loop that combines a bunch of data together in a giant vector. I was trying to figure out why it was running so slowly, even though it seemed like I was doing everything right to allocate memory in an efficient manner on the go.
In my program it is difficult to determine how big the final vector of combined data should be, but the size of each piece of data is known as it is processed. So instead of reserving and resizing the combined data vector in one go, I was reserving enough space for each data chunk as it is added to the larger vector. That's when I ran into this issue that is repeatable using the simple snippet below:
std::vector<float> arr1;
std::vector<float> arr2;
std::vector<float> arr3;
std::vector<float> arr4;
int numLoops = 10000;
int numSubloops = 50;
{
// Test 1
// Naive test where no pre-allocation occurs
for (int q = 0; q < numLoops; q++)
{
for (int g = 0; g < numSubloops; g++)
{
arr1.push_back(q * g);
}
}
}
{
// Test 2
// Ideal situation where total amount of data is reserved beforehand
arr2.reserve(numLoops * numSubloops);
for (int q = 0; q < numLoops; q++)
{
for (int g = 0; g < numSubloops; g++)
{
arr2.push_back(q * g);
}
}
}
{
// Test 3
// Total data is not known beforehand, so allocations made for each
// data chunk as they are processed using 'resize' method
int arrInx = 0;
for (int q = 0; q < numLoops; q++)
{
arr3.resize(arr3.size() + numSubloops);
for (int g = 0; g < numSubloops; g++)
{
arr3[arrInx++] = q * g;
}
}
}
{
// Test 4
// Total data is not known beforehand, so allocations are made for each
// data chunk as they are processed using the 'reserve' method
for (int q = 0; q < numLoops; q++)
{
arr4.reserve(arr4.size() + numSubloops);
for (int g = 0; g < numSubloops; g++)
{
arr4.push_back(q * g);
}
}
}
The results of this test, after compilation in Visual Studio 2017, are as follows:
Test 1: 7 ms
Test 2: 3 ms
Test 3: 4 ms
Test 4: 4000 ms
Why is there the huge discrepancy in running times?
Why does calling reserve
a bunch of times, followed by push_back
take 1000x times longer than calling resize
a bunch of times, followed by direct index access?
How does it make any sense that it could take 500x longer than the naive approach which includes no pre-allocations at all?
There is a myth that for run-time speed, one should use arrays. A std::vector can never be faster than an array, as it has (a pointer to the first element of) an array as one of its data members. But the difference in run-time speed is slim and absent in any non-trivial program.
reserve() does not change the size of the vector.
In short, push_back is doing more than what operator[] is doing - which is why it is slower (and more accurate).
The time difference is 0.333 seconds. Or a difference of 0.000000000333 per array access. Assuming a 2.33 GHz Processor like mine that's 0.7 instruction pipeline stages per array accesses. So the vector looks like it is using one extra instruction per accesses.
How does it make any sense that it could take 500x longer than the naive approach which includes no pre-allocations at all?
That's where you're mistaken. The 'naive' approach you speak of does do pre-allocations. They're just done behind the scenes, and infrequently, in the call to push_back
. It doesn't just allocate room for one more element every time you call push_back
. It allocates some amount that is a factor (usually between 1.5x and 2x) of the current capacity. And then it doesn't need to allocate again until that capacity runs out. This is much more efficient than your loop which does an allocation every time 50 elements are added, with no regard for the current capacity.
@Benjamin Lindley's answer explains the capacity of std::vector
. However, for exactly why the 4th test case is that slow, in fact it's an implementation detail of the standard library.
[vector.capacity]
void reserve(size_type n);
...
Effects: A directive that informs a vector of a planned change in size, so that it can manage the storage allocation accordingly. After reserve(), capacity() is greater or equal to the argument of reserve if reallocation happens; and equal to the previous value of capacity() otherwise. Reallocation happens at this point if and only if the current capacity is less than the argument of reserve().
Thus it is not guaranteed by C++ standard that after reserve()
for a larger capacity, the actual capacity should be the requested one. Personally I think it's not unreasonable for an implementation to follow some specific policy when such larger capacity request is received. However, I also tested on my machine, it seems the STL just does the simplest thing.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With