Why are C++ STL vectors 1000x slower when doing many reserves?

Tags: c++, stl, vector, resize

I've run into a strange situation.

In my program I have a loop that combines a bunch of data together in a giant vector. I was trying to figure out why it was running so slowly, even though it seemed like I was doing everything right to allocate memory in an efficient manner on the go.

In my program it is difficult to determine how big the final vector of combined data should be, but the size of each piece of data is known as it is processed. So instead of reserving and resizing the combined data vector in one go, I was reserving enough space for each data chunk as it was added to the larger vector. That's when I ran into this issue, which can be reproduced with the simple snippet below:

std::vector<float> arr1;
std::vector<float> arr2;
std::vector<float> arr3;
std::vector<float> arr4;
int numLoops = 10000;
int numSubloops = 50;

{
    // Test 1
    // Naive test where no pre-allocation occurs

    for (int q = 0; q < numLoops; q++)
    {
        for (int g = 0; g < numSubloops; g++)
        {
            arr1.push_back(q * g);
        }
    }
}

{
    // Test 2
    // Ideal situation where total amount of data is reserved beforehand

    arr2.reserve(numLoops * numSubloops);
    for (int q = 0; q < numLoops; q++)
    {
        for (int g = 0; g < numSubloops; g++)
        {
            arr2.push_back(q * g);
        }
    }
}

{
    // Test 3
    // Total data is not known beforehand, so allocations made for each
    // data chunk as they are processed using 'resize' method

    int arrInx = 0;
    for (int q = 0; q < numLoops; q++)
    {
        arr3.resize(arr3.size() + numSubloops);
        for (int g = 0; g < numSubloops; g++)
        {
            arr3[arrInx++] = q * g;
        }
    }
}

{
    // Test 4
    // Total data is not known beforehand, so allocations are made for each
    // data chunk as they are processed using the 'reserve' method

    for (int q = 0; q < numLoops; q++)
    {
        arr4.reserve(arr4.size() + numSubloops);
        for (int g = 0; g < numSubloops; g++)
        {
            arr4.push_back(q * g);
        }
    }
}

The results of this test, after compilation in Visual Studio 2017, are as follows:

Test 1: 7 ms
Test 2: 3 ms
Test 3: 4 ms
Test 4: 4000 ms

Why is there such a huge discrepancy in running times?

Why does calling reserve a bunch of times, followed by push_back, take 1000x longer than calling resize a bunch of times, followed by direct index access?

How does it make any sense that it could take 500x longer than the naive approach which includes no pre-allocations at all?

asked Jan 31 '18 06:01

Tyson

2 Answers

How does it make any sense that it could take 500x longer than the naive approach which includes no pre-allocations at all?

That's where you're mistaken. The 'naive' approach you speak of does do pre-allocations. They're just done behind the scenes, and infrequently, inside the call to push_back. push_back doesn't allocate room for just one more element each time it is called. When the capacity runs out, it grows the capacity by a constant factor (usually between 1.5x and 2x) and doesn't need to allocate again until that new capacity is used up. This is much more efficient than your loop, which reallocates (and copies every existing element) each time 50 elements are added, with no regard for the current capacity.
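
One way to see the difference is to count how often the capacity changes under each pattern. The sketch below is only an illustration: the helper names are made up for this example, and a change in capacity() is used as a stand-in for a reallocation. It also assumes that reserve allocates exactly what is asked for, which the common implementations appear to do but the standard does not require.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper for this example: bumps `count` whenever the capacity
// of `v` differs from `last_capacity`, i.e. whenever a reallocation happened.
void note_capacity(const std::vector<float>& v,
                   std::size_t& last_capacity, std::size_t& count)
{
    if (v.capacity() != last_capacity) {
        last_capacity = v.capacity();
        ++count;
    }
}

// Pattern of Test 1: plain push_back, growth is left to the vector.
std::size_t reallocs_with_push_back(std::size_t chunks, std::size_t chunk_size)
{
    std::vector<float> v;
    std::size_t last_capacity = v.capacity();
    std::size_t count = 0;
    for (std::size_t q = 0; q < chunks; ++q) {
        for (std::size_t g = 0; g < chunk_size; ++g) {
            v.push_back(static_cast<float>(q * g));
            note_capacity(v, last_capacity, count);
        }
    }
    assert(v.size() == chunks * chunk_size);
    return count;
}

// Pattern of Test 4: reserve room for exactly one more chunk each time.
std::size_t reallocs_with_reserve_per_chunk(std::size_t chunks,
                                            std::size_t chunk_size)
{
    std::vector<float> v;
    std::size_t last_capacity = v.capacity();
    std::size_t count = 0;
    for (std::size_t q = 0; q < chunks; ++q) {
        v.reserve(v.size() + chunk_size);
        note_capacity(v, last_capacity, count);
        for (std::size_t g = 0; g < chunk_size; ++g) {
            v.push_back(static_cast<float>(q * g));
            note_capacity(v, last_capacity, count);
        }
    }
    assert(v.size() == chunks * chunk_size);
    return count;
}
```

Calling both functions with 10000 chunks of 50 elements, as in the question, should give a few dozen capacity changes at most for the push_back pattern. If reserve allocates exactly the requested amount, the reserve-per-chunk pattern gives one change per chunk.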

answered Sep 30 '22 18:09

Benjamin Lindley

@Benjamin Lindley's answer explains how the capacity of std::vector grows. As for exactly why the 4th test case is that slow, it comes down to an implementation detail of the standard library.

[vector.capacity]

void reserve(size_type n);

...

Effects: A directive that informs a vector of a planned change in size, so that it can manage the storage allocation accordingly. After reserve(), capacity() is greater or equal to the argument of reserve if reallocation happens; and equal to the previous value of capacity() otherwise. Reallocation happens at this point if and only if the current capacity is less than the argument of reserve().

Thus the C++ standard does not guarantee that, after a reserve() call requesting a larger capacity, the actual capacity is exactly the requested one; it only has to be at least that large. Personally, I think it would not be unreasonable for an implementation to follow some specific growth policy when it receives such a request. However, I tested on my machine, and it seems the STL just does the simplest thing and allocates exactly the requested capacity.
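
If the per-chunk reserve is still wanted, one possible workaround is to apply a growth policy by hand. The sketch below is an assumption-laden example, not something from the standard library: reserve_extra and fill_in_chunks are hypothetical names, and the doubling factor is an arbitrary choice.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the standard library): makes room for
// `extra` more elements, at least doubling the capacity whenever it has to
// grow, so the number of reallocations stays logarithmic in the final size.
template <typename T>
void reserve_extra(std::vector<T>& v, std::size_t extra)
{
    const std::size_t needed = v.size() + extra;
    if (needed > v.capacity()) {
        v.reserve(std::max(needed, v.capacity() * 2));
    }
    assert(v.capacity() >= needed);
}

// Test 4 from the question, rewritten to use reserve_extra.
// Returns how many times the reserve step changed the capacity.
std::size_t fill_in_chunks(std::vector<float>& out,
                           std::size_t chunks, std::size_t chunk_size)
{
    std::size_t reallocs = 0;
    for (std::size_t q = 0; q < chunks; ++q) {
        const std::size_t before = out.capacity();
        reserve_extra(out, chunk_size);
        if (out.capacity() != before) {
            ++reallocs;
        }
        for (std::size_t g = 0; g < chunk_size; ++g) {
            out.push_back(static_cast<float>(q * g));
        }
    }
    return reallocs;
}
```

With this helper the loop still reserves before each chunk, but most of those calls find enough capacity already and do nothing.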

answered Sep 30 '22 18:09

llllllllll
