Is the poor performance of std::vector due to not calling realloc a logarithmic number of times?

Tags:

EDIT: I added two more benchmarks, to compare the use of realloc with the C array and of reserve() with the std::vector. From the last analysis it seems that realloc influences a lot, even if called only 30 times. Checking the documentation I guess this is due to the fact that realloc can return a completely new pointer, copying the old one. To complete the scenario I also added the code and graph for allocating completely the array during the initialisation. The difference from reserve() is tangible.

Compile flags: only the optimisation described in the graph, compiling with g++ and nothing more.

Original question:

I made a benchmark of std::vector vs a new/delete array, when I add 1 billion integers and the second code is dramatically faster than the one using the vector, especially with the optimisation turned on.

I suspect that this is caused by the vector internally calling realloc too many times. This would be the case if vector does not grows doubling its size every time it gets filled (here the number 2 has nothing special, what matters is that its size grows geometrically). In such a case the calls to realloc would be only O(log n) instead of O(n).

If this is what causes the slowness of the first code, how can I tell std::vector to grow geometrically?

Note that calling reserve once would work in this case but not in the more general case in which the number of push_back is not known in advance.

enter image description here

black line

#include<vector>

int main(int argc, char * argv[]) {
    const unsigned long long size = 1000000000;

    std::vector <int> b(size);
    for(int i = 0; i < size; i++) {
        b[i]=i;
    }    
    return 0;
}

blue line

#include<vector>

int main(int argc, char * argv[]) {
    const int size = 1000000000;    
    std::vector <int> b;
    for(int i = 0; i < size; i++) {
        b.push_back(i);
    }    

    return 0;
}

green line

#include<vector>

int main(int argc, char * argv[]) {
    const int size = 1000000000;
    std::vector <int> b;
    b.reserve(size);
    for(int i = 0; i < size; i++) {
        b.push_back(i);
    }    

    return 0;
}

red line

int main(int argc, char * argv[]) {
    const int size = 1000000000;
    int * a = new int [size];
    for(int i = 0; i < size; i++) {
        a[i] = i;
    }
    delete [] a;   
    return 0;
}

orange line

#include<vector>

int main(int argc, char * argv[]) {
    const unsigned long long size = 1000000000;
    int * a = (int *)malloc(size*sizeof(int));
    int next_power = 1;
    for(int i = 0; i < size; i++) {
        a[i] = i;
        if(i == next_power - 1) {
            next_power *= 2;
            a=(int*)realloc(a,next_power*sizeof(int));
        }
    }
    free(a);
    return 0;
}

enter image description here

EDIT: checking .capacity(), as suggested, we saw that the growth is indeed exponential. So why the vector is so slow?

850

asked Apr 02 '18 16:04

Nisba

3 Answers

The optimized C style array is optimized to nothing.

On godbolt:

xorl %eax, %eax
retq

that is the program.

Whenever you have a program optimized to nearly 0s you should consider this possibility.

The optimizer sees you are doing nothing with the memory allocated, notes that unused allocating memory may have zero side effects, and eliminates the allocation.

And writing to memory then never reading it also has zero side effects.

In comparison, the compiler has difficulty proving that the vector's allocation is useless. Probably the compiler developers could teach it to recognize unused std vectors like they recognize unused raw C arrays, but that optimization really is a corner case, and it causes lots of problems profiling in my experience.

Note that the vector-with-reserve at any optimization level is basically the same speed as the unoptimized C style version.

In the C style code, the only thing to optimize is "don't do anything". In the vector code, the unoptimized version is full of extra stack frames and debug checks to ensure you don't go out of bounds (and crash cleanly if you do).

Note that on a Linux system, allocating huge chunks of memory doesn't do anything except fiddle with the virtual memory table. Only when the memory is touched does it actually find some zero'd physical memory for you.

Without reserve, the std vector has to guess an initial small size, resize it an copy it, and repeat. This causes a 50% performance loss, which seems reasonable to me.

With reserve, it actually does the work. The work takes just under 5s.

Adding to vector via push back does causes it to grow geometrically. Geometric grows results in an asymptotic average of 2-3 copies of each piece of data being made.

As for realloc, std::vector does not realloc. It allocates a new buffer, and copies the old data, then discards the old one.

Realloc attempts to grow the buffer, and if it cannot it bitwise copies the buffer.

That is more efficient than std vector can manage for bitwise copyable types. I'd bet the realloc version actually never copies; there is always memory space to grow the vector into (in a real program this may not be the case).

The lack of realloc in std library allocators is a minor flaw. You'd have to invent a new API for it, because you'd want it to work for non-bitwise copy (something like "try grow allocated memory", which if it fails leaves it up to you to grow the allocation).

176

answered Sep 22 '22 03:09

Yakk - Adam Nevraumont

when I add 1 billion integers and the second code is dramatically faster than the one using the vector

That's... completely unsurprising. One of your cases involves a dynamically sized container that has to readjust for its load, and the other involves a fixed size container that doesn't. The latter simply has to do way less work, no branching, no additional allocations. The fair comparison would be:

std::vector<int> b(size);
for(int i = 0; i < size; i++) {
    b[i] = i;
}

This now does the same thing as your array example (well, almost - new int[size] default-initializes all the ints whereas std::vector<int>(size) zero-initializes them, so it's still more work).

It doesn't really make sense to compare these two to each other. If the fixed-size int array fits your use case, then use it. If it doesn't, then don't. You either need a dynamically sized container or not. If you do, performing slower than a fixed-size solution is something you're implicitly giving up.

If this is what causes the slowness of the first code, how can I tell std::vector to grow geometrically?

std::vector is already mandated to grow geometrically already, it's the only way to maintain O(1) amortized push_back complexity.

answered Sep 21 '22 03:09

Barry

Is the poor performance of std::vector due to not calling realloc a logarithmic number of times?

~~Your test neither supports that conclusion, nor does it prove the opposite. However, I would assume that reallocation is called linear number of times unless there is contrary evidence.~~

Update: Your new test is apparently evidence against your non-logarithmic reallocation hypothesis.

I suspect that this is caused by the vector internally calling realloc too many times.

Update: Your new test shows that some of the difference is due to reallocations... but not all. I suspect that the remainder is due to the fact that optimizer was able to prove (but only in the case of the non-growing) that the array values are unused, and chose to not loop and write them at all. If you were to make sure that the written values are actually used, then I would expect that the non-growing array would have similar optimized performance to the reserving vector.

The difference (between reserving code and non-reserving vector) in optimized build is most likely due to doing more reallocations (compared to no reallocations of the reserved array). Whether the number of reallocations is too much is situational and subjective. The downside of doing fewer reallocations is more wasted space due to overallocation.

Note that the cost of reallocation of large arrays comes primarily from copying of elements, rather than memory allocation itself.

In unoptimized build, there is probably additional linear overhead due to function calls that weren't expanded inline.

how can I tell std::vector to grow geometrically?

Geometric growth is required by the standard. There is no way and no need to tell std::vector to use geometric growth.

Note that calling reserve once would work in this case but not in the more general case in which the number of push_back is not known in advance.

However, a general case in which the number of push_back is not known in advance is a case where the non-growing array isn't even an option and so its performance is irrelevant for that general case.

answered Sep 20 '22 03:09

eerorika

Related questions
                            
                                sizeof(value) vs sizeof(type)?
                            
                                What if an object passed into std::swap throws an exception during swapping?
                            
                                Assigning namespaces in C++
                            
                                What's the difference between LONG float and double in C++?
                            
                                C++ static_cast from int* to void* to char* - can you help me to understand this code?
                            
                                Initializing a variable by a lambda expression?
                            
                                Overloading less than operator
                            
                                Qt/c++ random string generation [duplicate]
                            
                                Is there a short way to write for i in range(100) in C++?
                            
                                C++ Get Total File Line Number
                            
                                Converting _TCHAR* to char*
                            
                                Why do I get the error C1033: cannot open program database in VS 2010
                            
                                Using various format specifiers of c in c++
                            
                                Can we print the address of object of class in C++?
                            
                                Merge two Mat images into one
                            
                                Is it possible to replace a method at runtime?
                            
                                Why can't I use auto with std::thread?
                            
                                Create video using ffmpeg
                            
                                What does *(int*)(buffer) mean?
                            
                                Print comma separated list from std::vector [duplicate]

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Is the poor performance of std::vector due to not calling realloc a logarithmic number of times?

Tags:

c++

performance

benchmarking

vector

realloc

Nisba

People also ask

3 Answers

Yakk - Adam Nevraumont

Barry

eerorika

Recent Activity

Donate For Us