I am doing some tests measuring the performance of standard containers under various conditions, and I came across something odd. When I am inserting many items into the middle of a std::vector
, if I first call reserve with the exact number of elements that I will be adding, I see essentially no performance difference in most circumstances compared with not calling reserve, which is surprising. More surprising, however, is that if I call reserve with the exact number of elements I need + 1
, then I get a significant performance improvement. This is a sample table of results that I just got (all time are in seconds):
+---------------+--------+-------------------+-----------------------+ | # of elements | vector | vector (reserved) | vector (reserved + 1) | +---------------+--------+-------------------+-----------------------+ | 10000 | 0.04 | 0.04 | 0.03 | | 20000 | 0.14 | 0.14 | 0.11 | | 30000 | 0.32 | 0.32 | 0.25 | | 40000 | 0.55 | 0.55 | 0.44 | | 50000 | 0.87 | 0.85 | 0.66 | | 60000 | 1.24 | 1.24 | 0.96 | | 70000 | 1.69 | 1.68 | 1.31 | | 80000 | 2.17 | 2.21 | 1.71 | | 90000 | 2.78 | 2.75 | 2.16 | | 100000 | 3.43 | 3.44 | 2.68 | | 110000 | 4.13 | 4.15 | 3.23 | | 120000 | 4.88 | 4.89 | 3.86 | | 130000 | 5.79 | 5.8 | 4.51 | | 140000 | 6.71 | 6.71 | 5.24 | | 150000 | 7.7 | 7.7 | 6.02 | | 160000 | 8.76 | 8.67 | 6.86 | | 170000 | 9.9 | 9.91 | 7.74 | | 180000 | 11.07 | 10.98 | 8.64 | | 190000 | 12.34 | 12.35 | 9.64 | | 200000 | 13.64 | 13.56 | 10.72 | | 210000 | 15.1 | 15.04 | 11.67 | | 220000 | 16.59 | 16.41 | 12.89 | | 230000 | 18.05 | 18.06 | 14.13 | | 240000 | 19.64 | 19.74 | 15.36 | | 250000 | 21.34 | 21.17 | 16.66 | | 260000 | 23.08 | 23.06 | 18.02 | | 270000 | 24.87 | 24.89 | 19.42 | | 280000 | 26.5 | 26.58 | 20.9 | | 290000 | 28.51 | 28.69 | 22.4 | | 300000 | 30.69 | 30.74 | 23.97 | | 310000 | 32.73 | 32.81 | 25.57 | | 320000 | 34.63 | 34.99 | 27.28 | | 330000 | 37.12 | 37.17 | 28.99 | | 340000 | 39.36 | 39.43 | 30.83 | | 350000 | 41.7 | 41.48 | 32.45 | | 360000 | 44.11 | 44.22 | 34.55 | | 370000 | 46.62 | 46.71 | 36.22 | | 380000 | 49.09 | 48.91 | 38.46 | | 390000 | 51.71 | 51.98 | 40.22 | | 400000 | 54.45 | 54.56 | 43.03 | | 410000 | 57.23 | 57.29 | 44.84 | | 420000 | 60 | 59.73 | 46.67 | | 430000 | 62.9 | 63.03 | 49.3 | +---------------+--------+-------------------+-----------------------+
I checked the implementation, and it doesn't appear to have an off-by-one error. I then further tested by printing the size and the capacity immediately after calling reserve, and then I printed them again after filling up the vector, and everything looks good.
before: size: 0 capacity: 10000 after: size: 10000 capacity: 10000 before: size: 0 capacity: 20000 after: size: 20000 capacity: 20000 ...
Compiler is gcc 4.7.2 on Fedora Linux x86_64. Compiler options are -std=c++11 -Ofast -march=native -funsafe-loop-optimizations -flto=4 - fwhole-program
The code is below.
#include <algorithm> #include <array> #include <cstdint> #include <vector> #include <random> #include <string> #include <iostream> #include <fstream> #include <boost/timer.hpp> namespace { constexpr size_t array_size = 1; unsigned number() { static std::random_device rd; static std::mt19937 random_engine(rd()); static std::uniform_int_distribution<uint32_t> distribution(0, std::numeric_limits<uint32_t>::max()); return distribution(random_engine); } class Class { public: Class() { x[0] = number(); } std::string to_string() const { return std::to_string(x[0]); } inline friend bool operator<=(Class const & lhs, Class const & rhs) { return lhs.x[0] <= rhs.x[0]; } private: std::array<uint32_t, array_size> x; }; template<typename Container> void add(Container & container, Class const & value) { auto const it = std::find_if(std::begin(container), std::end(container), [&](Class const & c) { return value <= c; }); container.emplace(it, value); } // Do something with the result template<typename Container> void insert_to_file(Container const & container) { std::fstream file("file.txt"); for (auto const & value : container) { file << value.to_string() << '\n'; } } template<typename Container> void f(std::vector<Class> const & values) { Container container; container.reserve(values.size()); for (auto const & value : values) { add(container, value); } insert_to_file(container); } } int main(int argc, char ** argv) { std::size_t const size = (argc == 1) ? 1 : std::stoul(argv[1]); // Default constructor of Class fills in values here std::vector<Class> const values_to_be_copied(size); typedef std::vector<Class> Container; boost::timer timer; f<Container>(values_to_be_copied); std::cerr << "Finished in " << timer.elapsed() << " seconds.\n"; }
I created a C++03 version to try and help other people reproduce it, but I cannot reproduce it in this version, despite trying to make it show the problem by making it as direct of a translation as possible:
#include <algorithm> #include <cstdlib> #include <fstream> #include <vector> #include <string> #include <iostream> #include <boost/array.hpp> #include <boost/cstdint.hpp> #include <boost/lexical_cast.hpp> #include <boost/random/mersenne_twister.hpp> #include <boost/random/uniform_int_distribution.hpp> #include <boost/timer.hpp> namespace { unsigned number() { static boost::random::mt19937 random_engine; static boost::random::uniform_int_distribution<boost::uint32_t> distribution(0, std::numeric_limits<boost::uint32_t>::max()); return distribution(random_engine); } class Class { public: Class() { x[0] = number(); } inline friend bool operator<=(Class const & lhs, Class const & rhs) { return lhs.x[0] <= rhs.x[0]; } std::string to_string() const { return boost::lexical_cast<std::string>(x[0]); } private: boost::array<boost::uint32_t, 1> x; }; class Less { public: Less(Class const & c): value(c) { } bool operator()(Class const & c) const { return value <= c; } private: Class value; }; void add(std::vector<Class> & container, Class const & value) { std::vector<Class>::iterator it = std::find_if(container.begin(), container.end(), Less(value)); container.insert(it, value); } // Do something with the result void insert_to_file(std::vector<Class> const & container) { std::fstream file("file.txt"); for (std::vector<Class>::const_iterator it = container.begin(); it != container.end(); ++it) { file << it->to_string() << '\n'; } } void f(std::vector<Class> const & values) { std::vector<Class> container; container.reserve(values.size() + 1); for (std::vector<Class>::const_iterator it = values.begin(); it != values.end(); ++it) { add(container, *it); } insert_to_file(container); } } int main(int argc, char ** argv) { std::size_t const size = (argc == 1) ? 1 : boost::lexical_cast<std::size_t>(argv[1]); // Default constructor of Class fills in values here std::vector<Class> const values_to_be_copied(size); boost::timer timer; f(values_to_be_copied); std::cerr << "Finished in " << timer.elapsed() << " seconds.\n"; }
The line that currently calls reserve was changed to include a + 1
or was completely removed, depending on which test I was running. The entire thing was run from a shell script that started at 10000 elements and increased up to 430000 elements, going one version at a time.
My processor is an Intel i5 4-core processor, and I have 4 GiB of memory. I will try and simplify the C++11 version of the code as much as possible to see if I can isolate the problem.
Does anyone know why reserving one more element than I need is causing this increase in speed?
vector::reserve() vector reserve() indicates that the vector is created such that it can store at least the number of the specified elements without having to reallocate memory. Begin Declare a variable v of vector type.
vector::reserveIncrease the capacity of the vector (the total number of elements that the vector can hold without requiring reallocation) to a value that's greater or equal to new_cap .
C++ Vector Library - reserve() FunctionThe C++ function std::vector::reserve() requests to reserve vector capacity be at least enough to contain n elements. Reallocation happens if there is need of more space.
vector::reserve does allocate memory, so your question about reserving memory without allocating is incorrect. The point is that reserving memory can be done without changing the vectors size. Basically a vector has two sizes, it's size and it's capacity.
I made the following modification to the program:
size_t a = getenv("A") ? 1 : 0; void f(std::vector<Class> const & values) { ... container.reserve(values.size() + a); ... }
Now the performance is same (fast) regardless if a is 0 or 1. The conclusion must be that the reservation of an extra item has no performance impact (which was assumed in the question). Also other small changes to the source code or compiler flags toggles the performance between fast and slow, so it looks like the code optimizer just has better luck in some cases than in others.
The following implementation of f() triggers the opposite behaviour with same compiler flags, thus it is fast when exact size is reserved and slow when an extra item is reserved:
template<typename Container> void f(std::vector<Class> const & values) { Container container; container.reserve(values.size()); for (auto it = values.begin(); it != values.end(); ++it) { add(container, *it); } insert_to_file(container); }
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With