In my project, I needed to insert two elements into two indices. I was implementing an alternative implementation instead of vector insert since two insert calls shift vector elements twice and I can do the same with a single shift. However, the alternative is far slower. What could be the explanation of this behavior?
#include <vector>
#include <chrono>
#include <iostream>
void insert2(std::vector<int>& items, size_t first, size_t last, int item = -1) {
// assert(last < items.size() + 2);
// assert(first < last);
// assert(0 <= first);
// Creating two temporary objects
// items.reserve(std::max(items.capacity(), items.size() + 2));
items.emplace_back(); items.emplace_back();
// Moving elements from the back to last
for(auto p = items.end() - 1, q = items.begin() + last; p != q; --p) {
// *p = std::move(*(p - 2));
*p = *(p - 2);
}
// Emplace at last
// new(&items[last]) ...
items[last] = item;
// Moving elements from last to first
for(auto p = items.begin() + last - 1, q = items.begin() + first; p != q; --p) {
// *p = std::move(*(p - 1));
*p = *(p - 1);
}
// Emplace at first
// new(&items[first]) ...
items[first] = item;
}
auto now() {
return std::chrono::steady_clock::now();
}
int main() {
const size_t N = 100;
const size_t M = 100;
auto begin = now();
begin = now();
for(size_t n = 0; n < N; n++) { // run the same N times
for(size_t i = 0; i < M + 1; i++) {
for(size_t j = i + 1; j < M + 2; j++) {
std::vector<int> v(M);
insert2(v, i, j);
}
}
}
std::cout << "insert2 " << std::chrono::duration_cast<std::chrono::nanoseconds>(now() - begin).count() / (1000.0 * N) << "us\n";
begin = now();
for(size_t n = 0; n < N; n++) { // run the same N times
for(size_t i = 0; i < M + 1; i++) {
for(size_t j = i + 1; j < M + 2; j++) {
std::vector<int> v(M);
v.insert(v.begin() + i, -1);
v.insert(v.begin() + j, -1);
}
}
}
std::cout << "insert1 " << std::chrono::duration_cast<std::chrono::nanoseconds>(now() - begin).count() / (1000.0 * N) << "us\n";
}
My Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz outputs with O0
insert2 7941.29us
insert1 4005.15us
With O3,
insert2 763.64us
insert1 688.365us
Live demo on quick-bench
I used a benchmark created by @MarekR and modified it such that no (re)allocations happen inside the benchmark loop, see http://quick-bench.com/UX9aEcrP06xBe51qKX3LjZWMU38. Then I did only a single double-insertion to the 1/3 and 2/3 of the vector size. With a vector of 100 integer elements (constant N
), the custom version is actually slower, however, for 1000 elements it's already faster. And, for 1M elements, the custom version is almost exactly 1.5 times faster, which corresponds with the number of spared elements "moves". With std::vector::insert
, you need to move N
elements, while with a custom version only N * 2 / 3
.
To be honest, I still don't know why the custom version is slower for small vectors. Anyway, I think you might be interested in this answer as well.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With