C++ std::vector inserting two elements alternative algorithm fails

Question

In my project, I needed to insert two elements into two indices. I was implementing an alternative implementation instead of vector insert since two insert calls shift vector elements twice and I can do the same with a single shift. However, the alternative is far slower. What could be the explanation of this behavior?

#include <vector>
#include <chrono>
#include <iostream>

void insert2(std::vector<int>& items, size_t first, size_t last, int item = -1) {
    // assert(last < items.size() + 2);
    // assert(first < last);
    // assert(0 <= first);
    // Creating two temporary objects
    // items.reserve(std::max(items.capacity(), items.size() + 2));
    items.emplace_back(); items.emplace_back();

    // Moving elements from the back to last
    for(auto p = items.end() - 1, q = items.begin() + last; p != q; --p) {
        // *p = std::move(*(p - 2));
        *p = *(p - 2);
    }
    // Emplace at last
    // new(&items[last]) ...
    items[last] = item;
    // Moving elements from last to first
    for(auto p = items.begin() + last - 1, q = items.begin() + first; p != q; --p) {
        // *p = std::move(*(p - 1));
        *p = *(p - 1);
    }
    // Emplace at first
    // new(&items[first]) ...
    items[first] = item;
}

auto now() {
    return std::chrono::steady_clock::now();
}

int main() {
    const size_t N = 100;
    const size_t M = 100;
    auto begin = now();

    begin = now();
    for(size_t n = 0; n < N; n++) { // run the same N times
        for(size_t i = 0; i < M + 1; i++) {
            for(size_t j = i + 1; j < M + 2; j++) {
                std::vector<int> v(M);
                insert2(v, i, j);
            }
        }
    }
    std::cout << "insert2 " << std::chrono::duration_cast<std::chrono::nanoseconds>(now() - begin).count() / (1000.0 * N) << "us
";

    begin = now();
    for(size_t n = 0; n < N; n++) { // run the same N times 
        for(size_t i = 0; i < M + 1; i++) {
            for(size_t j = i + 1; j < M + 2; j++) {
                std::vector<int> v(M);
                v.insert(v.begin() + i, -1);
                v.insert(v.begin() + j, -1);
            }
        }
    }
    std::cout << "insert1 " << std::chrono::duration_cast<std::chrono::nanoseconds>(now() - begin).count() / (1000.0 * N) << "us
";
}

My Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz outputs with O0

insert2 7941.29us
insert1 4005.15us

With O3,

insert2 763.64us
insert1 688.365us

Live demo on quick-bench

Daniel Langr · Accepted Answer

I used a benchmark created by @MarekR and modified it such that no (re)allocations happen inside the benchmark loop, see http://quick-bench.com/UX9aEcrP06xBe51qKX3LjZWMU38. Then I did only a single double-insertion to the 1/3 and 2/3 of the vector size. With a vector of 100 integer elements (constant N), the custom version is actually slower, however, for 1000 elements it's already faster. And, for 1M elements, the custom version is almost exactly 1.5 times faster, which corresponds with the number of spared elements "moves". With std::vector::insert, you need to move N elements, while with a custom version only N * 2 / 3.

To be honest, I still don't know why the custom version is slower for small vectors. Anyway, I think you might be interested in this answer as well.

C++ std::vector inserting two elements alternative algorithm fails

Tags:

c++

performance

memory-management

stl

stdvector

Isura Manchanayake

1 Answers

Daniel Langr

Recent Activity

Donate For Us

C++ std::vector inserting two elements alternative algorithm fails

Tags:

c++

performance

memory-management

stl

stdvector

Isura Manchanayake

1 Answers

Daniel Langr

Related questions

Recent Activity

Donate For Us