Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

C++ std::vector inserting two elements alternative algorithm fails

In my project, I needed to insert two elements into two indices. I was implementing an alternative implementation instead of vector insert since two insert calls shift vector elements twice and I can do the same with a single shift. However, the alternative is far slower. What could be the explanation of this behavior?

#include <vector>
#include <chrono>
#include <iostream>

void insert2(std::vector<int>& items, size_t first, size_t last, int item = -1) {
    // assert(last < items.size() + 2);
    // assert(first < last);
    // assert(0 <= first);
    // Creating two temporary objects
    // items.reserve(std::max(items.capacity(), items.size() + 2));
    items.emplace_back(); items.emplace_back();

    // Moving elements from the back to last
    for(auto p = items.end() - 1, q = items.begin() + last; p != q; --p) {
        // *p = std::move(*(p - 2));
        *p = *(p - 2);
    }
    // Emplace at last
    // new(&items[last]) ...
    items[last] = item;
    // Moving elements from last to first
    for(auto p = items.begin() + last - 1, q = items.begin() + first; p != q; --p) {
        // *p = std::move(*(p - 1));
        *p = *(p - 1);
    }
    // Emplace at first
    // new(&items[first]) ...
    items[first] = item;
}

auto now() {
    return std::chrono::steady_clock::now();
}

int main() {
    const size_t N = 100;
    const size_t M = 100;
    auto begin = now();

    begin = now();
    for(size_t n = 0; n < N; n++) { // run the same N times
        for(size_t i = 0; i < M + 1; i++) {
            for(size_t j = i + 1; j < M + 2; j++) {
                std::vector<int> v(M);
                insert2(v, i, j);
            }
        }
    }
    std::cout << "insert2 " << std::chrono::duration_cast<std::chrono::nanoseconds>(now() - begin).count() / (1000.0 * N) << "us\n";

    begin = now();
    for(size_t n = 0; n < N; n++) { // run the same N times 
        for(size_t i = 0; i < M + 1; i++) {
            for(size_t j = i + 1; j < M + 2; j++) {
                std::vector<int> v(M);
                v.insert(v.begin() + i, -1);
                v.insert(v.begin() + j, -1);
            }
        }
    }
    std::cout << "insert1 " << std::chrono::duration_cast<std::chrono::nanoseconds>(now() - begin).count() / (1000.0 * N) << "us\n";
}

My Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz outputs with O0

insert2 7941.29us
insert1 4005.15us

With O3,

insert2 763.64us
insert1 688.365us

Live demo on quick-bench

like image 566
Isura Manchanayake Avatar asked Jul 25 '19 09:07

Isura Manchanayake


1 Answers

I used a benchmark created by @MarekR and modified it such that no (re)allocations happen inside the benchmark loop, see http://quick-bench.com/UX9aEcrP06xBe51qKX3LjZWMU38. Then I did only a single double-insertion to the 1/3 and 2/3 of the vector size. With a vector of 100 integer elements (constant N), the custom version is actually slower, however, for 1000 elements it's already faster. And, for 1M elements, the custom version is almost exactly 1.5 times faster, which corresponds with the number of spared elements "moves". With std::vector::insert, you need to move N elements, while with a custom version only N * 2 / 3.

To be honest, I still don't know why the custom version is slower for small vectors. Anyway, I think you might be interested in this answer as well.

like image 144
Daniel Langr Avatar answered Oct 13 '22 07:10

Daniel Langr