Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why do I get a constant instead of logarithmic curve for an insert time benchmark of the RB-tree based C++ std::set?

Tags:

c++

stl

I was comparing BST to Heap at: Heap vs Binary Search Tree (BST) but when I tried to benchmark both and compare results, I could not interpret the data for BST.

First, I confirmed that the standard library does use a Red-black tree: What is the underlying data structure of a STL set in C++?

Then I ran this benchmark.

main.cpp

#include <chrono>
#include <iostream>
#include <random>
#include <set>

int main(int argc, char **argv) {
    size_t i, n;
    std::set<int> bst;
    std::random_device dev;
    unsigned int seed = dev();
    std::mt19937 prng(seed);
    std::uniform_int_distribution<> dist;

    if (argc > 1) {
        n = std::stoi(argv[1]);
    } else {
        n = 1000000;
    }
    for (i = 0; i < n; ++i) {
        auto random_value = dist(prng);
        auto start = std::chrono::steady_clock::now();
        bst.insert(random_value);
        auto end = std::chrono::steady_clock::now();
        auto dt_bst = end - start;
        std::cout << random_value << " "
            << std::chrono::duration_cast<std::chrono::nanoseconds>(dt_bst).count() << std::endl;
    }
}

plot.gnuplot:

#!/usr/bin/env gnuplot
set terminal png size 1024, 1024
set output "bst_vs_heap.png"
set title "BST insert time"
set xlabel "size"
set ylabel "nanoseconds"
plot "main.dat" using 2 notitle

Compile, run and plot:

g++ -ggdb3 -O3 -std=c++17 -Wall -Wextra -pedantic-errors -o main.out main.cpp
./main.out 10000000 > main.dat
./plot.gnuplot

Outcome:

enter image description here

Why don't I see a nice logarithmic curve, as in the theoretical data structure, but rather a somewhat constant line with some outliers?

Ubuntu 18.04, GCC 7.3, Intel i7-7820HQ CPU, DDR4 2400 MHz RAM, Lenovo Thinkpad P51.


1 Answers

The clock is likely not accurate enough as mentioned in the comments, so I've tried to group a bunch of inserts together and time them to improve the signal to noise, and it worked, I can see a logarithm now:

#include <chrono>
#include <iostream>
#include <random>
#include <set>

int main(int argc, char **argv) {
    size_t i, j, n, granule;
    std::set<int> bst;
    std::random_device dev;
    unsigned int seed = dev();
    std::mt19937 prng(seed);
    std::uniform_int_distribution<> dist;
    int *randoms;

    if (argc > 1) {
        n = std::stoi(argv[1]);
    } else {
        n = 1000000;
    }
    if (argc > 2) {
        granule = std::stoi(argv[2]);
    } else {
        granule = 10;
    }
    randoms = new int[granule];
    for (i = 0; i < n / granule; ++i) {
        for (j = 0; j < granule; ++j) {
            randoms[j] = dist(prng);
        }
        auto start = std::chrono::high_resolution_clock::now();
        for (j = 0; j < granule; ++j) {
            bst.insert(randoms[j]);
        }
        auto end = std::chrono::high_resolution_clock::now();
        auto dt_bst = end - start;
        std::cout << std::chrono::duration_cast<std::chrono::nanoseconds>(dt_bst).count() << std::endl;
    }
    delete[] randoms;
}

Command:

./main.out 100000000 10000

Graph:

enter image description here