Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Comprehensive vector vs linked list benchmark for randomized insertions/deletions

So I am aware of this question, and others on SO that deal with issue, but most of those deal with the complexities of the data structures (just to copy here, linked this theoretically has O(

I understand the complexities would seem to indicate that a list would be better, but I am more concerned with the real world performance.

Note: This question was inspired by slides 45 and 46 of Bjarne Stroustrup's presentation at Going Native 2012 where he talks about how processor caching and locality of reference really help with vectors, but not at all (or enough) with lists.

Question: Is there a good way to test this using CPU time as opposed to wall time, and getting a decent way of "randomly" inserting and deleting elements that can be done beforehand so it does not influence the timings?

As a bonus, it would be nice to be able to apply this to two arbitrary data structures (say vector and hash maps or something like that) to find the "real world performance" on some hardware.

like image 286
soandos Avatar asked Mar 19 '12 02:03

soandos


People also ask

What is better vector or linked list?

A linked list has a more complex data structure than a vector; each of its elements consists of the data itself and then one or more pointers. A pointer is a variable that contains a memory address.

Is linked list faster than vector for inserting elements at the front?

For example, taking a bunch of random integers and inserting them in sorted order into a vector or a linked list -- the vector will always be faster, regardless of the number of items total, due to cache misses when searching for the insertion point in the linked list.

Why is linked list better than an array or vector?

linkedlist is faster in add and remove, but slower in get. based on the complexity table and testing results, we can figure out when to use arraylist or linkedlist. in brief, linkedlist should be preferred if: there are no large number of random access of element.

Which is faster vector or list?

std::vector is insanely faster than std::list to find an element. std::vector always performs faster than std::list with very small data. std::vector is always faster to push elements at the back than std::list. std::list handles large elements very well, especially for sorting or inserting in the front.


2 Answers

I guess if I were going to test something like this, I'd probably start with code something on this order:

#include <list>
#include <vector>
#include <algorithm>
#include <deque>
#include <time.h>
#include <iostream>
#include <iterator>

static const int size = 30000;

template <class T>
double insert(T &container) {
    srand(1234);
    clock_t start = clock();
    for (int i=0; i<size; ++i) {
        int value = rand();
        T::iterator pos = std::lower_bound(container.begin(), container.end(), value);
        container.insert(pos, value);
    }
// uncomment the following to verify correct insertion (in a small container).
//  std::copy(container.begin(), container.end(), std::ostream_iterator<int>(std::cout, "\t"));
    return double(clock()-start)/CLOCKS_PER_SEC;
}


template <class T>
double del(T &container) {
    srand(1234);
    clock_t start = clock();
    for (int i=0; i<size/2; ++i) {
        int value = rand();
        T::iterator pos = std::lower_bound(container.begin(), container.end(), value);
        container.erase(pos);
    }
    return double(clock()-start)/CLOCKS_PER_SEC;
}       

int main() { 
    std::list<int> l;
    std::vector<int> v;
    std::deque<int> d;

    std::cout << "Insertion time for list: " << insert(l) << "\n";
    std::cout << "Insertion time for vector: " << insert(v) << "\n";
    std::cout << "Insertion time for deque: " << insert(d) << "\n\n";

    std::cout << "Deletion time for list: " << del(l) << '\n';
    std::cout << "Deletion time for vector: " << del(v) << '\n';
    std::cout << "Deletion time for deque: " << del(d) << '\n';

    return 0;
}

Since it uses clock, this should give processor time not wall time (though some compilers such as MS VC++ get that wrong). It doesn't try to measure the time for insertion exclusive of time to find the insertion point, since 1) that would take a bit more work and 2) I still can't figure out what it would accomplish. It's certainly not 100% rigorous, but given the disparity I see from it, I'd be a bit surprised to see a significant difference from more careful testing. For example, with MS VC++, I get:

Insertion time for list: 6.598
Insertion time for vector: 1.377
Insertion time for deque: 1.484

Deletion time for list: 6.348
Deletion time for vector: 0.114
Deletion time for deque: 0.82

With gcc I get:

Insertion time for list: 5.272
Insertion time for vector: 0.125
Insertion time for deque: 0.125

Deletion time for list: 4.259
Deletion time for vector: 0.109
Deletion time for deque: 0.109

Factoring out the search time would be somewhat non-trivial because you'd have to time each iteration separately. You'd need something more precise than clock (usually is) to produce meaningful results from that (more on the order or reading a clock cycle register). Feel free to modify for that if you see fit -- as I mentioned above, I lack motivation because I can't see how it's a sensible thing to do.

like image 143
Jerry Coffin Avatar answered Sep 30 '22 08:09

Jerry Coffin


This is the program I wrote after watching that talk. I tried running each timing test in a separate process to make sure the allocators weren't doing anything sneaky to alter performance. I have amended the test allow timing of the random number generation. If you are concerned it is affecting the results significantly, you can time it and subtract out the time spent there from the rest of the timings. But I get zero time spent there for anything but very large N. I used getrusage() which I am pretty sure isn't portable to Windows but it would be easy to substitute in something using clock() or whatever you like.

#include <assert.h>
#include <algorithm>
#include <iostream>
#include <list>
#include <vector>
#include <stdlib.h>
#include <time.h>
#include <sys/time.h>
#include <sys/resource.h>


void f(size_t const N)
{
    std::vector<int> c;
    //c.reserve(N);
    for (size_t i = 0; i < N; ++i) {
        int r = rand();
        auto p = std::find_if(c.begin(), c.end(), [=](int a) { return a >= r; });
        c.insert(p, r);
    }
}

void g(size_t const N)
{
    std::list<int> c;
    for (size_t i = 0; i < N; ++i) {
        int r = rand();
        auto p = std::find_if(c.begin(), c.end(), [=](int a) { return a >= r; });
        c.insert(p, r);
    }
}

int h(size_t const N)
{
    int r;
    for (size_t i = 0; i < N; ++i) {
        r = rand();
    }
    return r;
}

double usage()
{
    struct rusage u;
    if (getrusage(RUSAGE_SELF, &u) == -1) std::abort();
    return
        double(u.ru_utime.tv_sec) + (u.ru_utime.tv_usec / 1e6) +
        double(u.ru_stime.tv_sec) + (u.ru_stime.tv_usec / 1e6);
}


int
main(int argc, char* argv[])
{
    assert(argc >= 3);
    std::string const sel = argv[1];
    size_t const N = atoi(argv[2]);

    double t0, t1;
    srand(127);

    if (sel == "vector") {
        t0 = usage();
        f(N);
        t1 = usage();
    } else if (sel == "list") {
        t0 = usage();
        g(N);
        t1 = usage();
    } else if (sel == "rand") {
        t0 = usage();
        h(N);
        t1 = usage();
    } else {
        std::abort();
    }

    std::cout
        << (t1 - t0)
        << std::endl;

    return 0;
}

To get a set of results I used the following shell script.

seq=`perl -e 'for ($i = 10; $i < 100000; $i *= 1.1) { print int($i), " "; }'`
for i in $seq; do
    vt=`./a.out vector $i`
    lt=`./a.out list $i`
    echo $i $vt $lt
done
like image 31
Bowie Owens Avatar answered Sep 30 '22 08:09

Bowie Owens