Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

I which situation will std::map<A,B> be faster than sorted std::vector<std::pair<A,B>>?

I was using a map in some code to store ordered data. I found out that for huge maps, destruction could take a while. In this code I had, replacing map by vector<pair> reduced processing time by 10000...

Finally, I was so surprised that I decided to compare map performances with sorted vector or pair.

And I'm surprised because I could not find a situation where map was faster than a sorted vector of pair (filled randomly and later sorted)...there must be some situations where map is faster....else what's the point in providing this class?

Here is what I tested:

Test one, compare map filling and destroying vs vector filling, sorting (because I want a sorted container) and destroying:

#include <iostream>
#include <time.h>
#include <cstdlib>
#include <map>
#include <vector>
#include <algorithm>

int main(void)
{

    clock_t tStart = clock();

    {
        std::map<float,int> myMap;
        for ( int i = 0; i != 10000000; ++i )
        {
            myMap[ ((float)std::rand()) / RAND_MAX ] = i;
        }
    }

    std::cout << "Time taken by map: " << ((double)(clock() - tStart)/CLOCKS_PER_SEC) << std::endl;

    tStart = clock();

    {
        std::vector< std::pair<float,int> > myVect;
        for ( int i = 0; i != 10000000; ++i )
        {
            myVect.push_back( std::make_pair( ((float)std::rand()) / RAND_MAX, i ) );
        }

        // sort the vector, as we want a sorted container:
        std::sort( myVect.begin(), myVect.end() );
    }

    std::cout << "Time taken by vect: " << ((double)(clock() - tStart)/CLOCKS_PER_SEC) << std::endl;

    return 0;
}

Compiled with g++ main.cpp -O3 -o main and got:

Time taken by map: 21.7142
Time taken by vect: 7.94725

map's 3 times slower...

Then, I said, "OK, vector is faster to fill and sort, but search will be faster with the map"....so I tested:

#include <iostream>
#include <time.h>
#include <cstdlib>
#include <map>
#include <vector>
#include <algorithm>

int main(void)
{
    clock_t tStart = clock();

    {
        std::map<float,int> myMap;
        float middle = 0;
        float last;
        for ( int i = 0; i != 10000000; ++i )
        {
            last = ((float)std::rand()) / RAND_MAX;
            myMap[ last ] = i;
            if ( i == 5000000 )
                middle = last; // element we will later search
        }

        std::cout << "Map created after " << ((double)(clock() - tStart)/CLOCKS_PER_SEC) << std::endl;

        float sum = 0;
        for ( int i = 0; i != 10; ++i )
            sum += myMap[ last ]; // search it

        std::cout << "Sum is " << sum << std::endl;
    }

    std::cout << "Time taken by map: " << ((double)(clock() - tStart)/CLOCKS_PER_SEC) << std::endl;

    tStart = clock();

    {
        std::vector< std::pair<float,int> > myVect;
        std::pair<float,int> middle;
        std::pair<float,int> last;
        for ( int i = 0; i != 10000000; ++i )
        {
            last = std::make_pair( ((float)std::rand()) / RAND_MAX, i );
            myVect.push_back( last );
            if ( i == 5000000 )
                middle = last; // element we will later search
        }

        std::sort( myVect.begin(), myVect.end() );

        std::cout << "Vector created after " << ((double)(clock() - tStart)/CLOCKS_PER_SEC) << std::endl;

        float sum = 0;
        for ( int i = 0; i != 10; ++i )
            sum += (std::find( myVect.begin(), myVect.end(), last ))->second; // search it

        std::cout << "Sum is " << sum << std::endl;
    }

    std::cout << "Time taken by vect: " << ((double)(clock() - tStart)/CLOCKS_PER_SEC) << std::endl;

    return 0;
}

Compiled with g++ main.cpp -O3 -o main and got:

Map created after 19.5357
Sum is 1e+08
Time taken by map: 21.41
Vector created after 7.96388
Sum is 1e+08
Time taken by vect: 8.31741

Even search is apparently faster with the vector (10 searchs with the map took almost 2sec and it took only half a second with the vector)....

So:

  • Did I miss something?
  • Is my tests not correct/accurate?
  • Is map simply a class to avoid or is there really situations where map offers good performances?
like image 848
jpo38 Avatar asked Feb 12 '16 20:02

jpo38


2 Answers

Generally a map will be better when you're doing a lot of insertions and deletions interspersed with your lookups. If you build the data structure once and then only do lookups, a sorted vector will almost certainly be faster, if only because of processor cache effects. Since insertions and deletions at arbitrary locations in a vector are O(n) instead of O(log n), there will come a point where those will become the limiting factor.

like image 133
Mark Ransom Avatar answered Nov 08 '22 06:11

Mark Ransom


std::find has linear time complexity whereas a map search has log N complexity.

When you find that one algorithm is 100000x faster than the other you should get suspicious! Your benchmark is invalid.

You need to compare realistic variants. Probably, you meant to compare map with a binary search. Run each of those variants for at least 1 second of CPU time so that you can realistically compare the results.

When a benchmark returns "0.00001 seconds" time spent you are well in the clock inaccuracy noise. This number means nothing.

like image 43
usr Avatar answered Nov 08 '22 06:11

usr