Using lambda instead of a function object, bad performance

Question

My problem is pretty simple, i want to use lambda's in the same way i may use a functor as a 'comparator', let me explain a little better. I have two big structs, both of them have their own implementation of operator<, and i have also a useless class (this is just the name of the class in the context of this question) which use the two struct, everything looks like this:

struct be_less
{
    //A lot of stuff
    int val;
    be_less(int p_v):val(p_v){}
    bool operator<(const be_less& p_other) const
    {
        return val < p_other.val;
    }
};

struct be_more
{
    //A lot of stuff
    int val;
    be_more(int p_v):val(p_v){}
    bool operator<(const be_more& p_other) const
    {
        return val > p_other.val;
    }
};

class useless
{
    priority_queue<be_less> less_q;
    priority_queue<be_more> more_q;
public:
    useless(const vector<int>& p_data)
    {
        for(auto elem:p_data)
        {
            less_q.emplace(elem);
            more_q.emplace(elem);
        }
    }
};

I whould like to remove the duplication in the two struct's, the simpliest idea is to make the struct a template and provide two functor to do the comparison job:

template<typename Comp>
struct be_all
{
    //Lot of stuff, better do not duplicate
    int val;
    be_all(int p_v):val{p_v}{}
    bool operator<(const be_all<Comp>& p_other) const
    {
        return Comp()(val,p_other.val);
    }
};

class comp_less
{
public:
    bool operator()(int p_first,
                    int p_second)
    {
        return p_first < p_second;
    }
};

class comp_more
{
public:
    bool operator()(int p_first,
                    int p_second)
    {
        return p_first > p_second;
    }
};

typedef be_all<comp_less> all_less;
typedef be_all<comp_more> all_more;

class useless
{
    priority_queue<all_less> less_q;
    priority_queue<all_more> more_q;
public:
    useless(const vector<int>& p_data)
    {
        for(auto elem:p_data)
        {
            less_q.emplace(elem);
            more_q.emplace(elem);
        }
    }
};

This work pretty well, now for sure i dont have any duplication in the struct code at the price of two additional function object. Please note that i'm very simplifying the implementation of operator<, the hipotetic real code does much more than just comparing two ints.

Then i was thinking about how to do the same thing using lambda (Just as an experiment).The only working solution i was able to implement is:

template<typename Comp>
struct be_all
{
    int val;
    function<bool(int,int)> Comparator;
    be_all(Comp p_comp,int p_v):
        Comparator(move(p_comp)),
        val{p_v}
    {}
    bool operator<(const be_all& p_other) const
    {
        return Comparator(val, p_other.val);
    }
};

auto be_less = [](int p_first,
          int p_second)
{
    return p_first < p_second;
};

auto be_more = [](int p_first,
          int p_second)
{
    return p_first > p_second;
};

typedef be_all<decltype(be_less)> all_less;
typedef be_all<decltype(be_more)> all_more;

class useless
{
    priority_queue<all_less> less_q;
    priority_queue<all_more> more_q;
public:
    useless(const vector<int>& p_data)
    {
        for(auto elem:p_data)
        {
            less_q.emplace(be_less,elem);
            more_q.emplace(be_more,elem);
        }
    }
};

This implementation not only add a new member to the data containing struct, but have also a very poor performance, i prepared a small test in which i create one instance for all the useless class i've show you here, every time i feed the constructor with a vector full of 2 milion integers, the results are the following:

Takes 48ms to execute the constructor of the first useless class
Takes 228ms to create the second useless class (functor)
Takes 557ms to create the third useless class (lambdas)

Clearly the price i pay for the removed duplication is very high, and in the original code the duplication is still there. Please note how bad is the performance of the third implementation, ten times slower that the original one, i believed that the reason of the third implementation being slower than the second was because of the additional parameter in the constructor of be_all... but:

Actually there's also a fourth case, where i still used the lambda but i get rid of the Comparator member and of the additional parameter in be_all, the code is the following:

template<typename Comp>
struct be_all
{
    int val;
    be_all(int p_v):val{p_v}
    {}
    bool operator<(const be_all& p_other) const
    {
        return Comp(val, p_other.val);
    }
};

bool be_less = [](int p_first,
          int p_second)
{
    return p_first < p_second;
};

bool be_more = [](int p_first,
          int p_second)
{
    return p_first > p_second;
};

typedef be_all<decltype(be_less)> all_less;
typedef be_all<decltype(be_more)> all_more;

class useless
{
    priority_queue<all_less> less_q;
    priority_queue<all_more> more_q;
public:
    useless(const vector<int>& p_data)
    {
        for(auto elem:p_data)
        {
            less_q.emplace(elem);
            more_q.emplace(elem);
        }
    }
};

If i remove auto from the lambda and use bool instead the code build even if i use Comp(val, p_other.val) in operator<.

What's very strange to me is that this fourth implementation (lambda without the Comparator member) is even slower than the other, at the end the average performance i was able to register are the following:

48ms
228ms
557ms
698ms

Why the functor are so much faster than lambdas in this scenario? I was expecting lambda's to be at least performing good as the ordinary functor, can someone of you comment please? And is there any technial reason why the fourth implementation is slower than the third?

PS:

The compilator i'm using is g++4.8.2 with -O3. In my test i create for each useless class an instance and using chrono i take account of the required time:

namespace benchmark
{
    template<typename T>
    long run()
    {
        auto start=chrono::high_resolution_clock::now();
        T t(data::plenty_of_data);
        auto stop=chrono::high_resolution_clock::now();
        return chrono::duration_cast<chrono::milliseconds>(stop-start).count();
    }
}

and:

cout<<"Bad code:  "<<benchmark::run<bad_code::useless>()<<"ms
";
cout<<"Bad code2: "<<benchmark::run<bad_code2::useless>()<<"ms
";
cout<<"Bad code3: "<<benchmark::run<bad_code3::useless>()<<"ms
";
cout<<"Bad code4: "<<benchmark::run<bad_code4::useless>()<<"ms
";

The set of input integers is the same for all, plenty_of_data is a vector full of 2 million intergers.

Thanks for your time

Pradhan · Accepted Answer

You are not comparing the runtime of a lambda and a functor. Instead, the numbers indicate the difference in using a functor and an std::function. And std::function<R(Args...)>, for example, can store any Callable satisfying the signature R(Args...). It does this through type-erasure. So, the difference you see comes from the overhead of a virtual call in std::function::operator().

For example, the libc++ implementation(3.5) has a base class template<class _Fp, class _Alloc, class _Rp, class ..._ArgTypes> __base with a virtual operator(). std::function stores a __base<...>*. Whenever you create an std::function with a callable F, an object of type template<class F, class _Alloc, class R, class ...Args> class __func is created, which inherits from __base<...> and overrides the virtual operator().

Using lambda instead of a function object, bad performance

Tags:

c++

c++11

lambda

fjanisze

1 Answers

Pradhan

Recent Activity

Donate For Us

Using lambda instead of a function object, bad performance

Tags:

c++

c++11

lambda

fjanisze

1 Answers

Pradhan

Related questions

Recent Activity

Donate For Us