Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

C++ Optimize if/else condition

I have a single line of code, that consumes 25% - 30% of the runtime of my application. It is a less-than comparator for an std::set (the set is implemented with a Red-Black-Tree). It is called about 180 Million times within 28 seconds.

struct Entry {
  const float _cost;
  const long _id;

  // some other vars

    Entry(float cost, float id) : _cost(cost), _id(id) {
    } 
};



template<class T>
struct lt_entry: public binary_function <T, T, bool>
{
    bool operator()(const T &l, const T &r) const
    {
        // Most readable shape
        if(l._cost != r._cost) {
            return r._cost < l._cost;
        } else {
            return l._id < r._id;
        }
    }
};

The entries should be sorted by cost and if the cost is the same by their id. I have many insertions for each extraction of the minimum. I thought about using Fibonacci-Heaps, but I have been told that they are theoretically nice, but suffer from high constants and are pretty complicated to implement. And since insert is in O(log(n)) the runtime increase is nearly constant with large n. So I think its okay to stick to the set.

To improve performance I tried to express it in different shapes:

return l._cost < r._cost || r._cost > l._cost || l._id < r._id;

return l._cost < r._cost || (l._cost == r._cost && l._id < r._id);

Even this:

typedef union {
    float _f;
    int _i;
} flint;

//...

flint diff;
diff._f = (l._cost - r._cost);
return (diff._i && diff._i >> 31) || l._id < r._id;

But the compiler seems to be smart enough already, because I haven't been able to improve the runtime.

I also thought about SSE but this problem is really not very applicable for SSE...

The assembly looks somewhat like this:

movss  (%rbx),%xmm1
mov    $0x1,%r8d
movss  0x20(%rdx),%xmm0
ucomiss %xmm1,%xmm0
ja     0x410600 <_ZNSt8_Rb_tree[..]+96>
ucomiss %xmm0,%xmm1
jp     0x4105fd <_ZNSt8_Rb_[..]_+93>
jne    0x4105fd <_ZNSt8_Rb_[..]_+93>
mov    0x28(%rdx),%rax
cmp    %rax,0x8(%rbx)
jb     0x410600 <_ZNSt8_Rb_[..]_+96>
xor    %r8d,%r8d

I have a very tiny bit experience with assembly language, but not really much.

I thought it would be the best (only?) point to squeeze out some performance, but is it really worth the effort? Can you see any shortcuts that could save some cycles?

The platform the code will run on is an ubuntu 12 with gcc 4.6 (-stl=c++0x) on a many-core intel machine. Only libraries available are boost, openmp and tbb. The 30 second benchmark was performed on my 4yr old laptop (core 2 duo).

I am really stuck on this one, it seems so simple, but takes that much time. I have been crunching my head since days thinking how I could improve this line...

Can you give me a suggestion how to improve this part, or is it already at its best?

EDIT 1: After using Jerrys suggestion I achieved a speed up of ~4.5 seconds. EDIT 2: After trying boost Fibonacci heaps the comparison went to 174 Mio calls to the less than function.

like image 310
Heye Avatar asked Dec 12 '12 03:12

Heye


1 Answers

An easy solution is precompute a sort-identifier comprised of the cost as most significant and the id as the rest.

E.g.,

struct Entry
{
    double cost_;
    long id_;
    long long sortingId_;

  // some other vars

    Entry( double cost, float id )
        : cost_( cost ), id_( id ), sortingId_( 1e9*100*cost + id )
    {} 
};

Adjust sortingId_ value based on what you can assume about the value ranges.

Then, now just sort on sortingId_.


Or as variation of the same idea, if you can't make suitable assumptions about the data, then consider preparing data especially for memcmp.


For a higher level solution, remember that std::set::insert has an overload with a hint argument. If your data is already in near sorted order, that might seriously reduce the number of calls to your comparator function.


And you might consider whether a std::unordered_set might be sufficient? I.e. whether you need to list the data in sorted order. Or if the sorting is just the internal stuff of std::set element insertion.


Finally, for other readers (the OP has made clear that he's aware of this), remember to MEASURE.

like image 161
Cheers and hth. - Alf Avatar answered Oct 26 '22 08:10

Cheers and hth. - Alf