
How to speed up a simple method (preferably without changing interfaces or data structures)?

Tags:

c++

performance

optimization

I have some data structures:

all_unordered_m is a big vector containing all the strings I need (all different)
ordered_m is a small vector containing the indexes of a subset of the strings (all different) in the former vector
position_m maps the indexes of objects from the first vector to their position in the second one.

The string_after(index, reverse) method returns the string referenced by ordered_m after all_unordered_m[index].

ordered_m is considered circular, and is explored in natural or reverse order depending on the second parameter.

The code is something like the following:

struct ordered_subset {
    // [...]

    std::vector<std::string>& all_unordered_m; // size = n >> 1
    std::vector<size_t> ordered_m;             // size << n
    std::tr1::unordered_map<size_t, size_t> position_m;

    const std::string&
    string_after(size_t index, bool reverse) const
    {
        // position of all_unordered_m[index] inside ordered_m
        size_t pos = position_m.find(index)->second;
        if(reverse)
            pos = (pos == 0 ? ordered_m.size() - 1 : pos - 1);
        else
            pos = (pos == ordered_m.size() - 1 ? 0 : pos + 1);
        return all_unordered_m[ordered_m[pos]];
    }
};

Given that:

I do need all of the data-structures for other purposes;
I cannot change them because I need to access the strings:
- by their id in the all_unordered_m;
- by their index inside the various ordered_m;
I need to know the position of a string (identified by its position in the first vector) inside the ordered_m vector;
I cannot change the string_after interface without changing most of the program.

How can I speed up the string_after method that is called billions of times and is eating up about 10% of the execution time?

EDIT: I've tried making position_m a vector instead of an unordered_map and using the following method to avoid jumps:

const std::string&
string_after(size_t index, int direction) const
{
  // direction is +1 (forward) or -1 (reverse)
  return all_unordered_m[ordered_m[
      (ordered_m.size()+position_m[index]+direction)%ordered_m.size()]];
}

The change to position_m seems to be the most effective. I'm not sure that eliminating the branches made any difference; I'm tempted to say that the code is more compact but equally efficient in that regard.
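As a side note on the one-liner above, here is a small sketch of why the index expression still works when direction is -1. The helper name step_pos is hypothetical, and it assumes direction is only ever +1 or -1. The int is converted to size_t before the addition, so -1 becomes the largest size_t value, and unsigned wraparound makes the sum equal to size + pos - 1, which is never negative.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper isolating the index arithmetic of the branch-free
// string_after. Assumes size > 0, pos < size, and direction is +1 or -1.
inline std::size_t step_pos(std::size_t pos, int direction, std::size_t size)
{
    assert(size > 0 && pos < size);
    assert(direction == 1 || direction == -1);
    // The cast makes the implicit int -> size_t conversion explicit:
    // -1 wraps to the maximum size_t, and the unsigned sum wraps back
    // to size + pos - 1 before the modulo is taken.
    return (size + pos + static_cast<std::size_t>(direction)) % size;
}
```

Larger steps would only be safe as long as their magnitude does not exceed the size of ordered_m.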


asked Apr 04 '10 15:04

baol

2 Answers

Vector lookups are blazing fast. size() calls and simple arithmetic are blazing fast. Map lookups, in comparison, are as slow as a dead turtle with a block of concrete on its back. I have often seen those become a bottleneck in otherwise simple code like this.

You could try unordered_map from TR1 or C++0x (a drop-in hashtable replacement of map) instead and see if that makes a difference.
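Going one step further, the map lookup can be swapped for a plain vector lookup, which is what the question's edit reports as the most effective change. The following is only a sketch under assumptions: the names ordered_subset_dense and not_in_subset and the constructor are made up for illustration, the ids are assumed to be valid indexes into all_unordered_m, and it uses C++11 for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Sentinel stored for ids that are not part of the subset (hypothetical name).
const std::size_t not_in_subset = static_cast<std::size_t>(-1);

// Hypothetical variant of the question's struct: position_m is a dense
// vector indexed by string id instead of a hash map.
struct ordered_subset_dense {
    const std::vector<std::string>& all_unordered_m;
    std::vector<std::size_t> ordered_m;
    std::vector<std::size_t> position_m;  // one slot per string in all_unordered_m

    ordered_subset_dense(const std::vector<std::string>& all,
                         std::vector<std::size_t> ordered)
        : all_unordered_m(all),
          ordered_m(std::move(ordered)),
          position_m(all.size(), not_in_subset)
    {
        // Record, for every id in the subset, its position inside ordered_m.
        for (std::size_t pos = 0; pos < ordered_m.size(); ++pos) {
            assert(ordered_m[pos] < all_unordered_m.size());
            position_m[ordered_m[pos]] = pos;
        }
    }

    // Same interface as in the question; only the lookup changed.
    const std::string& string_after(std::size_t index, bool reverse) const
    {
        std::size_t pos = position_m[index];
        assert(pos != not_in_subset);  // index must belong to the subset
        if (reverse)
            pos = (pos == 0 ? ordered_m.size() - 1 : pos - 1);
        else
            pos = (pos == ordered_m.size() - 1 ? 0 : pos + 1);
        return all_unordered_m[ordered_m[pos]];
    }
};
```

The trade-off is memory: every subset carries one size_t per string in all_unordered_m, even though ordered_m itself is small. Whether that is acceptable depends on how many subsets exist at once.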


answered Sep 20 '22 15:09

Thomas

Well, in such cases (a small function that is called often) every branch can be very expensive. There are two things that come to mind.

1. Could you leave out the reverse parameter and make it two separate methods? This only makes sense if that doesn't simply push the if-statement to the calling code.
2. Try the following for calculating pos: pos = (pos + 1) % ordered_m.size() (this is for the forward case). This only works if you are sure that pos never overflows when incrementing it.

In general, try to replace branches with arithmetic operations in such cases; this can give you a substantial speedup.
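Both suggestions combined could look roughly like the sketch below. The function names next_pos and prev_pos are hypothetical, and the code assumes a non-empty ordered_m and a valid pos. For the reverse case it adds size - 1 instead of subtracting 1, so the unsigned value never goes below zero.

```cpp
#include <cassert>
#include <cstddef>

// Position following `pos` in a circular sequence of `size` elements.
inline std::size_t next_pos(std::size_t pos, std::size_t size)
{
    assert(size > 0 && pos < size);
    return (pos + 1) % size;
}

// Position preceding `pos` in a circular sequence of `size` elements.
// Adding size - 1 is equivalent to subtracting 1 modulo size.
inline std::size_t prev_pos(std::size_t pos, std::size_t size)
{
    assert(size > 0 && pos < size);
    return (pos + size - 1) % size;
}
```

Two separate methods, say string_after and string_before, would each call one of these with ordered_m.size(). Keep in mind that the modulo is an integer division, which is not free either, and compilers can often turn the original ternary into a conditional move, so it is worth measuring both versions.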

answered Sep 18 '22 15:09

Björn Pollex
