Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is std::rotate so fast?

Why is std::rotate so much faster than the equivalent function that cplusplus.com describes?

cplusplus.com's implementation:

template <class ForwardIterator>
  void rotate (ForwardIterator first, ForwardIterator middle, ForwardIterator last)
{
  ForwardIterator next= middle;

  while (first != next)
  {
    swap (*first++, *next++);

    if(next == last)
        next= middle;
    else if (first==middle)
        middle= next;
  }
}

I have two insertion sorting algorithms which are entirely identical, with the exception that one uses std::rotate, and one uses cplusplus.com's equivalent function. I'm setting them to sort 1000 vectors with 1000 int elements. The sort which uses std::rotate takes 0.376 seconds, and the other takes 8.181 seconds.

Why is this? I'm not intending to try and make something better than the STL functions but I'm still curious.

like image 244
bradshire Avatar asked Jan 16 '14 11:01

bradshire


People also ask

How does STD rotate work?

std::rotateRotates the order of the elements in the range [first,last) , in such a way that the element pointed by middle becomes the new first element.

Is std :: find slow?

std::equal_range on bidirectional iterators is extremely slow, because it has to walk step by step through the range. The std::set. find method, on the other hand, uses the tree structure of std::set to find the element really fast. It can, basically, get midpoints of a range really fast.

What is use of rotate in C++?

The rotate() function in C++ is used to rotate the elements' order within a specified range. This is done in such a way that the element pointed by middle now becomes the first element. In other words, the rotation happens at the iterator, pointing to the middle element.


Video Answer


2 Answers

As the commentors already stated, it depends on your Standard Library implementation. But the code that you posted is valid even for forward iterators. As such, it imposes very little requirements (only that these iterators can be incremented and dereferenced).

Stepanov's classic Elements of Programming devotes an entire chapter (10) to rotate and other rearrangement algorithms. For forward iterators, the series of swaps in your code gives O(3N) assignments. For bidirectional iterators, three consecutive calls to reverse yield another O(3N) algorithm. For random access iterators, std::rotate can be implemented as O(N) assignments by defining a permutation of indices w.r.t. to the starting iterator first.

All the above algorithms are in-place. Using a memory buffer, it is possible that the random access version can benefit from the greater cache locality of memcpy() or memmove() (if the underlying value type is POD) in which entire blocks of contiguous memory can be swapped. If your insertion sort is done on an array or std::vector, it is likely that your Standard Library will take advantage of this optimization.

TL;DR: trust your Standard Library and don't reinvent the wheel!

like image 79
TemplateRex Avatar answered Oct 07 '22 12:10

TemplateRex


Edit:

Since the context is not given, it is not clear if your code calls std::swap() or another swap(a,b) algorithm like

T tmp = a; a = b; b = tmp;

When a and b are vectors of 1000 ints each, this would copy all vector elements 3 times. The specialized version of std::swap() for containers like std::vector<T> call the container a.swap(b) method instead, essentially swapping only the dynamic data pointers of the containers.

Also, for different iterator types the std::rotate() implementation can utilize some optimizations (see my older, possibly misleading answer below).


Caveat: The implementation of std::rotate() is implementation-dependent. For different iterator categories, different algorithms can be utilized (e.g. look for __rotate( in bits/stl_algo.h header of GNU g++).

To shift n elements by m=std::distance(first,middle) a simple (naive) algorithm like m rotations by one element needs O(n*m) moving or copying operations. But only O(n) moves are needed, when each element is directly placed to its right position, which results in a (roughly) m times faster algorithm.

An example for illustration: Rotate a string s = "abcdefg" by three elements:

abcdefg : store 'a' in temporary place
dbcdefg : move s[3] to s[0] (where it belongs in the end, directly)
dbcgefg : move s[6] to s[3]
dbcgefc : move s[9%7] to s[6] (wrapping index modulo container size: 9%7 == 2)
dbfgefc : move s[5] to s[2]
dbfgebc : move s[1] to s[5] (another wrapping around)
defgebc : move s[4] to s[1]
defgabc : move 'a' from temporary place to s[4]

For n and m with greatest common divisor 1 you are done now. Otherwise you have to repeat that scheme n/m time for the first m consecutive elements (n > m assumed here). This little more complicated algorithm is much faster.

For bidirectional iterators another legendary O(3n) algorithm can be used, referred to as "flipping hands". According to Jon Bentley's book Programming Pearls it was used in early UNIX editors for moving text:

Place your hands in front of you, one above the other, thumbs up. Now

  1. Turn one hand.
  2. Turn the other.
  3. Turn both, connected to each other.

In code:

reverse(first, middle);
reverse(middle, last);
reverse(first, last);

For random access iterators large chunks of memory can be relocated by swap_ranges() (or memmove() operations for POD types).

Microoptimization by utilising assembler operations can give a small extra amount of acceleration, it can be done on top of the fasted algorithm.

Algorithms using consecutive elements instead of "hopping around" in memory also result in smaller number of cache misses on modern computer architectures.

like image 25
René Richter Avatar answered Oct 07 '22 13:10

René Richter