I created the following class to understand the behavior of std::sort
:
class X {
public:
X(int i) : i_(i) { }
X(X&& rhs) noexcept : i_(std::move(rhs.i_)) { mc_++; }
X& operator=(X&& rhs) noexcept {
i_ = std::move(rhs.i_); ao_++; return *this;
}
void swap(X& rhs) noexcept { std::swap(i_, rhs.i_); sw_++; }
friend bool operator<(const X& lhs, const X& rhs) {
return lhs.i_ < rhs.i_;
}
static void reset() { mc_ = ao_ = sw_ = 0; }
private:
int i_;
static size_t mc_, ao_, sw_; // function-call counters
};
// void swap(X& lhs, X& rhs) { lhs.swap(rhs); }
And run the following benchmark code:
int main() {
std::vector<X> v;
for (int i = 0; i < 1000000; i++) v.emplace_back(i);
std::mt19937 g(0xa41bc9); // fixed seed to compare measurements
std::shuffle(v.begin(), v.end(), g);
X::reset();
std::sort(std::begin(v), std::end(v));
}
The whole code in online IDE is here: https://wandbox.org/permlink/nbwRKptakgCSHK4f.
The measured number of calls of particular functions are as follows (all with -O2
or /O2
flags):
function: move ctor operator= swap
GCC 7.1.0: 5,007,335 11,700,048 0
clang 4.0.0: 4,932,061 9,973,899 0
MSVC 19.11: 8,580,356 21,521,211 0
If I uncomment the swap
function, situation gets better:
function: move ctor operator= swap
GCC 7.1.0: 999,999 3,685,376 4,007,336
clang 4.0.0: 72,554 254,885 4,859,507
MSVC 19.11: 906,593 6,173,685 7,673,763
However, there still remains a lot of calls of the move constructor (plus destructor) and the move assignment operator. What bothers me is the efficiency. For instance, calls of swap
and operator=
can be inlined by the compiler, but I guess the compiler may not "inline" (optimize out) the construction/destruction of objects.
Why is construction/assignment of objects used in sorting? In-place sorting (which usually std::sort
does) can be implemented purely by compare and swap operations.
UPDATE
My assumption was wrong. It seems perfectly legal to optimize out object creation/destruction, such as in:
X temp = std::move(x1);
x1 = std::move(x2);
x2 = std::move(temp);
Therefore, such code can be as efficient as custom swap
. Online example: https://godbolt.org/g/ud4u9U - there are no calls of move constructor / assignment operator, though these are not trivial, and their functionality is inlined into main
.
std::sort()
is a hybrid algorithm. While a vanilla quicksort could get away with only swap()
operations (from std::partition()
) the real sorting approach most likely uses insertion, heap, and/or merge sort, too. For these sorting algorithms it is generally more effective to lift an object out of the way (by move construction), moving objects into the current "hole", and finally moving the out of place ibject out of the way. It may be reasonable to keep just one temporary object but most likely the algorithm uses some functions and keeping on temporary ibject around is somewhat impractical (although something I hadn't contemplated before and possibly worth trying).
Earlier this year my Quicker Sorting talk got recorded at the Italian C++ Conference: it goes over the details of making quicksort quick.
The upshot is: if you want to sort objects you better make sure that copy/move construction, copy/move assignment, destructor, and swap()
are fast. I can imagine that keeping a temporary object could lessen the need for construction and destruction but assignments will remain. A dedicated destructive move could possibly improve performance, too, but I haven't experimented with that (yet).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With