I've been writing C++11 code for quite some time now, and haven't done any benchmarking of it, only expecting things like vector operations to "just be faster" now with move semantics. So when actually benchmarking with GCC 4.7.2 and clang 3.0 (default compilers on Ubuntu 12.10 64-bit) I get very unsatisfying results. This is my test code:
EDIT: With regard to the (good) answers posted by @DeadMG and @ronag, I changed the element type from std::string to my::string, which does not have a swap(), and made all inner strings larger (200-700 bytes) so that they shouldn't be subject to SSO.
EDIT2: COW was the reason. Following the great comments, I adapted the code again: I changed the storage from std::string to std::vector<char> and left out the copy/move constructors (letting the compiler generate them instead). Without COW, the speed difference is actually huge.
EDIT3: As requested by @chico, I re-added the previous solution, which is used when compiling with -DCOW. This makes the internal storage a std::string rather than a std::vector<char>.
    #include <string>
    #include <vector>
    #include <fstream>
    #include <iostream>
    #include <algorithm>
    #include <functional>
    #include <utility>

    static std::size_t dec = 0;

    namespace my {
        class string
        {
        public:
            string( ) : val( 0 ) { }
    #ifdef COW
            string( const std::string& ref ) : str( ref ), val( dec % 2 ? - ++dec : ++dec ) {
    #else
            string( const std::string& ref ) : val( dec % 2 ? - ++dec : ++dec ) {
                str.resize( ref.size( ) );
                std::copy( ref.begin( ), ref.end( ), str.begin( ) );
    #endif
            }

            bool operator<( const string& other ) const { return val < other.val; }

        private:
    #ifdef COW
            std::string str;
    #else
            std::vector< char > str;
    #endif
            std::size_t val;
        };
    }

    template< typename T >
    void dup_vector( T& vec )
    {
        T v = vec;
        for ( typename T::iterator i = v.begin( ); i != v.end( ); ++i )
    #ifdef CPP11
            vec.push_back( std::move( *i ) );
    #else
            vec.push_back( *i );
    #endif
    }

    int main( )
    {
        std::ifstream file;
        file.open( "/etc/passwd" );

        std::vector< my::string > lines;
        std::string s;
        // Test the result of getline itself; looping on eof() pushes one extra, empty line
        while ( std::getline( file, s ) )
            lines.push_back( s + s + s + s + s + s + s + s + s );

        while ( lines.size( ) < ( 1000 * 1000 ) )
            dup_vector( lines );
        std::cout << lines.size( ) << " elements" << std::endl;

        std::sort( lines.begin( ), lines.end( ) );

        return 0;
    }
What this does is read /etc/passwd into a vector of lines, then duplicate this vector onto itself over and over until we have at least 1 million entries. This is where the first optimization should be useful: not only the explicit std::move() you see in dup_vector(), but also push_back itself should perform better when it needs to grow (allocate new storage + copy) its internal array.
Finally, the vector is sorted. This should definitely be faster when you don't need to copy temporary objects each time two elements are swapped.
I compile and run this in two modes with each compiler, once as C++98 and once as C++11 (with -DCPP11 for the explicit move):
    1> $ rm -f a.out ; g++ --std=c++98 test.cpp ; time ./a.out
    2> $ rm -f a.out ; g++ --std=c++11 -DCPP11 test.cpp ; time ./a.out
    3> $ rm -f a.out ; clang++ --std=c++98 test.cpp ; time ./a.out
    4> $ rm -f a.out ; clang++ --std=c++11 -DCPP11 test.cpp ; time ./a.out
With the following results (twice for each compilation):
    GCC C++98
    1> real 0m9.626s
    1> real 0m9.709s

    GCC C++11
    2> real 0m10.163s
    2> real 0m10.130s
So it's slightly slower to run when compiled as C++11 code. Similar results go for clang:
    clang C++98
    3> real 0m8.906s
    3> real 0m8.750s

    clang C++11
    4> real 0m8.858s
    4> real 0m9.053s
Can someone tell me why this is? Are the compilers optimizing so well, even when compiling for pre-C++11, that they practically reach move-semantics behaviour after all? If I add -O2, all code runs faster, but the results between the different standards are almost the same as above.
EDIT: New results with my::string rather than std::string, and larger individual strings:
    $ rm -f a.out ; g++ --std=c++98 test.cpp ; time ./a.out
    real 0m16.637s
    $ rm -f a.out ; g++ --std=c++11 -DCPP11 test.cpp ; time ./a.out
    real 0m17.169s
    $ rm -f a.out ; clang++ --std=c++98 test.cpp ; time ./a.out
    real 0m16.222s
    $ rm -f a.out ; clang++ --std=c++11 -DCPP11 test.cpp ; time ./a.out
    real 0m15.652s
There are very small differences between C++98 and C++11 with move semantics: slightly slower with C++11 on GCC and slightly faster on clang, but still very small differences.
EDIT2: Now, without std::string's COW, the performance improvement is huge:
    $ rm -f a.out ; g++ --std=c++98 test.cpp ; time ./a.out
    real 0m10.313s
    $ rm -f a.out ; g++ --std=c++11 -DCPP11 test.cpp ; time ./a.out
    real 0m5.267s
    $ rm -f a.out ; clang++ --std=c++98 test.cpp ; time ./a.out
    real 0m10.218s
    $ rm -f a.out ; clang++ --std=c++11 -DCPP11 test.cpp ; time ./a.out
    real 0m3.376s
With optimization, the difference is a lot bigger too:
    $ rm -f a.out ; g++ -O2 --std=c++98 test.cpp ; time ./a.out
    real 0m5.243s
    $ rm -f a.out ; g++ -O2 --std=c++11 -DCPP11 test.cpp ; time ./a.out
    real 0m0.803s
    $ rm -f a.out ; clang++ -O2 --std=c++98 test.cpp ; time ./a.out
    real 0m5.248s
    $ rm -f a.out ; clang++ -O2 --std=c++11 -DCPP11 test.cpp ; time ./a.out
    real 0m0.785s
The above shows C++11 running roughly 6-7 times faster.
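Worked out from the EDIT2 timings, the C++98-to-C++11 speedups are:

```latex
\begin{aligned}
\text{without -O2:}\quad & \frac{10.313\,\text{s}}{5.267\,\text{s}} \approx 1.96 \ (\text{GCC}), & \frac{10.218\,\text{s}}{3.376\,\text{s}} \approx 3.03 \ (\text{clang})\\
\text{with -O2:}\quad & \frac{5.243\,\text{s}}{0.803\,\text{s}} \approx 6.53 \ (\text{GCC}), & \frac{5.248\,\text{s}}{0.785\,\text{s}} \approx 6.69 \ (\text{clang})
\end{aligned}
```

So the 6-7x factor applies to the -O2 builds; without optimization the gain is roughly 2x with GCC and 3x with clang.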
Thanks for the great comments and answers. I hope this post will be useful and interesting to others too.
That's what rvalue references and move semantics are for! Move semantics allows you to avoid unnecessary copies when working with temporary objects that are about to evaporate, and whose resources can safely be taken from that temporary object and used by another.
In C++11, an object's resources can be moved to another object rather than copying all of its data. Moving typically transfers ownership of the source's heap-allocated resources (for example, the pointer to a string's character buffer) to the destination and leaves the source in a valid but unspecified state, so no new allocation or element-by-element copy is needed.
This should definitely be faster when you don't need to copy temporary objects each time two elements are swapped.
std::string has a swap member, so sort will already use that, and its internal implementation is effectively move semantics already. And you won't see a difference between copy and move for std::string as long as SSO is involved. In addition, some versions of GCC still have a COW-based implementation, which C++11 does not permit and which also would not show much difference between copy and move.
This is probably due to the small string optimization, which can occur (depending on the compiler) for strings shorter than, e.g., 16 characters. I would guess that all the lines in the file are quite short, since they are passwords.
When the small string optimization is active for a particular string, a move is done as a copy.
You will need to have larger strings to see any speed improvements with move semantics.