(Missing) performance improvements with C++11 move semantics

I've been writing C++11 code for quite some time now without benchmarking any of it, simply expecting things like vector operations to "just be faster" with move semantics. Now that I have actually benchmarked with GCC 4.7.2 and clang 3.0 (the default compilers on Ubuntu 12.10 64-bit), the results are very unsatisfying. This is my test code:

EDIT: With regard to the (good) answers posted by @DeadMG and @ronag, I changed the element type from std::string to my::string, which does not have a swap(), and made all inner strings larger (200-700 bytes) so that they are not subject to SSO.

EDIT2: COW was the reason. Following the great comments, I adapted the code again: I changed the storage from std::string to std::vector<char> and left out the copy/move constructors (letting the compiler generate them instead). Without COW, the speed difference is actually huge.

EDIT3: As requested by @chico, I re-added the previous solution, which is enabled by compiling with -DCOW. This makes the internal storage a std::string rather than a std::vector<char>.

    #include <string>
    #include <vector>
    #include <fstream>
    #include <iostream>
    #include <algorithm>
    #include <functional>

    static std::size_t dec = 0;

    namespace my {
    class string
    {
    public:
        string( ) { }
    #ifdef COW
        string( const std::string& ref ) : str( ref ), val( dec % 2 ? - ++dec : ++dec ) {
    #else
        string( const std::string& ref ) : val( dec % 2 ? - ++dec : ++dec ) {
            str.resize( ref.size( ) );
            std::copy( ref.begin( ), ref.end( ), str.begin( ) );
    #endif
        }

        bool operator<( const string& other ) const { return val < other.val; }

    private:
    #ifdef COW
        std::string str;
    #else
        std::vector< char > str;
    #endif
        std::size_t val;
    };
    }

    template< typename T >
    void dup_vector( T& vec )
    {
        T v = vec;
        for ( typename T::iterator i = v.begin( ); i != v.end( ); ++i )
    #ifdef CPP11
            vec.push_back( std::move( *i ) );
    #else
            vec.push_back( *i );
    #endif
    }

    int main( )
    {
        std::ifstream file;
        file.open( "/etc/passwd" );
        std::vector< my::string > lines;
        while ( ! file.eof( ) )
        {
            std::string s;
            std::getline( file, s );
            lines.push_back( s + s + s + s + s + s + s + s + s );
        }

        while ( lines.size( ) < ( 1000 * 1000 ) )
            dup_vector( lines );
        std::cout << lines.size( ) << " elements" << std::endl;

        std::sort( lines.begin( ), lines.end( ) );

        return 0;
    }

This reads /etc/passwd into a vector of lines, then duplicates the vector onto itself over and over until it holds at least 1 million entries. This is where the first optimization should help: not only the explicit std::move() you see in dup_vector(), but push_back itself should also perform better when it needs to resize (allocate new + copy) the inner array.

Finally, the vector is sorted. This should definitely be faster when you don't need to copy temporary objects each time two elements are swapped.

I compile and run this in two ways, first as C++98 and then as C++11 (with -DCPP11 for the explicit move):

    1> $ rm -f a.out ; g++ --std=c++98 test.cpp ; time ./a.out
    2> $ rm -f a.out ; g++ --std=c++11 -DCPP11 test.cpp ; time ./a.out
    3> $ rm -f a.out ; clang++ --std=c++98 test.cpp ; time ./a.out
    4> $ rm -f a.out ; clang++ --std=c++11 -DCPP11 test.cpp ; time ./a.out

With the following results (twice for each compilation):

    GCC C++98
    1> real    0m9.626s
    1> real    0m9.709s

    GCC C++11
    2> real    0m10.163s
    2> real    0m10.130s

So it runs slightly slower when compiled as C++11 code. The results for clang are similar:

    clang C++98
    3> real    0m8.906s
    3> real    0m8.750s

    clang C++11
    4> real    0m8.858s
    4> real    0m9.053s

Can someone tell me why this is? Do the compilers optimize so well even when compiling for pre-C++11 that they practically reach move-semantics behaviour anyway? If I add -O2, all code runs faster, but the differences between the two standards stay almost the same as above.

EDIT: New results with my::string rather than std::string, and larger individual strings:

    $ rm -f a.out ; g++ --std=c++98 test.cpp ; time ./a.out
    real    0m16.637s
    $ rm -f a.out ; g++ --std=c++11 -DCPP11 test.cpp ; time ./a.out
    real    0m17.169s
    $ rm -f a.out ; clang++ --std=c++98 test.cpp ; time ./a.out
    real    0m16.222s
    $ rm -f a.out ; clang++ --std=c++11 -DCPP11 test.cpp ; time ./a.out
    real    0m15.652s

There are only very small differences between C++98 and C++11 with move semantics: slightly slower with C++11 on GCC and slightly faster on clang.

EDIT2: Now without std::string's COW, the performance improvement is huge:

    $ rm -f a.out ; g++ --std=c++98 test.cpp ; time ./a.out
    real    0m10.313s
    $ rm -f a.out ; g++ --std=c++11 -DCPP11 test.cpp ; time ./a.out
    real    0m5.267s
    $ rm -f a.out ; clang++ --std=c++98 test.cpp ; time ./a.out
    real    0m10.218s
    $ rm -f a.out ; clang++ --std=c++11 -DCPP11 test.cpp ; time ./a.out
    real    0m3.376s

With optimization, the difference is even bigger:

    $ rm -f a.out ; g++ -O2 --std=c++98 test.cpp ; time ./a.out
    real    0m5.243s
    $ rm -f a.out ; g++ -O2 --std=c++11 -DCPP11 test.cpp ; time ./a.out
    real    0m0.803s
    $ rm -f a.out ; clang++ -O2 --std=c++98 test.cpp ; time ./a.out
    real    0m5.248s
    $ rm -f a.out ; clang++ -O2 --std=c++11 -DCPP11 test.cpp ; time ./a.out
    real    0m0.785s

The -O2 numbers above show C++11 running roughly 6-7 times faster.

Thanks for the great comments and answers. I hope this post will be useful and interesting to others too.

asked Jan 12 '13 12:01

gustaf r

2 Answers

This should definitely be faster when you don't need to copy temporary objects each time two elements are swapped.

std::string has a swap member, so sort will already use that, and its internal implementation is effectively already move semantics. You also won't see a difference between copy and move for std::string as long as SSO is involved. In addition, some versions of GCC still ship a COW-based implementation, which C++11 does not permit, and which would not show much difference between copy and move either.

answered Oct 09 '22 11:10

Puppy

This is probably due to the small string optimization, which can occur (depending on the compiler) for strings shorter than e.g. 16 characters. I would guess that all the lines in the file are quite short, since they are /etc/passwd entries.

When small string optimization is active for a particular string, the characters live inside the string object itself, so a move has to copy them just as a copy does.

You will need to have larger strings to see any speed improvements with move semantics.

answered Oct 09 '22 13:10

ronag
