Here's a loop that I've tried with <code>std::vector<double></code> and with plain old <code>double*</code>. For 10 million elements, the vector version consistently runs in about 80% of the time that the <code>double*</code> version takes; for pretty much any value of <code>N</code>, vector is notably faster. Peeking at the GCC STL source code, I don't see that <code>std::vector</code> is doing anything essentially fancier than what the <code>double*</code> idiom is doing (i.e., allocate with plain old <code>new[]</code>, <code>operator[]</code> dereferences an offset). This question speaks to that, too. Any ideas why the vector version is faster? <pre class="prettyprint"><code>Compiler: GCC 4.6.1 Example compile line: g++ -Ofast -march=native -DNDEBUG \ -ftree-vectorizer-verbose=2 -o vector.bin \ vector.cpp -lrt OS: CentOS 5 CPU: Opteron 8431 RAM: 128 GB </code></pre> Results are qualitatively the same if I use icpc 11.1 or run on a Xeon. Also, the vectorizer dump says that only the fill operation in <code>std::vector</code>'s constructor was vectorized. The vector version: <pre class="prettyprint"><code>#include <vector> #include <iostream> #include <boost/lexical_cast.hpp> #include "util.h" #include "rkck_params.h" using namespace std; int main( int argc, char* argv[] ) { const size_t N = boost::lexical_cast<size_t>( argv[ 1 ] ); vector<double> y_old( N ); vector<double> y_new( N ); vector<double> y_err( N ); vector<double> k0( N ); vector<double> k1( N ); vector<double> k2( N ); vector<double> k3( N ); vector<double> k4( N ); vector<double> k5( N ); const double h = 0.5; const timespec start = clock_tick(); for ( size_t i = 0 ; i < N ; ++i ) { y_new[ i ] = y_old[ i ] + h *( rkck::c[ 0 ]*k0[ i ] + rkck::c[ 2 ]*k2[ i ] + rkck::c[ 3 ]*k3[ i ] + rkck::c[ 5 ]*k5[ i ] ); y_err[ i ] = h *( rkck::cdiff[ 0 ]*k0[ i ] + rkck::cdiff[ 2 ]*k2[ i ] + rkck::cdiff[ 3 ]*k3[ i ] + rkck::cdiff[ 4 ]*k4[ i ] + rkck::cdiff[ 5 ]*k5[ i ] ); } const timespec stop = clock_tick(); const double total_time = seconds( start, stop ); // Output cout << "vector\t" << N << "\t" << total_time << endl; return 0; } </code></pre> The <code>double*</code> version: <pre class="prettyprint"><code>#include <iostream> #include <boost/lexical_cast.hpp> #include "util.h" #include "rkck_params.h" using namespace std; int main( int argc, char* argv[] ) { const size_t N = boost::lexical_cast<size_t>( argv[ 1 ] ); double* y_old = new double[ N ]; double* y_new = new double[ N ]; double* y_err = new double[ N ]; double* k0 = new double[ N ]; double* k1 = new double[ N ]; double* k2 = new double[ N ]; double* k3 = new double[ N ]; double* k4 = new double[ N ]; double* k5 = new double[ N ]; const double h = 0.5; const timespec start = clock_tick(); for ( size_t i = 0 ; i < N ; ++i ) { y_new[ i ] = y_old[ i ] + h *( rkck::c[ 0 ]*k0[ i ] + rkck::c[ 2 ]*k2[ i ] + rkck::c[ 3 ]*k3[ i ] + rkck::c[ 5 ]*k5[ i ] ); y_err[ i ] = h *( rkck::cdiff[ 0 ]*k0[ i ] + rkck::cdiff[ 2 ]*k2[ i ] + rkck::cdiff[ 3 ]*k3[ i ] + rkck::cdiff[ 4 ]*k4[ i ] + rkck::cdiff[ 5 ]*k5[ i ] ); } const timespec stop = clock_tick(); const double total_time = seconds( start, stop ); delete [] y_old; delete [] y_new; delete [] y_err; delete [] k0; delete [] k1; delete [] k2; delete [] k3; delete [] k4; delete [] k5; // Output cout << "plain\t" << N << "\t" << total_time << endl; return 0; } </code></pre> <code>rkck_params.h</code>: <pre class="prettyprint"><code>#ifndef RKCK_PARAMS_H #define RKCK_PARAMS_H namespace rkck { // C.f. $c_i$ in Ch. 16.2 of NR in C++, 2nd ed. const double c[ 6 ] = { 37.0/378.0, 0.0, 250.0/621.0, 125.0/594, 0.0, 512.0/1771.0 }; // C.f. $( c_i - c_i^* )$ in Ch. 16.2 of NR in C++, 2nd ed. const double cdiff[ 6 ] = { c[ 0 ] - 2825.0/27648.0, c[ 1 ] - 0.0, c[ 2 ] - 18575.0/48384.0, c[ 3 ] - 13525.0/55296.0, c[ 4 ] - 277.0/14336.0, c[ 5 ] - 1.0/4.0 }; } #endif </code></pre> <code>util.h</code>: <pre class="prettyprint"><code>#ifndef UTIL_H #define UTIL_H #include <time.h> #include <utility> inline timespec clock_tick() { timespec tick; clock_gettime( CLOCK_REALTIME, &tick ); return tick; } // \cite{www.guyrutenberg.com/2007/09/22/profiling-code-using-clock_gettime} inline double seconds( const timespec& earlier, const timespec& later ) { double seconds_diff = -1.0; double nano_diff = -1.0; if ( later.tv_nsec < earlier.tv_nsec ) { seconds_diff = later.tv_sec - earlier.tv_sec - 1; nano_diff = ( 1.0e9 + later.tv_nsec - earlier.tv_nsec )*1.0e-9; } else { seconds_diff = later.tv_sec - earlier.tv_sec; nano_diff = ( later.tv_nsec - earlier.tv_nsec )*1.0e-9; } return seconds_diff + nano_diff; } #endif </code></pre>

In the vector version your data is initialized to zero. In the <code>new</code> version it's uninitialized, so different work might be done. Did you run multiple times, in different orders?

vector<double> faster than double*: why?

Tags:

c++

arrays

optimization

vector

ode

Here's a loop that I've tried with std::vector<double> and with plain old double*.

For 10 million elements, the vector version consistently runs in about 80% of the time that the double* version takes; for pretty much any value of N, vector is notably faster.

Peeking at the GCC STL source code, I don't see that std::vector is doing anything essentially fancier than what the double* idiom is doing (i.e., allocate with plain old new[], operator[] dereferences an offset). This question speaks to that, too.

Any ideas why the vector version is faster?

Compiler: GCC 4.6.1
Example compile line: g++ -Ofast -march=native -DNDEBUG \
                      -ftree-vectorizer-verbose=2 -o vector.bin \
                      vector.cpp -lrt
OS: CentOS 5
CPU: Opteron 8431
RAM: 128 GB

Results are qualitatively the same if I use icpc 11.1 or run on a Xeon. Also, the vectorizer dump says that only the fill operation in std::vector's constructor was vectorized.

The vector version:

#include <vector>
#include <iostream>
#include <boost/lexical_cast.hpp>
#include "util.h"
#include "rkck_params.h"

using namespace std;

int main( int argc, char* argv[] )
{
    const size_t N = boost::lexical_cast<size_t>( argv[ 1 ] );

    vector<double> y_old( N );
    vector<double> y_new( N );
    vector<double> y_err( N );
    vector<double> k0( N );
    vector<double> k1( N );
    vector<double> k2( N );
    vector<double> k3( N );
    vector<double> k4( N );
    vector<double> k5( N );

    const double h = 0.5;

    const timespec start = clock_tick();
    for ( size_t i = 0 ; i < N ; ++i )
    {
        y_new[ i ] =   y_old[ i ]
                     + h
                      *(
                           rkck::c[ 0 ]*k0[ i ]
                         + rkck::c[ 2 ]*k2[ i ]
                         + rkck::c[ 3 ]*k3[ i ]
                         + rkck::c[ 5 ]*k5[ i ]
                       );
        y_err[ i ] =  h
                     *(
                          rkck::cdiff[ 0 ]*k0[ i ]
                        + rkck::cdiff[ 2 ]*k2[ i ]
                        + rkck::cdiff[ 3 ]*k3[ i ]
                        + rkck::cdiff[ 4 ]*k4[ i ]
                        + rkck::cdiff[ 5 ]*k5[ i ]
                      );
    }
    const timespec stop = clock_tick();
    const double total_time = seconds( start, stop );

    // Output
    cout << "vector\t" << N << "\t" << total_time << endl;

    return 0;
}

The double* version:

#include <iostream>
#include <boost/lexical_cast.hpp>
#include "util.h"
#include "rkck_params.h"

using namespace std;

int main( int argc, char* argv[] )
{
    const size_t N = boost::lexical_cast<size_t>( argv[ 1 ] );

    double* y_old = new double[ N ];
    double* y_new = new double[ N ];
    double* y_err = new double[ N ];
    double* k0 = new double[ N ];
    double* k1 = new double[ N ];
    double* k2 = new double[ N ];
    double* k3 = new double[ N ];
    double* k4 = new double[ N ];
    double* k5 = new double[ N ];

    const double h = 0.5;

    const timespec start = clock_tick();
    for ( size_t i = 0 ; i < N ; ++i )
    {
        y_new[ i ]
            =   y_old[ i ]
              + h
               *(
                    rkck::c[ 0 ]*k0[ i ]
                  + rkck::c[ 2 ]*k2[ i ]
                  + rkck::c[ 3 ]*k3[ i ]
                  + rkck::c[ 5 ]*k5[ i ]
                );
        y_err[ i ]
            =  h
              *(
                   rkck::cdiff[ 0 ]*k0[ i ]
                 + rkck::cdiff[ 2 ]*k2[ i ]
                 + rkck::cdiff[ 3 ]*k3[ i ]
                 + rkck::cdiff[ 4 ]*k4[ i ]
                 + rkck::cdiff[ 5 ]*k5[ i ]
               );
    }
    const timespec stop = clock_tick();
    const double total_time = seconds( start, stop );

    delete [] y_old;
    delete [] y_new;
    delete [] y_err;
    delete [] k0;
    delete [] k1;
    delete [] k2;
    delete [] k3;
    delete [] k4;
    delete [] k5;

    // Output
    cout << "plain\t" << N << "\t" << total_time << endl;

    return 0;
}

rkck_params.h:

#ifndef RKCK_PARAMS_H
#define RKCK_PARAMS_H

namespace rkck
{

// C.f. $c_i$ in Ch. 16.2 of NR in C++, 2nd ed.
const double c[ 6 ]
    = { 37.0/378.0,
        0.0,
        250.0/621.0,
        125.0/594,
        0.0,
        512.0/1771.0 };

// C.f. $( c_i - c_i^* )$ in Ch. 16.2 of NR in C++, 2nd ed.
const double cdiff[ 6 ]
    = { c[ 0 ] - 2825.0/27648.0,
        c[ 1 ] - 0.0,
        c[ 2 ] - 18575.0/48384.0,
        c[ 3 ] - 13525.0/55296.0,
        c[ 4 ] - 277.0/14336.0,
        c[ 5 ] - 1.0/4.0 };

}

#endif

util.h:

#ifndef UTIL_H
#define UTIL_H

#include <time.h>
#include <utility>

inline timespec clock_tick()
{
    timespec tick;
    clock_gettime( CLOCK_REALTIME, &tick );
    return tick;
}

// \cite{www.guyrutenberg.com/2007/09/22/profiling-code-using-clock_gettime}
inline double seconds( const timespec& earlier, const timespec& later )
{
    double seconds_diff = -1.0;
    double nano_diff = -1.0;

    if ( later.tv_nsec < earlier.tv_nsec )
    {
        seconds_diff = later.tv_sec - earlier.tv_sec - 1;
        nano_diff = ( 1.0e9 + later.tv_nsec - earlier.tv_nsec )*1.0e-9;
    }
    else
    {
        seconds_diff = later.tv_sec - earlier.tv_sec;
        nano_diff = ( later.tv_nsec - earlier.tv_nsec )*1.0e-9;
    }

    return seconds_diff + nano_diff;
}

#endif

467

asked Jul 14 '11 19:07

Hector

1 Answers

In the vector version your data is initialized to zero. In the new version it's uninitialized, so different work might be done.

Did you run multiple times, in different orders?

180

answered Oct 10 '22 03:10

Mark B

Related questions
                            
                                Using an abstract deleter with std::unique_ptr
                            
                                namelookup with Unqualified name : C++0x draft n3290
                            
                                C++, xcode and terminal, absolute basics
                            
                                Quaternion rotation not working
                            
                                C++ Sorting huge binary files
                            
                                programming ftp: how to abort file transfer?
                            
                                multiple inheritance and data members
                            
                                Unnamed namespace
                            
                                Can a templated Pointer class have a virtual destructor?
                            
                                G.711 c++ implementation
                            
                                How do you fetch or insert rows in batches using ODBC? ( in C or C++ )
                            
                                How do I set a breakpoint for when a vector<int> increases in length?
                            
                                How do I create a lambda function to match a boost::function parameter without using C++0x?
                            
                                Lua, C++, any good introductions that go deeper than lua_open()?
                            
                                boost condition variable issue
                            
                                How to make large numbers of existing functions available in the scope of a class?
                            
                                Recommend C++ container to hold top 20 minimum values
                            
                                unordered_map hash function c++
                            
                                std::is_trivially_copyable - Why are volatile scalar types not trivially copyable?
                            
                                Transitioning from C `goto` error handling paradigm to C++ exception handling paradigm

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With