I am trying to use valarray because it works much like MATLAB when operating on vectors and matrices. I ran a performance check first and found that valarray does not reach the performance claimed in The C++ Programming Language by Stroustrup.
The test program performs about 5 million multiplications of doubles (N = 5*1024*1024). I expected c = a*b to be at least comparable to a for loop that multiplies the double elements one by one, but I was completely wrong. I tried this on several computers, with Microsoft Visual C++ 6.0 and Visual Studio 2008.
By the way, I tested on MATLAB using the following code:
    len = 5*1024*1024;
    a = rand(len, 1);
    b = rand(len, 1);
    c = zeros(len, 1);
    tic;
    c = a.*b;
    toc;
The result is 46 ms. This timing is not high precision; it only serves as a reference.
The code is:
    #include <cstdlib>   // rand
    #include <iostream>
    #include <valarray>
    #include "windows.h"

    using namespace std;

    SYSTEMTIME stime;
    LARGE_INTEGER sys_freq;

    double gettime_hp();

    int main()
    {
        enum { N = 5*1024*1024 };
        valarray<double> a(N), b(N), c(N);
        QueryPerformanceFrequency(&sys_freq);
        int i, j;
        for (j=0 ; j<8 ; ++j)
        {
            for (i=0 ; i<N ; ++i)
            {
                a[i] = rand();
                b[i] = rand();
            }

            // 1) Plain loop over raw double pointers
            double* a1 = &a[0], *b1 = &b[0], *c1 = &c[0];
            double dtime = gettime_hp();
            for (i=0 ; i<N ; ++i)
                c1[i] = a1[i] * b1[i];
            dtime = gettime_hp()-dtime;
            cout << "double operator* " << dtime << " ms\n";

            // 2) Whole-array valarray multiplication
            dtime = gettime_hp();
            c = a*b ;
            dtime = gettime_hp() - dtime;
            cout << "valarray operator* " << dtime << " ms\n";

            // 3) Element-wise loop through valarray::operator[]
            dtime = gettime_hp();
            for (i=0 ; i<N ; ++i)
                c[i] = a[i] * b[i];
            dtime = gettime_hp() - dtime;
            cout << "valarray[i] operator* " << dtime<< " ms\n";

            cout << "------------------------------------------------------\n";
        }
    }

    // Current time in milliseconds, from the Windows performance counter
    double gettime_hp()
    {
        LARGE_INTEGER tick;
        extern LARGE_INTEGER sys_freq;
        QueryPerformanceCounter(&tick);
        return (double)tick.QuadPart * 1000.0 / sys_freq.QuadPart;
    }
The running results: (release mode with maximal speed optimization)
    double operator* 52.3019 ms
    valarray operator* 128.338 ms
    valarray[i] operator* 43.1801 ms
    ------------------------------------------------------
    double operator* 43.4036 ms
    valarray operator* 145.533 ms
    valarray[i] operator* 44.9121 ms
    ------------------------------------------------------
    double operator* 43.2619 ms
    valarray operator* 158.681 ms
    valarray[i] operator* 43.4871 ms
    ------------------------------------------------------
    double operator* 42.7317 ms
    valarray operator* 173.164 ms
    valarray[i] operator* 80.1004 ms
    ------------------------------------------------------
    double operator* 43.2236 ms
    valarray operator* 158.004 ms
    valarray[i] operator* 44.3813 ms
    ------------------------------------------------------
Debug mode with the same optimization:
    double operator* 41.8123 ms
    valarray operator* 201.484 ms
    valarray[i] operator* 41.5452 ms
    ------------------------------------------------------
    double operator* 40.2238 ms
    valarray operator* 215.351 ms
    valarray[i] operator* 40.2076 ms
    ------------------------------------------------------
    double operator* 40.5859 ms
    valarray operator* 232.007 ms
    valarray[i] operator* 40.8803 ms
    ------------------------------------------------------
    double operator* 40.9734 ms
    valarray operator* 234.325 ms
    valarray[i] operator* 40.9711 ms
    ------------------------------------------------------
    double operator* 41.1977 ms
    valarray operator* 234.409 ms
    valarray[i] operator* 41.1429 ms
    ------------------------------------------------------
    double operator* 39.7754 ms
    valarray operator* 234.26 ms
    valarray[i] operator* 39.6338 ms
    ------------------------------------------------------
I just tried it on a Linux x86-64 system (Sandy Bridge CPU):
gcc 4.5.0:
    double operator* 9.64185 ms
    valarray operator* 9.36987 ms
    valarray[i] operator* 9.35815 ms
Intel ICC 12.0.2:
    double operator* 7.76757 ms
    valarray operator* 9.60208 ms
    valarray[i] operator* 7.51409 ms
In both cases I just used -O3 and no other optimisation-related flags.
It looks like the MS C++ compiler and/or valarray implementation suck.
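One possible explanation, offered as an assumption rather than something measured in this thread: if a library's operator* builds and returns a complete valarray, then c = a*b has to allocate a 40 MB temporary and copy it into c, which the hand-written loops never do. A minimal sketch of a variant that forms no a*b temporary, using the standard compound assignment operator*=, could look like this. The helper name multiply_into is hypothetical and not part of the original benchmark.

```cpp
#include <valarray>

// Hypothetical helper (not in the original post): stores the element-wise
// product a*b in c without creating a separate a*b temporary.
// Assumes a, b and c all have the same size, as they do in the benchmark.
void multiply_into(std::valarray<double>& c,
                   const std::valarray<double>& a,
                   const std::valarray<double>& b)
{
    c = a;   // copy a into the existing storage of c
    c *= b;  // multiply element by element, in place
}
```

In the benchmark this would replace the line c = a*b with multiply_into(c, a, b). It makes two passes over the data instead of one, so whether it actually helps on a given compiler has to be timed rather than assumed.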
Here's the OP's code modified for Linux:
    #include <cstdlib>   // rand
    #include <ctime>     // clock_gettime
    #include <iostream>
    #include <valarray>

    using namespace std ;

    double gettime_hp();

    int main()
    {
        enum { N = 5*1024*1024 };
        valarray<double> a(N), b(N), c(N) ;
        int i,j;
        for( j=0 ; j<8 ; ++j )
        {
            for( i=0 ; i<N ; ++i )
            {
                a[i]=rand();
                b[i]=rand();
            }

            // 1) Plain loop over raw double pointers
            double* a1 = &a[0], *b1 = &b[0], *c1 = &c[0] ;
            double dtime=gettime_hp();
            for( i=0 ; i<N ; ++i )
                c1[i] = a1[i] * b1[i] ;
            dtime=gettime_hp()-dtime;
            cout << "double operator* " << dtime << " ms\n" ;

            // 2) Whole-array valarray multiplication
            dtime=gettime_hp();
            c = a*b ;
            dtime=gettime_hp()-dtime;
            cout << "valarray operator* " << dtime << " ms\n" ;

            // 3) Element-wise loop through valarray::operator[]
            dtime=gettime_hp();
            for( i=0 ; i<N ; ++i )
                c[i] = a[i] * b[i] ;
            dtime=gettime_hp()-dtime;
            cout << "valarray[i] operator* " << dtime<< " ms\n" ;

            cout << "------------------------------------------------------\n" ;
        }
    }

    // Current time in milliseconds, from the POSIX real-time clock
    double gettime_hp()
    {
        struct timespec timestamp;
        clock_gettime(CLOCK_REALTIME, &timestamp);
        return timestamp.tv_sec * 1000.0 + timestamp.tv_nsec * 1.0e-6;
    }
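Both versions of the benchmark depend on an OS-specific timer, and CLOCK_REALTIME is a wall clock that can jump if the system time is adjusted during a run. As a sketch only, assuming a C++11 (or later) compiler, which the compilers used above may not provide, the timing function could be written portably with std::chrono::steady_clock, a monotonic clock. The name gettime_hp is kept so it can stand in for the original function.

```cpp
#include <chrono>

// Sketch of a portable replacement for gettime_hp(); assumes C++11 or later.
// Returns milliseconds since the steady clock's epoch. The epoch itself is
// arbitrary, so only differences between two calls are meaningful.
double gettime_hp()
{
    using clock = std::chrono::steady_clock;
    const std::chrono::duration<double, std::milli> since_epoch(
        clock::now().time_since_epoch());
    return since_epoch.count();
}
```

It is used exactly like the original: take gettime_hp() before and after the code being timed and subtract the two values.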