 

Why is valarray so slow?


I am trying to use valarray because it works much like MATLAB when operating on vectors and matrices. I first ran some performance checks and found that valarray does not achieve the performance claimed in Stroustrup's book The C++ Programming Language.

The test program performs 5*1024*1024 (about 5.2 million) double multiplications per method on each pass. I expected `c = a*b` to be at least comparable to a for loop that multiplies the double elements one by one, but I was completely wrong. I tried this on several computers with Microsoft Visual C++ 6.0 and Visual Studio 2008.

For comparison, I also tested it in MATLAB with the following code:

```matlab
len = 5*1024*1024;
a = rand(len, 1);
b = rand(len, 1);
c = zeros(len, 1);
tic; c = a.*b; toc;
```

The result was 46 ms. This timing is not high precision; it serves only as a reference.

The code is:

```cpp
#include <cstdlib>   // rand
#include <iostream>
#include <valarray>
#include <windows.h> // QueryPerformanceCounter, QueryPerformanceFrequency

using namespace std;

LARGE_INTEGER sys_freq;

double gettime_hp();

int main()
{
    enum { N = 5*1024*1024 };
    valarray<double> a(N), b(N), c(N);
    QueryPerformanceFrequency(&sys_freq);
    int i, j;
    for (j = 0; j < 8; ++j)
    {
        // Refill the inputs with random values on every pass
        for (i = 0; i < N; ++i)
        {
            a[i] = rand();
            b[i] = rand();
        }

        // 1. Plain loop over raw pointers into the valarrays' storage
        double* a1 = &a[0], *b1 = &b[0], *c1 = &c[0];
        double dtime = gettime_hp();
        for (i = 0; i < N; ++i)
            c1[i] = a1[i] * b1[i];
        dtime = gettime_hp() - dtime;
        cout << "double operator* " << dtime << " ms\n";

        // 2. Whole-array valarray expression
        dtime = gettime_hp();
        c = a*b;
        dtime = gettime_hp() - dtime;
        cout << "valarray operator* " << dtime << " ms\n";

        // 3. Element-by-element loop through valarray::operator[]
        dtime = gettime_hp();
        for (i = 0; i < N; ++i)
            c[i] = a[i] * b[i];
        dtime = gettime_hp() - dtime;
        cout << "valarray[i] operator* " << dtime << " ms\n";

        cout << "------------------------------------------------------\n";
    }
}

// High-resolution wall-clock time in milliseconds
double gettime_hp()
{
    LARGE_INTEGER tick;
    QueryPerformanceCounter(&tick);
    return (double)tick.QuadPart * 1000.0 / sys_freq.QuadPart;
}
```
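The benchmark above depends on the Windows-only QueryPerformanceCounter API and never checks that the three methods compute the same values. A minimal portable sketch, assuming a C++11 compiler (so not VC6 or VS2008), could time each method with `std::chrono::steady_clock` and compare the results afterwards. `time_ms` and `same_values` are hypothetical helper names, not part of any library.

```cpp
#include <chrono>
#include <cstddef>
#include <valarray>

// Hypothetical helper: runs f() once and returns the elapsed wall-clock time
// in milliseconds, measured with the monotonic std::chrono::steady_clock.
template <typename F>
double time_ms(F f)
{
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Hypothetical check: true if x and y have the same size and every element
// of x equals the corresponding element of y.
bool same_values(const std::valarray<double>& x, const std::valarray<double>& y)
{
    if (x.size() != y.size())
        return false;
    for (std::size_t i = 0; i < x.size(); ++i)
        if (x[i] != y[i])
            return false;
    return true;
}
```

For example, `double t = time_ms([&] { c = a * b; });` times the whole-array expression, and `same_values(c, d)` then confirms it matches a hand-written loop that filled `d`.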

The results (Release mode, optimized for maximum speed):

```
double operator* 52.3019 ms
valarray operator* 128.338 ms
valarray[i] operator* 43.1801 ms
------------------------------------------------------
double operator* 43.4036 ms
valarray operator* 145.533 ms
valarray[i] operator* 44.9121 ms
------------------------------------------------------
double operator* 43.2619 ms
valarray operator* 158.681 ms
valarray[i] operator* 43.4871 ms
------------------------------------------------------
double operator* 42.7317 ms
valarray operator* 173.164 ms
valarray[i] operator* 80.1004 ms
------------------------------------------------------
double operator* 43.2236 ms
valarray operator* 158.004 ms
valarray[i] operator* 44.3813 ms
------------------------------------------------------
```

Debug mode with the same optimization settings:

```
double operator* 41.8123 ms
valarray operator* 201.484 ms
valarray[i] operator* 41.5452 ms
------------------------------------------------------
double operator* 40.2238 ms
valarray operator* 215.351 ms
valarray[i] operator* 40.2076 ms
------------------------------------------------------
double operator* 40.5859 ms
valarray operator* 232.007 ms
valarray[i] operator* 40.8803 ms
------------------------------------------------------
double operator* 40.9734 ms
valarray operator* 234.325 ms
valarray[i] operator* 40.9711 ms
------------------------------------------------------
double operator* 41.1977 ms
valarray operator* 234.409 ms
valarray[i] operator* 41.1429 ms
------------------------------------------------------
double operator* 39.7754 ms
valarray operator* 234.26 ms
valarray[i] operator* 39.6338 ms
------------------------------------------------------
```
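To put these timings in perspective, they can be converted into an effective memory bandwidth. With N = 5*1024*1024 doubles, one pass reads `a` and `b` and writes `c`, about 126 MB (3 × 8 × 5,242,880 bytes), so roughly 43 ms corresponds to about 2.9 GB/s. This suggests (an assumption, not something measured here) that the plain loops are limited by memory traffic rather than by the multiplications themselves, so any extra pass over memory, such as copying a temporary array, would show up directly in the timings. The helper name `effective_gb_per_s` is hypothetical.

```cpp
#include <cstddef>

// Hypothetical helper: effective memory bandwidth in GB/s (10^9 bytes/s) for an
// element-wise c = a * b over n doubles, assuming each element of a and b is
// read once and each element of c is written once (3 * 8 bytes per element).
// Write-allocate traffic and cache effects are ignored.
double effective_gb_per_s(std::size_t n, double elapsed_ms)
{
    const double bytes = 3.0 * sizeof(double) * static_cast<double>(n);
    return bytes / (elapsed_ms * 1.0e-3) / 1.0e9;
}
```

With the numbers above, `effective_gb_per_s(5 * 1024 * 1024, 43.0)` gives about 2.93 GB/s for the plain loops, while the 128 to 234 ms `valarray operator*` timings work out to roughly 0.5 to 1 GB/s.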
asked Jul 27 '11 20:07 by shangping



1 Answer

I just tried it on a Linux x86-64 system (Sandy Bridge CPU):

gcc 4.5.0:

```
double operator* 9.64185 ms
valarray operator* 9.36987 ms
valarray[i] operator* 9.35815 ms
```

Intel ICC 12.0.2:

```
double operator* 7.76757 ms
valarray operator* 9.60208 ms
valarray[i] operator* 7.51409 ms
```

In both cases I just used -O3 and no other optimisation-related flags.

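With gcc, `c = a*b` is as fast as the hand-written loops, and with ICC it is only about 25 to 30% slower, whereas in the OP's MSVC results it is roughly 2.5 to 6 times slower. It looks like the MS C++ compiler and/or its valarray implementation suck.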
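One plausible explanation, not verified against the Visual Studio 2008 headers, is that the MSVC implementation's `operator*` builds the result of `a*b` in a temporary valarray (40 MiB here) that is then copied into `c`, adding an allocation and an extra pass over memory. If so, computing the product in place with valarray's compound assignment `operator*=` may be worth timing as a workaround. The function name `multiply_into` is hypothetical.

```cpp
#include <cstddef>
#include <valarray>

// Hypothetical workaround: computes c = a * b element-wise without creating
// an intermediate valarray for a * b.
// Precondition: a and b have the same size.
void multiply_into(std::valarray<double>& c,
                   const std::valarray<double>& a,
                   const std::valarray<double>& b)
{
    // C++98 requires equal sizes for valarray assignment, so size c first.
    if (c.size() != a.size())
        c.resize(a.size());
    c = a;   // element-wise copy of a into c
    c *= b;  // in-place element-wise multiplication: c[i] *= b[i]
}
```

This still makes two passes over `c` (the copy and the multiplication), so it is not guaranteed to beat the plain loop; it only avoids the temporary.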


Here's the OP's code modified for Linux:

```cpp
#include <cstdlib>   // rand
#include <iostream>
#include <valarray>
#include <time.h>    // clock_gettime (POSIX)

using namespace std;

double gettime_hp();

int main()
{
    enum { N = 5*1024*1024 };
    valarray<double> a(N), b(N), c(N);
    int i, j;
    for (j = 0; j < 8; ++j)
    {
        for (i = 0; i < N; ++i)
        {
            a[i] = rand();
            b[i] = rand();
        }

        double* a1 = &a[0], *b1 = &b[0], *c1 = &c[0];
        double dtime = gettime_hp();
        for (i = 0; i < N; ++i) c1[i] = a1[i] * b1[i];
        dtime = gettime_hp() - dtime;
        cout << "double operator* " << dtime << " ms\n";

        dtime = gettime_hp();
        c = a*b;
        dtime = gettime_hp() - dtime;
        cout << "valarray operator* " << dtime << " ms\n";

        dtime = gettime_hp();
        for (i = 0; i < N; ++i) c[i] = a[i] * b[i];
        dtime = gettime_hp() - dtime;
        cout << "valarray[i] operator* " << dtime << " ms\n";

        cout << "------------------------------------------------------\n";
    }
}

// Wall-clock time in milliseconds
double gettime_hp()
{
    struct timespec timestamp;

    clock_gettime(CLOCK_REALTIME, &timestamp);
    return timestamp.tv_sec * 1000.0 + timestamp.tv_nsec * 1.0e-6;
}
```
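The gcc result is consistent with libstdc++ implementing valarray's arithmetic operators with expression templates: `a*b` returns a lightweight object describing the operation, and assigning it to `c` evaluates every element in a single loop straight into `c`, with no temporary array. The toy below illustrates the technique only; `Vec` and `MulExpr` are hypothetical names, it handles nothing but `Vec * Vec`, and it is not libstdc++'s actual code.

```cpp
#include <cstddef>
#include <vector>

// Toy expression template: stores references to its operands and computes
// an element only when asked, so a * b never allocates an array.
template <typename L, typename R>
struct MulExpr {
    const L& lhs;
    const R& rhs;
    double operator[](std::size_t i) const { return lhs[i] * rhs[i]; }
    std::size_t size() const { return lhs.size(); }
};

// Toy vector type standing in for valarray<double>.
struct Vec {
    std::vector<double> data;
    explicit Vec(std::size_t n, double v = 0.0) : data(n, v) {}
    double operator[](std::size_t i) const { return data[i]; }
    double& operator[](std::size_t i) { return data[i]; }
    std::size_t size() const { return data.size(); }

    // Assigning an expression evaluates it in one loop directly into data.
    // Assumes e.size() == size().
    template <typename L, typename R>
    Vec& operator=(const MulExpr<L, R>& e)
    {
        for (std::size_t i = 0; i < e.size(); ++i)
            data[i] = e[i];
        return *this;
    }
};

// a * b builds a MulExpr describing the product instead of a new Vec.
inline MulExpr<Vec, Vec> operator*(const Vec& a, const Vec& b)
{
    return MulExpr<Vec, Vec>{a, b};
}
```

Because `MulExpr` holds references, it must not outlive its operands; real implementations support many more operators and nested expressions, but the core idea of deferring evaluation until assignment is the same.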
answered Oct 29 '22 12:10 by Paul R
