The performance difference between C++ vectors and plain arrays has been extensively discussed, for example here and here. These discussions usually conclude that vectors and arrays perform similarly when they are accessed with the [] operator and the compiler is allowed to inline functions. That is what I expected, but I came across a case where it does not seem to hold. The code below is quite simple: it takes a 3D volume and sweeps over it a fixed number of times, applying a small 3D mask on each pass. Depending on the VERSION macro, the volume is declared as a vector and accessed through the at member function (VERSION=2), declared as a vector and accessed via [] (VERSION=1), or declared as a plain array (VERSION=0).
#include <vector>
#define NX 100
#define NY 100
#define NZ 100
#define H 1
#define C0 1.5f
#define C1 0.25f
#define T 3000
#if !defined(VERSION) || VERSION > 2 || VERSION < 0
#error "Bad version"
#endif
#if VERSION == 2
#define AT(_a_,_b_) (_a_.at(_b_))
typedef std::vector<float> Field;
#endif
#if VERSION == 1
#define AT(_a_,_b_) (_a_[_b_])
typedef std::vector<float> Field;
#endif
#if VERSION == 0
#define AT(_a_,_b_) (_a_[_b_])
typedef float* Field;
#endif
#include <iostream>
#include <omp.h>
int main(void) {
#if VERSION != 0
    Field img(NX*NY*NZ);
#else
    Field img = new float[NX*NY*NZ];
#endif
    double end, begin;
    begin = omp_get_wtime();
    const int csize = NZ;
    const int psize = NZ * NX;
    for(int t = 0; t < T; t++ ) {
        /* Sweep the 3D volume and apply the "blurring" coefficients */
        #pragma omp parallel for
        for(int j = H; j < NY-H; j++ ) {
            for( int i = H; i < NX-H; i++ ) {
                for( int k = H; k < NZ-H; k++ ) {
                    int eindex = k+i*NZ+j*NX*NZ;
                    AT(img,eindex) = C0 * AT(img,eindex) +
                        C1 * (AT(img,eindex - csize) +
                              AT(img,eindex + csize) +
                              AT(img,eindex - psize) +
                              AT(img,eindex + psize) );
                }
            }
        }
    }
    end = omp_get_wtime();
    std::cout << "Elapsed "<< (end-begin) <<" s." << std::endl;
    /* Access img so that it is forced to be deleted only after the timing has been taken */
#define WHATEVER 12.f
    if( img[ NZ ] == WHATEVER ) {
        std::cout << "Whatever" << std::endl;
    }
#if VERSION == 0
    delete[] img;
#endif
}
One would expect the code to perform the same with VERSION=1 and VERSION=0, but the output is as follows:
If I compile without OMP (I've got only two cores), I get similar results:
I always compile with GCC 4.6.3 and the compilation options -fopenmp -finline-functions -O3 (I of course remove -fopenmp when I compile without OpenMP). Is there something I am doing wrong, for example when compiling? Or should we really expect that difference between vectors and arrays?
PS: I cannot use std::array because the compiler I depend on does not support the C++11 standard. With ICC 13.1.2 I get similar behavior.
I tried your code, using chrono to measure the time, and compiled it with clang (version 3.5) and libc++:
clang++ test.cc -std=c++1y -stdlib=libc++ -lc++abi -finline-functions -O3
The result is the same for VERSION 0 and VERSION 1; there is no big difference. Both take 3.4 seconds on average (I am using a virtual machine, so it is slower).
Then I tried g++ (version 4.8.1):
g++ test.cc -std=c++1y -finline-functions -O3
The result is roughly 4.4 seconds for VERSION 0 and roughly 5.2 seconds for VERSION 1.
I then tried clang++ with libstdc++:
clang++ test.cc -std=c++11 -finline-functions -O3
Voilà, the result is back to 3.4 seconds again.
So it looks like this is purely an optimization "bug" in g++.
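The answer mentions timing with chrono but does not show how. A minimal sketch of what such a measurement might look like, assuming C++11 and a serial loop (no OpenMP). The names `time_seconds` and `sweep` are hypothetical and do not come from either post; the coefficients are the question's C0 and C1, and the halo width is the question's H = 1.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the original posts): runs `work` once and
// returns the elapsed wall-clock time in seconds, measured with a
// monotonic clock.
template <typename F>
double time_seconds(F work) {
    const auto begin = std::chrono::steady_clock::now();
    work();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(end - begin).count();
}

// Serial version of the question's in-place stencil, written once for any
// type that can be indexed with [] (std::vector<float> or float*), so both
// variants run exactly the same source code.
template <typename Field>
void sweep(Field& img, int nx, int ny, int nz, int steps) {
    const int csize = nz;       // stride between neighbours along i
    const int psize = nz * nx;  // stride between neighbours along j
    for (int t = 0; t < steps; ++t) {
        for (int j = 1; j < ny - 1; ++j) {
            for (int i = 1; i < nx - 1; ++i) {
                for (int k = 1; k < nz - 1; ++k) {
                    const int e = k + i * nz + j * nx * nz;
                    img[e] = 1.5f * img[e] +
                             0.25f * (img[e - csize] + img[e + csize] +
                                      img[e - psize] + img[e + psize]);
                }
            }
        }
    }
}
```

To compare the two variants, one could call `time_seconds([&] { sweep(img, 100, 100, 100, 3000); })` once with `img` declared as a `std::vector<float>` and once as a `float*`, building with each compiler and standard library in turn. Using one template for both keeps the loop body identical, so any gap in the timings comes from how the compiler treats the container rather than from differences in the source.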