How is performance dependent on the underlying data values?

I have the following C++ code snippet (the C++ part is the profiler class, which is omitted here), compiled with VS2010 (on a 64-bit Intel machine). The code simply multiplies an array of floats (arr2) by a scalar and puts the result into another array (arr1):

int M = 150, N = 150;
int niter = 20000; // do many iterations to have a significant run-time
float *arr1 = (float *)calloc (M*N, sizeof(float));
float *arr2 = (float *)calloc (M*N, sizeof(float));

// Read data from file into arr2

float scale = float(6.6e-14);

// START_PROFILING
for (int iter = 0; iter < niter; ++iter) {
    for (int n = 0; n < M*N; ++n) {         
        arr1[n] += scale * arr2[n];
    }
}
// END_PROFILING

free(arr1);
free(arr2); 

The reading-from-file part and the profiling (i.e., run-time measurement) are omitted here for simplicity.

When arr2 is initialized to random numbers in the range [0, 1], the code runs about 10 times faster than when arr2 is initialized to a sparse array in which about 2/3 of the values are zeros. I have played with the compiler options /fp and /O, which changed the run-time a little, but the ratio of roughly 1:10 remained.

  • Why does the performance depend on the actual values? What does the CPU do differently that makes the sparse data run ~10 times slower?
  • Is there a way to make the "slow data" run faster, or will any optimization (e.g. vectorizing the calculation, as sketched after this list) have the same effect on both arrays (i.e., will the "slow data" still run slower than the "fast data")?
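
For concreteness on the vectorization point, here is a minimal hand-vectorized sketch of the inner loop using SSE intrinsics (hypothetical; it assumes M*N is a multiple of 4, which holds for 150*150, and uses unaligned loads/stores since calloc gives no special alignment guarantee):

#include <xmmintrin.h>  // SSE intrinsics

// Hypothetical SSE variant of the inner loop: process 4 floats per iteration.
__m128 vscale = _mm_set1_ps(scale);
for (int n = 0; n < M * N; n += 4) {
    __m128 a = _mm_loadu_ps(&arr1[n]);  // current accumulator values
    __m128 b = _mm_loadu_ps(&arr2[n]);  // input values
    _mm_storeu_ps(&arr1[n], _mm_add_ps(a, _mm_mul_ps(b, vscale)));
}

This only changes how the arithmetic is issued, not which values flow through it.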

EDIT

Complete code is here: https://gist.github.com/1676742, the command line for compiling is in a comment in test.cpp.

The data files are here:

  • https://ccrma.stanford.edu/~itakatz/tmp/I.bin
  • https://ccrma.stanford.edu/~itakatz/tmp/I0.bin
asked Jan 25 '12 by Itamar Katz

3 Answers

That's probably because your "fast" data consists only of normal floating-point numbers, while your "slow" data contains lots of denormalized numbers, which the CPU handles much more slowly.

As for your second question, you can try to improve the speed with this (it makes the hardware treat all denormalized numbers as exact zeros):

#include <xmmintrin.h>
// Set the flush-to-zero (FTZ, 0x8000) and denormals-are-zero (DAZ, 0x0040)
// flags in the SSE control register MXCSR.
_mm_setcsr(_mm_getcsr() | 0x8040);
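
For readability, the same two flags can also be set through the named helper macros from the intrinsics headers (an equivalent sketch; _MM_SET_DENORMALS_ZERO_MODE lives in pmmintrin.h and requires a CPU that supports the DAZ bit):

#include <xmmintrin.h>  // _MM_SET_FLUSH_ZERO_MODE, _MM_FLUSH_ZERO_ON
#include <pmmintrin.h>  // _MM_SET_DENORMALS_ZERO_MODE, _MM_DENORMALS_ZERO_ON

// Call once before the profiled loop; same effect as OR-ing 0x8040 into MXCSR.
_MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
_MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
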
answered by Evgeny Kluev

I can think of two reasons for this.

First, the branch predictor may be making incorrect decisions. This is a potential cause of performance gaps when the data changes but the code does not. However, in this case it seems very unlikely.

The second possible reason is that your "mostly zeros" data doesn't really consist of zeros, but rather of almost-zeros, or that you're keeping arr1 in the almost-zero range. See the Wikipedia article on denormal numbers.
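
A quick way to check this hypothesis is to count the subnormal values in the input (a sketch; arr2, M and N are the variables from the question, and a nonzero float smaller in magnitude than FLT_MIN is by definition subnormal):

#include <cfloat>  // FLT_MIN
#include <cmath>   // std::fabs
#include <cstdio>

// Count how many of the loaded values are denormal (subnormal).
int denormals = 0;
for (int n = 0; n < M * N; ++n) {
    if (arr2[n] != 0.0f && std::fabs(arr2[n]) < FLT_MIN) {
        ++denormals;
    }
}
std::printf("%d of %d values are denormal\n", denormals, M * N);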

answered by Borealid

There is nothing strange about the data from I.bin taking longer to process: you have lots of numbers like '1.401e-045#DEN' or '2.214e-043#DEN', where #DEN means the number cannot be normalized to standard float precision. Given that you then multiply them by 6.6e-14, you will definitely hit underflow, which significantly slows down the calculations.
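
If changing the floating-point control flags (as in the first answer) is not an option, one data-side workaround is to zero out the denormal inputs right after reading the file (a sketch; arr2, M and N are from the question's code):

#include <cfloat>  // FLT_MIN
#include <cmath>   // std::fabs

// Replace denormal (subnormal) inputs with exact zeros so the multiply-add
// loop never reads a subnormal operand from arr2.
for (int n = 0; n < M * N; ++n) {
    if (arr2[n] != 0.0f && std::fabs(arr2[n]) < FLT_MIN) {
        arr2[n] = 0.0f;
    }
}

Note that products of very small but still normal inputs and 6.6e-14 can underflow into the subnormal range in arr1 anyway, so the FTZ/DAZ approach remains the more complete fix.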

answered by Arty