I have the following C++ code snippet (the C++ part is the profiler class, which is omitted here), compiled with VS2010 (64-bit Intel machine). The code simply multiplies an array of floats (arr2) by a scalar and puts the result into another array (arr1):
int M = 150, N = 150;
int niter = 20000; // do many iterations to have a significant run-time
float *arr1 = (float *)calloc (M*N, sizeof(float));
float *arr2 = (float *)calloc (M*N, sizeof(float));
// Read data from file into arr2
float scale = float(6.6e-14);
// START_PROFILING
for (int iter = 0; iter < niter; ++iter) {
    for (int n = 0; n < M*N; ++n) {
        arr1[n] += scale * arr2[n];
    }
}
// END_PROFILING
free(arr1);
free(arr2);
The reading-from-file part and the profiling (i.e. run-time measurement) are omitted here for simplicity.
When arr2 is initialized to random numbers in the range [0, 1], the code runs about 10 times faster than when arr2 is initialized to a sparse array in which about 2/3 of the values are zeros. I have played with the compiler options /fp and /O, which changed the run-time a little, but the 1:10 ratio was approximately kept.
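For reference, here is a minimal timing sketch that could stand in for the omitted profiler (the START_PROFILING / END_PROFILING markers above); it is an assumption about the measurement, not the actual profiler class, and it reuses the variables from the snippet above:

#include <cstdio>
#include <ctime>

// Hypothetical stand-in for the omitted profiler: wrap the timed loop in
// clock() calls and report the elapsed CPU time over all niter iterations.
clock_t t0 = clock();                       // START_PROFILING
for (int iter = 0; iter < niter; ++iter) {
    for (int n = 0; n < M*N; ++n) {
        arr1[n] += scale * arr2[n];
    }
}
clock_t t1 = clock();                       // END_PROFILING
printf("elapsed: %.3f s\n", (double)(t1 - t0) / CLOCKS_PER_SEC);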
EDIT
Complete code is here: https://gist.github.com/1676742; the command line for compiling is in a comment in test.cpp.
The data files are here:
That's probably because your "fast" data consists only of normal floating-point numbers, while your "slow" data contains lots of denormalized numbers.
As for your second question, you can try to improve speed with this (and treat all denormalized numbers as exact zeros):
#include <xmmintrin.h>
_mm_setcsr(_mm_getcsr() | 0x8040);
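For what it's worth, 0x8040 sets two MXCSR bits: bit 15 (FTZ, flush-to-zero, so denormal results are written as zero) and bit 6 (DAZ, denormals-are-zero, so denormal inputs are read as zero). A sketch of the same setting written with the named intrinsics macros (this assumes a DAZ-capable SSE2+ CPU, which any 64-bit Intel machine is):

#include <xmmintrin.h>   // _MM_SET_FLUSH_ZERO_MODE
#include <pmmintrin.h>   // _MM_SET_DENORMALS_ZERO_MODE

// Treat denormal results (FTZ) and denormal inputs (DAZ) as exact zeros
// for the current thread; call this once before the timed loop.
_MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);         // MXCSR bit 15
_MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON); // MXCSR bit 6

Note that this only affects SSE arithmetic, not legacy x87 code; on x64 with VS2010 the compiler generates SSE2 instructions for float math, so it applies here.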
I can think of two reasons for this.
First, the branch predictor may be making incorrect decisions. That is one potential cause of performance gaps when the data changes but the code does not; in this case, however, it seems very unlikely.
The second possible reason is that your "mostly zeros" data doesn't really consist of zeros but of almost-zeros, or that you're keeping arr1 in the almost-zero range. See the Wikipedia article on denormal numbers.
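To make that concrete: scale is 6.6e-14, so any arr2 value below roughly 1.8e-25 already pushes the product under FLT_MIN (about 1.18e-38) and into the denormal range. A small self-contained check along these lines (the 1e-30 input is just an illustrative value, not one taken from your data files):

#include <cfloat>
#include <cstdio>

int main() {
    float scale = 6.6e-14f;
    float x = 1e-30f;        // small, but still a normal float
    float y = scale * x;     // ~6.6e-44: below FLT_MIN (~1.18e-38)
    // A positive value smaller than FLT_MIN (and not zero) is denormal.
    printf("FLT_MIN = %g  y = %g  denormal: %d\n",
           FLT_MIN, y, (y > 0.0f && y < FLT_MIN));
    return 0;
}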
There is nothing strange about the data from I.bin taking longer to process: you have lots of numbers like '1.401e-045#DEN' or '2.214e-043#DEN' (where #DEN means the number cannot be normalized to standard float precision). Given that you are going to multiply them by 6.6e-14, you will definitely get underflows, which significantly slow down the calculations.
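If you prefer not to touch the MXCSR flags, a sketch of an alternative is to flush the denormal inputs to zero once after loading the file (this assumes that treating such tiny values as exact zeros is acceptable for your computation):

#include <cfloat>
#include <cmath>

// Replace denormal entries of arr2 with exact zeros right after reading
// the file, so the inner loop never sees a denormal operand.
for (int n = 0; n < M*N; ++n) {
    if (fabsf(arr2[n]) < FLT_MIN)
        arr2[n] = 0.0f;
}

Note that this only cleans the inputs; a product scale * arr2[n] can still underflow into the denormal range (any arr2 value below about 1.8e-25), which is why setting the FTZ/DAZ bits as suggested above is the more complete fix.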