 

C OpenMP - Reduction scalability

Tags:

c

openmp

I'm testing the performance speedup of some algorithms when using OpenMP, and one of them is not scaling. Am I doing something wrong?

PC Details:

  • Memory: 7.7 GiB
  • Processor: Intel® Core™ i7-4770 CPU @ 3.40GHz × 8
  • OS: Ubuntu 15.04 64-bit
  • gcc: gcc (Ubuntu 4.8.2-19ubuntu1) 4.8.2

Code:

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <omp.h>

int main(int argc, char **argv) {
  int test_size, i;
  double *vector, mean, stddeviation, start_time, duration;

  if (argc != 2) {
    printf("Usage: %s <test_size>\n", argv[0]);
    return 1;
  }

  srand((int) omp_get_wtime());

  test_size = atoi(argv[1]);
  printf("Test Size: %d\n", test_size);

  vector = (double *) malloc(test_size * sizeof(double));
  if (vector == NULL) {
    fprintf(stderr, "Could not allocate %d doubles\n", test_size);
    return 1;
  }
  for (i = 0; i < test_size; i++) {
    vector[i] = rand();
  }

  start_time = omp_get_wtime();
  mean = 0;
  stddeviation = 0;
#pragma omp parallel default(shared) private(i)
  {
#pragma omp for reduction(+:mean)
    for (i = 0; i < test_size; i++) {
      mean += vector[i];
    }
#pragma omp single
    mean /= test_size;

#pragma omp for reduction(+:stddeviation)
    for (i = 0; i < test_size; i++) {
      stddeviation += (vector[i] - mean)*(vector[i] - mean);
    }
  }
  stddeviation = sqrt(stddeviation / test_size);
  duration = omp_get_wtime() - start_time;

  printf("Std. Deviation = %lf\n", stddeviation);
  printf("Duration: %fms\n", duration*1000);

  free(vector);
  return 0;
}

Compilation lines

gcc -c -o main.o main.c -fopenmp -lm -O3
gcc -o dp main.o -fopenmp -lm -O3

Results

$ OMP_NUM_THREADS=1 ./dp 100000000
166.224199ms

$ OMP_NUM_THREADS=2 ./dp 100000000
157.924034ms

$ OMP_NUM_THREADS=4 ./dp 100000000
159.056189ms
DiogoDoreto asked Jun 03 '15 17:06


1 Answer

I am not able to reproduce your results with Ubuntu 14.04.2 LTS, gcc 4.8, and a 2.3 GHz Intel Core i7. Here are the results that I get:

$ OMP_NUM_THREADS=1 ./so30627170 100000000
Test Size: 100000000
Std. Deviation = 619920018.463329
Duration: 206.301721ms
$ OMP_NUM_THREADS=2 ./so30627170 100000000
Test Size: 100000000
Std. Deviation = 619901821.463117
Duration: 110.381279ms
$ OMP_NUM_THREADS=4 ./so30627170 100000000
Test Size: 100000000
Std. Deviation = 619883614.594906
Duration: 78.241708ms

The output in the "Results" section of your question does not match what the code as listed would print (it lacks the "Test Size:", "Std. Deviation =" and "Duration:" lines), so you may be running an older version of your code.

I considered using x86 intrinsics within the parallel for loops, but the assembly output shows that gcc already uses SIMD instructions in this case. Without -march options, gcc used SSE2 instructions; when compiling with -march=native or -mavx, it used AVX instructions.

EDIT: Running the Go version of your program, I get:

$ ./tcc-go-desvio-padrao -w 1 -n 15 -t 100000000
2015/06/07 08:26:43 Workers: 1
2015/06/07 08:26:43 Tests: [100000000]
2015/06/07 08:26:43 # of executions of each test: 15
2015/06/07 08:26:43 Time to allocate memory: 584.477µs
2015/06/07 08:26:43 ===========================================
2015/06/07 08:26:43 Current test size: 100000000
2015/06/07 08:27:05 Time to fill the array: 1.322556083s
2015/06/07 08:27:05 Time to calculate: 194.10728ms
$ ./tcc-go-desvio-padrao -w 2 -n 15 -t 100000000
2015/06/07 08:27:10 Workers: 2
2015/06/07 08:27:10 Tests: [100000000]
2015/06/07 08:27:10 # of executions of each test: 15
2015/06/07 08:27:10 Time to allocate memory: 565.273µs
2015/06/07 08:27:10 ===========================================
2015/06/07 08:27:10 Current test size: 100000000
2015/06/07 08:27:22 Time to fill the array: 677.755324ms
2015/06/07 08:27:22 Time to calculate: 113.095753ms
$ ./tcc-go-desvio-padrao -w 4 -n 15 -t 100000000
2015/06/07 08:27:28 Workers: 4
2015/06/07 08:27:28 Tests: [100000000]
2015/06/07 08:27:28 # of executions of each test: 15
2015/06/07 08:27:28 Time to allocate memory: 576.568µs
2015/06/07 08:27:28 ===========================================
2015/06/07 08:27:28 Current test size: 100000000
2015/06/07 08:27:34 Time to fill the array: 353.646193ms
2015/06/07 08:27:34 Time to calculate: 79.86221ms

The timings are about the same as those of the OpenMP version.

Daniel Trebbien answered Oct 21 '22 03:10