I'm learning OpenMP using the example of computing the value of pi via quadrature. In serial, I run the following C code:
double serial() {
    double step;
    double x, pi, sum = 0.0;
    step = 1.0 / (double) num_steps;
    for (int i = 0; i < num_steps; i++) {
        x = (i + 0.5) * step; // midpoint rule
        sum += 4.0 / (1.0 + x*x);
    }
    pi = step * sum;
    return pi;
}
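(For reference, the loop is just the midpoint rule applied to the integral that defines pi, with N = num_steps:

    \pi = \int_0^1 \frac{4}{1+x^2}\,dx \;\approx\; \frac{1}{N}\sum_{i=0}^{N-1} \frac{4}{1+x_i^2}, \qquad x_i = \frac{i+0.5}{N}. )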
I'm comparing this to an OpenMP implementation using a parallel for with a reduction:
double SPMD_for_reduction() {
    double step;
    double pi, sum = 0.0;
    step = 1.0 / (double) num_steps;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < num_steps; i++) {
        double x = (i + 0.5) * step;
        sum += 4.0 / (1.0 + x*x);
    }
    pi = step * sum;
    return pi;
}
For num_steps = 1,000,000,000, and 6 threads in the case of omp, I compile and time:
double start_time = omp_get_wtime();
serial();
double end_time = omp_get_wtime();
start_time = omp_get_wtime();
SPMD_for_reduction();
end_time = omp_get_wtime();
With no compiler optimizations, the runtimes are around 4 s (serial) and 0.66 s (OpenMP). With the -O3 flag, the serial runtime drops to ".000001s" while the OpenMP runtime is mostly unchanged. What's going on here? Is it vector instructions being used, or is it poor code or a flawed timing method? If it's vectorization, why isn't the OpenMP function benefiting?
It may be of interest that the machine I'm using has a modern 6-core Xeon processor.
Thanks!
The compiler outsmarts you. For the serial version it is able to detect that the result of your computation is never used, so it throws the computation out completely (dead-code elimination).
double start_time = omp_get_wtime();
serial(); //<-- Computations not used.
double end_time = omp_get_wtime();
In the OpenMP case the compiler cannot prove that everything inside the function body is free of side effects (the parallel region is outlined into calls to the OpenMP runtime, which it cannot see through), so to stay on the safe side it keeps the function call.
You can of course write something like double serial_pi = serial(); and, outside of the time measurement, do some dummy work with the variable serial_pi (print it, for example). This way the compiler will keep the function call and perform the optimizations you are actually looking for.
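As an illustration, here is a minimal sketch of such a timing harness. It assumes serial() and SPMD_for_reduction() are defined as shown in the question, in the same file, and that they read the global num_steps:

#include <stdio.h>
#include <omp.h>

long num_steps = 1000000000;      /* 1e9 steps, as in the question */

double serial(void);              /* definitions as shown in the question */
double SPMD_for_reduction(void);

int main(void) {
    double start = omp_get_wtime();
    double serial_pi = serial();
    double mid = omp_get_wtime();
    double omp_pi = SPMD_for_reduction();
    double end = omp_get_wtime();

    /* Printing the results makes them observable, so the compiler
       cannot discard either computation as dead code. */
    printf("serial: pi = %.12f  (%.3f s)\n", serial_pi, mid - start);
    printf("omp:    pi = %.12f  (%.3f s)\n", omp_pi, end - mid);
    return 0;
}

Compiled with something like gcc -O3 -fopenmp, both calls now survive optimization and the timing comparison becomes meaningful.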