Why can GCC not vectorize this function and loop?

Tags: c++, vectorization, simd, openmp

I'm trying to make a function SIMD-enabled and to vectorize a loop that calls it.

#include <cmath>

#pragma omp declare simd
double BlackBoxFunction(const double x) {
    return 1.0/sqrt(x);
}

double ComputeIntegral(const int n, const double a, const double b) {
    const double dx = (b - a)/n;
    double I = 0.0;
    #pragma omp simd reduction(+: I)
    for (int i = 0; i < n; i++) {
        const double xip12 = a + dx*(double(i) + 0.5);
        const double yip12 = BlackBoxFunction(xip12);
        const double dI = yip12*dx;
        I += dI;
    }
    return I;
}

For the code above, if I compile it with icpc:

icpc worker.cc -qopenmp -qopt-report=5 -c

The opt-report shows that the function and loop are both vectorized. However, if I try to compile it with g++ 6.5:

g++ worker.cc -O3 -fopenmp -fopt-info-vec-missed -funsafe-math-optimizations -c

GCC reports "note: not vectorized: control flow in loop." and "note: bad loop form.", and the loop is not vectorized.

How can I vectorize the loop with GCC?

EDIT:

If I move the function into a separate file, the sources look like this.

worker.cc:

#include "library.h"

double ComputeIntegral(const int n, const double a, const double b) {
    const double dx = (b - a)/n;
    double I = 0.0;
    #pragma omp simd reduction(+: I)
    for (int i = 0; i < n; i++) {
        const double xip12 = a + dx*(double(i) + 0.5);
        const double yip12 = BlackBoxFunction(xip12);
        const double dI = yip12*dx;
        I += dI;
    }
    return I;
}

library.h:

#ifndef __INCLUDED_LIBRARY_H__
#define __INCLUDED_LIBRARY_H__

#pragma omp declare simd
double BlackBoxFunction(const double x); 

#endif

and library.cc:

#include <cmath>

#pragma omp declare simd
double BlackBoxFunction(const double x) {
  return 1.0/sqrt(x);
}

Then I compile it with GCC:

g++ worker.cc library.cc -O3 -fopenmp -fopt-info-vec-missed -funsafe-math-optimizations -c

It shows:

worker.cc:9:31: note: loop vectorized

but

library.cc:5:18: note:not vectorized: control flow in loop.
library.cc:5:18: note:bad loop form.

This confuses me. Has the code actually been vectorized or not?


asked Jan 11 '19 12:01

pangbryant

1 Answer

Vectorization is possible with gcc, after some slight modifications of the code:

#include <cmath>

double BlackBoxFunction(const double x) {
    return 1.0/sqrt(x);
}

double ComputeIntegral(const int n, const double a, const double b) {
    const double dx = (b - a)/n;
    double I = 0.0;
    double d_i = 0.0;
    for (int i = 0; i < n; i++) {
        const double xip12 = a + dx*(d_i + 0.5);
        d_i = d_i + 1.0;
        const double yip12 = BlackBoxFunction(xip12);
        const double dI = yip12*dx;
        I += dI;
    }
    return I;
}

This was compiled with the compiler options: -Ofast -march=haswell -fopt-info-vec-missed -funsafe-math-optimizations. The main loop compiles to

.L7:
    vaddpd  ymm2, ymm4, ymm7
    inc     eax
    vaddpd  ymm4, ymm4, ymm8
    vfmadd132pd     ymm2, ymm9, ymm5
    vsqrtpd ymm2, ymm2
    vdivpd  ymm2, ymm6, ymm2
    vfmadd231pd     ymm3, ymm5, ymm2
    cmp     eax, edx
    jne     .L7

See the following Godbolt link

I removed the #pragma omp ... lines because they didn't improve the vectorization, although they didn't make it worse either.

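Note that changing the compiler option from -O3 to -Ofast is by itself sufficient to enable vectorization of the original code. A likely reason is that -Ofast implies -ffast-math, which includes -fno-math-errno, so sqrt no longer needs a branch to set errno for negative arguments; -funsafe-math-optimizations alone does not turn that off. Nevertheless, it is more efficient to use a double counter than an int counter that is converted to double in each iteration.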
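Since the double-counter rewrite changes how the sample points are computed, it may be worth confirming that it still produces the same integral. The following is only a minimal sketch of such a check, not part of the original answer: the names ComputeIntegralIntCounter, ComputeIntegralDoubleCounter and ExactIntegral are hypothetical and introduced here for illustration, and the analytic value assumes the integrand 1/sqrt(x) from the question with 0 < a < b.

```cpp
#include <cassert>
#include <cmath>

// Same integrand as in the question.
inline double BlackBoxFunction(const double x) {
    return 1.0 / std::sqrt(x);
}

// Original form: the int counter is converted to double in every iteration.
double ComputeIntegralIntCounter(const int n, const double a, const double b) {
    assert(n > 0);
    const double dx = (b - a) / n;
    double I = 0.0;
    for (int i = 0; i < n; i++) {
        const double xip12 = a + dx * (static_cast<double>(i) + 0.5);
        I += BlackBoxFunction(xip12) * dx;
    }
    return I;
}

// Rewritten form: a separate double counter replaces the int-to-double conversion.
double ComputeIntegralDoubleCounter(const int n, const double a, const double b) {
    assert(n > 0);
    const double dx = (b - a) / n;
    double I = 0.0;
    double d_i = 0.0;
    for (int i = 0; i < n; i++) {
        const double xip12 = a + dx * (d_i + 0.5);
        d_i = d_i + 1.0;
        I += BlackBoxFunction(xip12) * dx;
    }
    return I;
}

// Hypothetical helper: analytic value of the integral of 1/sqrt(x) over [a, b],
// valid for 0 < a < b.
double ExactIntegral(const double a, const double b) {
    return 2.0 * (std::sqrt(b) - std::sqrt(a));
}
```

Comparing both variants against ExactIntegral with a tolerance, rather than for exact equality, keeps the check meaningful under -Ofast, where the compiler may reorder the floating-point summation.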

Note also that the vectorization reports are quite misleading. Inspect the generated assembly code to see whether or not the vectorization was successful.

answered Oct 12 '22 12:10

wim
