When compiled with GCC 5.2 using -std=c99, -O3, and -mavx2, the following code sample auto-vectorizes (assembly here):
#include <stdint.h>

void test(uint32_t *restrict a,
          uint32_t *restrict b) {
    uint32_t *a_aligned = __builtin_assume_aligned(a, 32);
    uint32_t *b_aligned = __builtin_assume_aligned(b, 32);
    for (int i = 0; i < (1L << 10); i += 2) {
        a_aligned[i]   = 42 * b_aligned[i];
        a_aligned[i+1] = 3 * a_aligned[i+1];
    }
}
But the following code sample does not auto-vectorize (assembly here):
#include <stdint.h>

void test(uint32_t *restrict a,
          uint32_t *restrict b) {
    uint32_t *a_aligned = __builtin_assume_aligned(a, 32);
    uint32_t *b_aligned = __builtin_assume_aligned(b, 32);
    for (int i = 0; i < (1L << 10); i += 2) {
        a_aligned[i]   = 42 * b_aligned[i];
        a_aligned[i+1] = a_aligned[i+1];
    }
}
The only difference between the samples is the scaling factor applied to a_aligned[i+1]. This was also the case for GCC 4.8, 4.9, and 5.1. Adding volatile to a_aligned's declaration inhibits auto-vectorization completely. The first sample consistently runs faster than the second for us, with a more pronounced speedup for smaller types (e.g. uint8_t instead of uint32_t).
Is there a way to make the second code sample auto-vectorize with GCC?
Loop vectorization transforms procedural loops so that several pairs of operands are processed by a single instruction. Programs spend most of their time inside such loops, so vectorizing them can yield a significant speedup, especially over large data sets.
Vectorization is the process of converting an algorithm from operating on a single value at a time to operating on a set of values at once. Modern CPUs provide direct support for this through single-instruction, multiple-data (SIMD) instructions.
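For example (the function name scale_add and the constant 3 are made up for illustration), a loop like the following is a textbook candidate: every iteration is independent, so when built with -O3 and a suitable -m option the compiler can process several elements per instruction.

#include <stddef.h>
#include <stdint.h>

/* Each iteration is independent, so at -O3 (e.g. with -mavx2) GCC can
 * process several elements of a[] and b[] per SIMD instruction.       */
void scale_add(uint32_t *restrict a, const uint32_t *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = a[i] + 3 * b[i];
}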
The goal of SLP vectorization (a.k.a. superword-level parallelism) is to combine similar independent instructions into vector instructions. Memory accesses, arithmetic operations, comparison operations, and PHI nodes can all be vectorized with this technique.
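As an illustration (this snippet is not from the question), the four independent statements below touch adjacent memory locations, so an SLP pass can merge them into a single vector add even though there is no loop:

#include <stdint.h>

/* Four similar, independent scalar adds on adjacent elements:
 * an SLP pass can combine them into one 128-bit vector add.   */
void slp_example(uint32_t *restrict a, const uint32_t *restrict b,
                 const uint32_t *restrict c)
{
    a[0] = b[0] + c[0];
    a[1] = b[1] + c[1];
    a[2] = b[2] + c[2];
    a[3] = b[3] + c[3];
}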
GCC's loop vectorizer was developed on the autovect branch, based on the tree-ssa framework; vectorization of loops that operate on multiple data types, including type conversions, was submitted for incorporation into GCC 4.2.
There are many conditions that must hold before GCC will auto-vectorize a loop: the compiler needs confirmation that the data is suitably aligned, the code will most likely have to be rewritten to simplify the loop body, and even then auto-vectorization isn't guaranteed. To demonstrate a successful auto-vectorization, we will create a simple C program that fills two arrays with random numbers in the range -1000 to +1000, sums both arrays element-by-element into a third array, then sums the third array and displays the result (a sketch follows below). Confirming a successful auto-vectorization can be a little tricky.
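A minimal version of that demonstration program might look like the following (the array names and size are arbitrary); one way to confirm that the element-wise loop was vectorized is to compile with, say, gcc -O3 -mavx2 -fopt-info-vec and look for the "loop vectorized" notes.

#include <stdio.h>
#include <stdlib.h>

#define N 1024

int a[N], b[N], sum[N];

int main(void)
{
    /* Fill both arrays with random numbers in -1000..+1000. */
    for (int i = 0; i < N; i++) {
        a[i] = rand() % 2001 - 1000;
        b[i] = rand() % 2001 - 1000;
    }

    /* Sum both arrays element-by-element into a third array:
     * a good auto-vectorization candidate.                    */
    for (int i = 0; i < N; i++)
        sum[i] = a[i] + b[i];

    /* Sum the third array and display the result. */
    long total = 0;
    for (int i = 0; i < N; i++)
        total += sum[i];

    printf("total = %ld\n", total);
    return 0;
}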
gcc has another extension that helps with vectorization: vector types. It is possible to construct types that represent arrays (i.e. vectors) of smaller, more basic types; code can then use normal C (or C++) operations on those types.
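As a rough sketch of that extension (the type name v8u32 and the function test_vec are placeholders, not part of the question), a 32-byte vector of eight uint32_t lanes can be declared with __attribute__((vector_size(32))) and manipulated with ordinary operators. The alternating constant vectors below reproduce the even/odd pattern of the question's first sample, assuming the arrays can be treated as arrays of such vectors:

#include <stdint.h>

/* Eight uint32_t lanes packed into one 32-byte vector. */
typedef uint32_t v8u32 __attribute__((vector_size(32)));

void test_vec(v8u32 *restrict a, const v8u32 *restrict b)
{
    /* Even lanes become 42*b, odd lanes become 3*a, lane-wise. */
    const v8u32 kb = {42, 0, 42, 0, 42, 0, 42, 0};
    const v8u32 ka = { 0, 3,  0, 3,  0, 3,  0, 3};
    for (int i = 0; i < (1 << 10) / 8; i++)
        a[i] = kb * b[i] + ka * a[i];
}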
The following version vectorises, but that's ugly if you ask me...
#include <stdint.h>

void test(uint32_t *a, uint32_t *aa,
          uint32_t *restrict b) {
    #pragma omp simd aligned(a,aa,b:32)
    for (int i = 0; i < (1L << 10); i += 2) {
        a[i]   = 2 * b[i];
        a[i+1] = aa[i+1];
    }
}
Compile with -fopenmp and call it as test(a, a, b).
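For completeness, a call site might look like the sketch below (the static, 32-byte-aligned buffers are an assumption made for illustration); the whole thing would then be built with something like gcc -std=c99 -O3 -mavx2 -fopenmp.

#include <stdint.h>

void test(uint32_t *a, uint32_t *aa, uint32_t *restrict b);

/* 32-byte aligned buffers sized to match the loop bound (1 << 10);
 * static storage is used here only to keep the sketch short.       */
static uint32_t a[1 << 10] __attribute__((aligned(32)));
static uint32_t b[1 << 10] __attribute__((aligned(32)));

int main(void)
{
    test(a, a, b);   /* pass a twice, as suggested above */
    return 0;
}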