I've been trying for a while to build a version of gcc that supports OpenMP 4.5 offloading to Nvidia GPUs. So far I have had no success, although I'm getting closer.
This time I followed this script, with two changes: first, I specified the trunk version of gcc instead of 7.2; second, nvptx-newlib is now included in nvptx-tools according to the GitHub repository, so I removed that part of the script. For easy reference, the original script is:
#!/bin/sh
#
# Build GCC with support for offloading to NVIDIA GPUs.
#
work_dir=$HOME/offload/wrk
install_dir=$HOME/offload/install
# Location of the installed CUDA toolkit
cuda=/usr/local/cuda
# Build assembler and linking tools
mkdir -p $work_dir
cd $work_dir
git clone https://github.com/MentorEmbedded/nvptx-tools
cd nvptx-tools
./configure \
--with-cuda-driver-include=$cuda/include \
--with-cuda-driver-lib=$cuda/lib64 \
--prefix=$install_dir
make
make install
cd ..
# Set up the GCC source tree
git clone https://github.com/MentorEmbedded/nvptx-newlib
svn co svn://gcc.gnu.org/svn/gcc/tags/gcc_7_2_0_release gcc
cd gcc
contrib/download_prerequisites
ln -s ../nvptx-newlib/newlib newlib
cd ..
target=$(gcc/config.guess)
# Build nvptx GCC
mkdir build-nvptx-gcc
cd build-nvptx-gcc
../gcc/configure \
--target=nvptx-none --with-build-time-tools=$install_dir/nvptx-none/bin \
--enable-as-accelerator-for=$target \
--disable-sjlj-exceptions \
--enable-newlib-io-long-long \
--enable-languages="c,c++,fortran,lto" \
--prefix=$install_dir
make -j4
make install
cd ..
# Build host GCC
mkdir build-host-gcc
cd build-host-gcc
../gcc/configure \
--enable-offload-targets=nvptx-none \
--with-cuda-driver-include=$cuda/include \
--with-cuda-driver-lib=$cuda/lib64 \
--disable-bootstrap \
--disable-multilib \
--enable-languages="c,c++,fortran,lto" \
--prefix=$install_dir
make -j4
make install
cd ..
After quite a while, the script exits successfully. Per the instructions on that webpage, I added $install_dir/lib64 to my LD_LIBRARY_PATH and additionally to LIBRARY_PATH.
As a test, I use the following basic program:
#include <omp.h>
#include <cmath>
#include <iostream>
int main()
{
double data_array[1000000];
#pragma omp target teams distribute
for (int idx = 0; idx < 1000000; ++idx)
{
data_array[idx] = idx;
}
std::cout << "Hopefully this ran on the gpu...\n";
}
I then try to compile it with offload/install/bin/g++ -fopenmp -foffload=nvptx-none main.cpp
which fails with the following error message:
x86_64-pc-linux-gnu-accel-nvptx-none-gcc: error: libgomp.spec: No such file or directory
mkoffload: fatal error: offload/install/bin/x86_64-pc-linux-gnu-accel-nvptx-none-gcc returned 1 exit status
compilation terminated.
lto-wrapper: fatal error: /home/over_ng/offload/install/libexec/gcc/x86_64-pc-linux-gnu/9.0.0//accel/nvptx-none/mkoffload returned 1 exit status
compilation terminated.
/usr/bin/ld: error: lto-wrapper failed
collect2: error: ld returned 1 exit status
The file libgomp.spec can be found in the aforementioned $install_dir/lib64, which on my system is offload/install/lib64/.
Some more information about my system:
Ubuntu 16.04, accessed through slurm
Cuda 9.0.176
4x Nvidia Tesla V100
offload/install/bin/g++ -v reports:
Using built-in specs.
COLLECT_GCC=offload/install/bin/g++
COLLECT_LTO_WRAPPER=/home/over_ng/offload/install/libexec/gcc/x86_64-pc-linux-gnu/9.0.0/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
Target: x86_64-pc-linux-gnu
Configured with: ../gcc/configure --enable-offload-targets=nvptx-none --with-cuda-driver-include=/tools/spack/install/linux-ubuntu16.04-x86_64/gcc-5.4.0/cuda-9.0.176-m4ivnigh5kuty6u7tcnroxr5on5lot6s/include --with-cuda-driver-lib=/tools/spack/install/linux-ubuntu16.04-x86_64/gcc-5.4.0/cuda-9.0.176-m4ivnigh5kuty6u7tcnroxr5on5lot6s/lib64 --disable-bootstrap --disable-multilib --enable-languages=c,c++,fortran,lto --prefix=/home/over_ng/offload/install
Thread model: posix
gcc version 9.0.0 20180627 (experimental) (GCC)
offload/install/bin/g++ -print-search-dirs reports
install: /home/over_ng/offload/install/lib/gcc/x86_64-pc-linux-gnu/9.0.0/
programs: =/home/over_ng/offload/install/libexec/gcc/x86_64-pc-linux-gnu/9.0.0/:/home/over_ng/offload/install/libexec/gcc/x86_64-pc-linux-gnu/9.0.0/:/home/over_ng/offload/install/libexec/gcc/x86_64-pc-linux-gnu/:/home/over_ng/offload/install/lib/gcc/x86_64-pc-linux-gnu/9.0.0/:/home/over_ng/offload/install/lib/gcc/x86_64-pc-linux-gnu/:/home/over_ng/offload/install/lib/gcc/x86_64-pc-linux-gnu/9.0.0/../../../../x86_64-pc-linux-gnu/bin/x86_64-pc-linux-gnu/9.0.0/:/home/over_ng/offload/install/lib/gcc/x86_64-pc-linux-gnu/9.0.0/../../../../x86_64-pc-linux-gnu/bin/x86_64-linux-gnu/:/home/over_ng/offload/install/lib/gcc/x86_64-pc-linux-gnu/9.0.0/../../../../x86_64-pc-linux-gnu/bin/
libraries: =/tools/spack/install/linux-ubuntu16.04-x86_64/gcc-5.4.0/cuda-9.0.176-m4ivnigh5kuty6u7tcnroxr5on5lot6s/lib64/x86_64-pc-linux-gnu/9.0.0/:/tools/spack/install/linux-ubuntu16.04-x86_64/gcc-5.4.0/cuda-9.0.176-m4ivnigh5kuty6u7tcnroxr5on5lot6s/lib64/x86_64-linux-gnu/:/tools/spack/install/linux-ubuntu16.04-x86_64/gcc-5.4.0/cuda-9.0.176-m4ivnigh5kuty6u7tcnroxr5on5lot6s/lib64/../lib64/:/tools/spack/install/linux-ubuntu16.04-x86_64/gcc-5.4.0/subversion-1.9.7-f5fbcx4xhwzrq5rhhco7byj7cbx2f4fs/lib/x86_64-pc-linux-gnu/9.0.0/:/tools/spack/install/linux-ubuntu16.04-x86_64/gcc-5.4.0/subversion-1.9.7-f5fbcx4xhwzrq5rhhco7byj7cbx2f4fs/lib/x86_64-linux-gnu/:/tools/spack/install/linux-ubuntu16.04-x86_64/gcc-5.4.0/subversion-1.9.7-f5fbcx4xhwzrq5rhhco7byj7cbx2f4fs/lib/../lib64/:/home/over_ng/offload/install/lib64/x86_64-pc-linux-gnu/9.0.0/:/home/over_ng/offload/install/lib64/x86_64-linux-gnu/:/home/over_ng/offload/install/lib64/../lib64/:/home/over_ng/offload/install/lib/gcc/x86_64-pc-linux-gnu/9.0.0/:/home/over_ng/offload/install/lib/gcc/x86_64-pc-linux-gnu/9.0.0/../../../../x86_64-pc-linux-gnu/lib/x86_64-pc-linux-gnu/9.0.0/:/home/over_ng/offload/install/lib/gcc/x86_64-pc-linux-gnu/9.0.0/../../../../x86_64-pc-linux-gnu/lib/x86_64-linux-gnu/:/home/over_ng/offload/install/lib/gcc/x86_64-pc-linux-gnu/9.0.0/../../../../x86_64-pc-linux-gnu/lib/../lib64/:/home/over_ng/offload/install/lib/gcc/x86_64-pc-linux-gnu/9.0.0/../../../x86_64-pc-linux-gnu/9.0.0/:/home/over_ng/offload/install/lib/gcc/x86_64-pc-linux-gnu/9.0.0/../../../x86_64-linux-gnu/:/home/over_ng/offload/install/lib/gcc/x86_64-pc-linux-gnu/9.0.0/../../../../lib64/:/lib/x86_64-pc-linux-gnu/9.0.0/:/lib/x86_64-linux-gnu/:/lib/../lib64/:/usr/lib/x86_64-pc-linux-gnu/9.0.0/:/usr/lib/x86_64-linux-gnu/:/usr/lib/../lib64/:/tools/spack/install/linux-ubuntu16.04-x86_64/gcc-5.4.0/cuda-9.0.176-m4ivnigh5kuty6u7tcnroxr5on5lot6s/lib64/:/tools/spack/install/linux-ubuntu16.04-x86_64/gcc-5.4.0/subversion-1.9.7-f5fbcx4xhwzrq5rhhco7byj7cbx2f4fs/
lib/:/home/over_ng/offload/install/lib64/:/home/over_ng/offload/install/lib/gcc/x86_64-pc-linux-gnu/9.0.0/../../../../x86_64-pc-linux-gnu/lib/:/home/over_ng/offload/install/lib/gcc/x86_64-pc-linux-gnu/9.0.0/../../../:/lib/:/usr/lib/
And finally, offload/install/bin/g++ -fopenmp -foffload=nvptx-none -v main.cpp reports
Using built-in specs.
COLLECT_GCC=offload/install/bin/g++
COLLECT_LTO_WRAPPER=/home/over_ng/offload/install/libexec/gcc/x86_64-pc-linux-gnu/9.0.0/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
Target: x86_64-pc-linux-gnu
Configured with: ../gcc/configure --enable-offload-targets=nvptx-none --with-cuda-driver-include=/tools/spack/install/linux-ubuntu16.04-x86_64/gcc-5.4.0/cuda-9.0.176-m4ivnigh5kuty6u7tcnroxr5on5lot6s/include --with-cuda-driver-lib=/tools/spack/install/linux-ubuntu16.04-x86_64/gcc-5.4.0/cuda-9.0.176-m4ivnigh5kuty6u7tcnroxr5on5lot6s/lib64 --disable-bootstrap --disable-multilib --enable-languages=c,c++,fortran,lto --prefix=/home/over_ng/offload/install
Thread model: posix
gcc version 9.0.0 20180627 (experimental) (GCC)
COLLECT_GCC_OPTIONS='-fopenmp' '-foffload=nvptx-none' '-v' '-shared-libgcc' '-mtune=generic' '-march=x86-64' '-pthread'
/home/over_ng/offload/install/libexec/gcc/x86_64-pc-linux-gnu/9.0.0/cc1plus -quiet -v -imultiarch x86_64-linux-gnu -D_GNU_SOURCE -D_REENTRANT main.cpp -quiet -dumpbase main.cpp -mtune=generic -march=x86-64 -auxbase main -version -fopenmp -foffload=nvptx-none -o /tmp/cc9FAd0p.s
GNU C++14 (GCC) version 9.0.0 20180627 (experimental) (x86_64-pc-linux-gnu)
compiled by GNU C version 8.1.0, GMP version 6.1.0, MPFR version 3.1.4, MPC version 1.0.3, isl version isl-0.18-GMP
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
ignoring nonexistent directory "/usr/local/include/x86_64-linux-gnu"
ignoring nonexistent directory "/home/over_ng/offload/install/lib/gcc/x86_64-pc-linux-gnu/9.0.0/../../../../x86_64-pc-linux-gnu/include"
#include "..." search starts here:
#include <...> search starts here:
/tools/spack/install/linux-ubuntu16.04-x86_64/gcc-5.4.0/cuda-9.0.176-m4ivnigh5kuty6u7tcnroxr5on5lot6s/include
/tools/spack/install/linux-ubuntu16.04-x86_64/gcc-5.4.0/subversion-1.9.7-f5fbcx4xhwzrq5rhhco7byj7cbx2f4fs/include
/home/over_ng/offload/install/lib/gcc/x86_64-pc-linux-gnu/9.0.0/../../../../include/c++/9.0.0
/home/over_ng/offload/install/lib/gcc/x86_64-pc-linux-gnu/9.0.0/../../../../include/c++/9.0.0/x86_64-pc-linux-gnu
/home/over_ng/offload/install/lib/gcc/x86_64-pc-linux-gnu/9.0.0/../../../../include/c++/9.0.0/backward
/home/over_ng/offload/install/lib/gcc/x86_64-pc-linux-gnu/9.0.0/include
/usr/local/include
/home/over_ng/offload/install/include
/home/over_ng/offload/install/lib/gcc/x86_64-pc-linux-gnu/9.0.0/include-fixed
/usr/include/x86_64-linux-gnu
/usr/include
End of search list.
GNU C++14 (GCC) version 9.0.0 20180627 (experimental) (x86_64-pc-linux-gnu)
compiled by GNU C version 8.1.0, GMP version 6.1.0, MPFR version 3.1.4, MPC version 1.0.3, isl version isl-0.18-GMP
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
Compiler executable checksum: 716ed3567afb9cd0b736d2b474553211
COLLECT_GCC_OPTIONS='-fopenmp' '-foffload=nvptx-none' '-v' '-shared-libgcc' '-mtune=generic' '-march=x86-64' '-pthread'
as -v --64 -o /tmp/cc2TYtU2.o /tmp/cc9FAd0p.s
GNU assembler version 2.26.1 (x86_64-linux-gnu) using BFD version (GNU Binutils for Ubuntu) 2.26.1
COMPILER_PATH=/home/over_ng/offload/install/libexec/gcc/x86_64-pc-linux-gnu/9.0.0/:/home/over_ng/offload/install/libexec/gcc/x86_64-pc-linux-gnu/9.0.0/:/home/over_ng/offload/install/libexec/gcc/x86_64-pc-linux-gnu/:/home/over_ng/offload/install/lib/gcc/x86_64-pc-linux-gnu/9.0.0/:/home/over_ng/offload/install/lib/gcc/x86_64-pc-linux-gnu/
LIBRARY_PATH=/tools/spack/install/linux-ubuntu16.04-x86_64/gcc-5.4.0/cuda-9.0.176-m4ivnigh5kuty6u7tcnroxr5on5lot6s/lib64/../lib64/:/home/over_ng/offload/install/lib64/../lib64/:/home/over_ng/offload/install/lib/gcc/x86_64-pc-linux-gnu/9.0.0/:/home/over_ng/offload/install/lib/gcc/x86_64-pc-linux-gnu/9.0.0/../../../../lib64/:/lib/x86_64-linux-gnu/:/lib/../lib64/:/usr/lib/x86_64-linux-gnu/:/usr/lib/../lib64/:/tools/spack/install/linux-ubuntu16.04-x86_64/gcc-5.4.0/cuda-9.0.176-m4ivnigh5kuty6u7tcnroxr5on5lot6s/lib64/:/tools/spack/install/linux-ubuntu16.04-x86_64/gcc-5.4.0/subversion-1.9.7-f5fbcx4xhwzrq5rhhco7byj7cbx2f4fs/lib/:/home/over_ng/offload/install/lib64/:/home/over_ng/offload/install/lib/gcc/x86_64-pc-linux-gnu/9.0.0/../../../:/lib/:/usr/lib/
Reading specs from /home/over_ng/offload/install/lib64/../lib64/libgomp.spec
COLLECT_GCC_OPTIONS='-fopenmp' '-foffload=nvptx-none' '-v' '-shared-libgcc' '-mtune=generic' '-march=x86-64' '-pthread'
/home/over_ng/offload/install/libexec/gcc/x86_64-pc-linux-gnu/9.0.0/collect2 -plugin /home/over_ng/offload/install/libexec/gcc/x86_64-pc-linux-gnu/9.0.0/liblto_plugin.so -plugin-opt=/home/over_ng/offload/install/libexec/gcc/x86_64-pc-linux-gnu/9.0.0/lto-wrapper -plugin-opt=-fresolution=/tmp/ccnGrpRF.res -plugin-opt=-pass-through=-lgcc_s -plugin-opt=-pass-through=-lgcc -plugin-opt=-pass-through=-lpthread -plugin-opt=-pass-through=-lc -plugin-opt=-pass-through=-lgcc_s -plugin-opt=-pass-through=-lgcc --eh-frame-hdr -m elf_x86_64 -dynamic-linker /lib64/ld-linux-x86-64.so.2 /usr/lib/x86_64-linux-gnu/crt1.o /usr/lib/x86_64-linux-gnu/crti.o /home/over_ng/offload/install/lib/gcc/x86_64-pc-linux-gnu/9.0.0/crtbegin.o /home/over_ng/offload/install/lib/gcc/x86_64-pc-linux-gnu/9.0.0/crtoffloadbegin.o -L/tools/spack/install/linux-ubuntu16.04-x86_64/gcc-5.4.0/cuda-9.0.176-m4ivnigh5kuty6u7tcnroxr5on5lot6s/lib64/../lib64 -L/home/over_ng/offload/install/lib64/../lib64 -L/home/over_ng/offload/install/lib/gcc/x86_64-pc-linux-gnu/9.0.0 -L/home/over_ng/offload/install/lib/gcc/x86_64-pc-linux-gnu/9.0.0/../../../../lib64 -L/lib/x86_64-linux-gnu -L/lib/../lib64 -L/usr/lib/x86_64-linux-gnu -L/usr/lib/../lib64 -L/tools/spack/install/linux-ubuntu16.04-x86_64/gcc-5.4.0/cuda-9.0.176-m4ivnigh5kuty6u7tcnroxr5on5lot6s/lib64 -L/tools/spack/install/linux-ubuntu16.04-x86_64/gcc-5.4.0/subversion-1.9.7-f5fbcx4xhwzrq5rhhco7byj7cbx2f4fs/lib -L/home/over_ng/offload/install/lib64 -L/home/over_ng/offload/install/lib/gcc/x86_64-pc-linux-gnu/9.0.0/../../.. /tmp/cc2TYtU2.o -lstdc++ -lm -lgomp -lgcc_s -lgcc -lpthread -lc -lgcc_s -lgcc /home/over_ng/offload/install/lib/gcc/x86_64-pc-linux-gnu/9.0.0/crtend.o /usr/lib/x86_64-linux-gnu/crtn.o /home/over_ng/offload/install/lib/gcc/x86_64-pc-linux-gnu/9.0.0/crtoffloadend.o
/home/over_ng/offload/install/libexec/gcc/x86_64-pc-linux-gnu/9.0.0/lto-wrapper -fresolution=/tmp/ccnGrpRF.res -flinker-output=exec -foffload-objects=/tmp/ccQDi0zV.ofldlist
/home/over_ng/offload/install/libexec/gcc/x86_64-pc-linux-gnu/9.0.0//accel/nvptx-none/mkoffload @/tmp/ccJAbpMz
offload/install/bin/x86_64-pc-linux-gnu-accel-nvptx-none-gcc @/tmp/ccoh8KPc
Using built-in specs.
COLLECT_GCC=offload/install/bin/x86_64-pc-linux-gnu-accel-nvptx-none-gcc
COLLECT_LTO_WRAPPER=/home/over_ng/offload/install/libexec/gcc/x86_64-pc-linux-gnu/9.0.0/accel/nvptx-none/lto-wrapper
Target: nvptx-none
Configured with: ../gcc/configure --target=nvptx-none --with-build-time-tools=/home/over_ng/offload/install/nvptx-none/bin --enable-as-accelerator-for=x86_64-pc-linux-gnu --disable-sjlj-exceptions --enable-newlib-io-long-long --enable-languages=c,c++,fortran,lto --prefix=/home/over_ng/offload/install
Thread model: single
gcc version 9.0.0 20180627 (experimental) (GCC)
COLLECT_GCC_OPTIONS='-v' '-m64' '-mgomp' '-v' '-fno-openacc' '-foffload-abi=lp64' '-fopenmp' '-o' '/tmp/ccNVxXFz.mkoffload'
/home/over_ng/offload/install/libexec/gcc/x86_64-pc-linux-gnu/9.0.0/accel/nvptx-none/lto1 -quiet -dumpbase cc2TYtU2.o -m64 -mgomp -auxbase cc2TYtU2 -version -fno-openacc -foffload-abi=lp64 -fopenmp @/tmp/cchKIS8V -o /tmp/ccZLBhjz.s
GNU GIMPLE (GCC) version 9.0.0 20180627 (experimental) (nvptx-none)
compiled by GNU C version 8.1.0, GMP version 6.1.0, MPFR version 3.1.4, MPC version 1.0.3, isl version isl-0.18-GMP
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
GNU GIMPLE (GCC) version 9.0.0 20180627 (experimental) (nvptx-none)
compiled by GNU C version 8.1.0, GMP version 6.1.0, MPFR version 3.1.4, MPC version 1.0.3, isl version isl-0.18-GMP
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
COLLECT_GCC_OPTIONS='-v' '-m64' '-mgomp' '-v' '-fno-openacc' '-foffload-abi=lp64' '-fopenmp' '-o' '/tmp/ccNVxXFz.mkoffload'
/home/over_ng/offload/install/lib/gcc/x86_64-pc-linux-gnu/9.0.0/accel/nvptx-none/../../../../../../nvptx-none/bin/as -o /tmp/ccRJFdvc.o /tmp/ccZLBhjz.s
COMPILER_PATH=/home/over_ng/offload/install/libexec/gcc/x86_64-pc-linux-gnu/9.0.0/accel/nvptx-none/:/home/over_ng/offload/install/libexec/gcc/x86_64-pc-linux-gnu/9.0.0/accel/nvptx-none/:/home/over_ng/offload/install/libexec/gcc/nvptx-none/:/home/over_ng/offload/install/lib/gcc/x86_64-pc-linux-gnu/9.0.0/accel/nvptx-none/:/home/over_ng/offload/install/lib/gcc/nvptx-none/:/home/over_ng/offload/install/lib/gcc/x86_64-pc-linux-gnu/9.0.0/accel/nvptx-none/../../../../../../nvptx-none/bin/
LIBRARY_PATH=/home/over_ng/offload/install/lib/gcc/x86_64-pc-linux-gnu/9.0.0/accel/nvptx-none/mgomp/:/home/over_ng/offload/install/lib/gcc/x86_64-pc-linux-gnu/9.0.0/accel/nvptx-none/
Reading specs from libgomp.spec
x86_64-pc-linux-gnu-accel-nvptx-none-gcc: error: libgomp.spec: No such file or directory
mkoffload: fatal error: offload/install/bin/x86_64-pc-linux-gnu-accel-nvptx-none-gcc returned 1 exit status
compilation terminated.
lto-wrapper: fatal error: /home/over_ng/offload/install/libexec/gcc/x86_64-pc-linux-gnu/9.0.0//accel/nvptx-none/mkoffload returned 1 exit status
compilation terminated.
/usr/bin/ld: error: lto-wrapper failed
collect2: error: ld returned 1 exit status
On the same webpage where I found the script, somebody else reported the same problem, and reverting to gcc 7.2 is apparently a solution. Since I want to include the offloading compiler in the Spack collection, I would like to be able to use any supported version, although I can live with gcc 8 for the time being, as 9/trunk is still experimental.
This may be a bug in gcc; if so, I would like to report it.
Edit 1: As requested, a 'sane' CPU only program that seems to work fine:
#include <omp.h>
#include <cmath>
#include <vector>
#include <iostream>
int main()
{
const int size = 1000;
std::vector<double> sinTable(size);
#pragma omp parallel for
for(int n=0; n<size; ++n)
{
sinTable[n] = std::sin(2 * M_PI * n / size);
std::cout << sinTable[n] << '\n';
}
// the table is now initialized
}
This was compiled with offload/install/bin/g++ -fopenmp -v main_cpu.cpp -o cpu
I have been using the package gcc-offload-nvptx in the Ubuntu repository since Ubuntu 17.10. If I compile your test code like this
g++ test.cpp -fopenmp
I get a lto-wrapper failed error. This can be fixed using -fno-stack-protector like this
g++ test.cpp -fopenmp -fno-stack-protector
Then the test code compiles and runs. You can see that it runs on the GPU using nvprof like this
sudo nvprof ./a.out
Some additional comments. In your test code I would use
#pragma omp target teams distribute parallel for
See OpenMP offloading to Nvidia wrong reduction.
Also, in your test code you should do something with data_array, or the compiler might optimize your code away.
Ubuntu 18.04 also requires -fno-stack-protector.