TL;DR - Does GCC (trunk) already support OpenMP 4.0 offloading to nVidia GPU?
If so, what am I doing wrong? (description below).
I'm running Ubuntu 14.04.2 LTS.
I have checked out the most recent GCC trunk (dated 25 Mar 2015).
I have installed the CUDA 7.0 toolkit according to Getting Started on Ubuntu guide. CUDA samples run successfully, i.e. deviceQuery
detects my GeForce GT 730.
I have followed the instructions from https://gcc.gnu.org/wiki/Offloading as well as https://gcc.gnu.org/install/specific.html#nvptx-x-none
I have installed nvptx-tools and nvptx-newlib (configure
, make
, sudo make install
), newlib also linked inside GCC's trunk directory with ln -s
.
Then I built the target accelerator nvptx-none compiler:
../../trunk/configure --target=nvptx-none --enable-as-accelerator-for=x86_64-pc-linux-gnu --with-build-time-tools=/usr/local/nvptx-none/bin --disable-sjlj-exceptions --enable-newlib-io-long-long
make -j 9
sudo make install DESTDIR=/install
...and the host GCC compiler itself:
../trunk/configure --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --target=x86_64-pc-linux-gnu --enable-offload-targets=nvptx-none=/install/prefix --with-cuda-driver=/usr/local/cuda --enable-languages=c,c++
make -j 9
sudo make install DESTDIR=/install
I have set the LD_LIBRARY_PATH accordingly:
export LD_LIBRARY_PATH=/install/usr/local/lib64:/install/usr/local/lib/gcc/nvptx-none/5.0.0/:/usr/local/cuda/lib64:$LD_LIBRARY_PATH
For sure, the mkoffload tool is built:
/install/usr/local/libexec/gcc/x86_64-pc-linux-gnu/5.0.0/accel/nvptx-none/mkoffload
as well the target and host compilers are there:
/install/usr/local/bin/x86_64-pc-linux-gnu-gcc
/install/usr/local/bin/x86_64-pc-linux-gnu-accel-nvptx-none-gcc
But when I compile a sample code that queries the number of devices with omp_get_num_devices()
, I get the response 0
:
$ /install/usr/local/bin/x86_64-pc-linux-gnu-gcc -fopenmp -foffload=nvptx-none main.c
$ ./a.out
0
When I add -v
(verbose) option to the target compiler's options, I get the following output:
$ /install/usr/local/bin/x86_64-pc-linux-gnu-gcc -fopenmp -foffload=nvptx-none="-v" main.c
Using built-in specs.
COLLECT_GCC=/install/usr/local/bin/x86_64-pc-linux-gnu-accel-nvptx-none-gcc
Target: nvptx-none
Configured with: ../../trunk/configure --target=nvptx-none --enable-as-accelerator-for=x86_64-pc-linux-gnu --with-build-time-tools=/usr/local/nvptx-none/bin --disable-sjlj-exceptions --enable-newlib-io-long-long
Thread model: single
gcc version 5.0.0 20150325 (experimental) (GCC)
COLLECT_GCC_OPTIONS='-m64' '-S' '-fmath-errno' '-fsigned-zeros' '-ftrapping-math' '-fno-trapv' '-fno-strict-overflow' '-fno-openacc' '-foffload-abi=lp64' '-fopenmp' '-v' '-v' '-o' '/tmp/cccxIggp.mkoffload'
/install/usr/local/bin/../libexec/gcc/x86_64-pc-linux-gnu/5.0.0/accel/nvptx-none/lto1 -quiet -dumpbase ccKOW9hi.o -m64 -auxbase-strip /tmp/cccxIggp.mkoffload -version -fmath-errno -fsigned-zeros -ftrapping-math -fno-trapv -fno-strict-overflow -fno-openacc -foffload-abi=lp64 -fopenmp -o /tmp/cccxIggp.mkoffload @/tmp/ccjRDWhp
GNU GIMPLE (GCC) version 5.0.0 20150325 (experimental) (nvptx-none)
compiled by GNU C version 5.0.0 20150325 (experimental), GMP version 5.1.3, MPFR version 3.1.2-p3, MPC version 1.0.1
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
GNU GIMPLE (GCC) version 5.0.0 20150325 (experimental) (nvptx-none)
compiled by GNU C version 5.0.0 20150325 (experimental), GMP version 5.1.3, MPFR version 3.1.2-p3, MPC version 1.0.1
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
COMPILER_PATH=/install/usr/local/bin/../libexec/gcc/x86_64-pc-linux-gnu/5.0.0/accel/nvptx-none/:/install/usr/local/bin/../libexec/gcc/
LIBRARY_PATH=/install/usr/local/bin/../lib/gcc/x86_64-pc-linux-gnu/5.0.0/accel/nvptx-none/:/install/usr/local/bin/../lib/gcc/
COLLECT_GCC_OPTIONS='-m64' '-S' '-fmath-errno' '-fsigned-zeros' '-ftrapping-math' '-fno-trapv' '-fno-strict-overflow' '-fno-openacc' '-foffload-abi=lp64' '-fopenmp' '-v' '-v' '-o' '/tmp/cccxIggp.mkoffload'
So it looks that the toolchain gets invoked and .mkoffload
files are created.
Please help. If it should work, how can I diagnose what's wrong?
TL;DR - Does GCC (trunk) already support OpenMP 4.0 offloading to nVidia GPU?
No.
Currently GCC supports only OpenMP 4.0 offloading to Intel Xeon Phi (KNL) and OpenACC 2.0 offloading to nVidia GPU.
There are ideas on supporting OpenMP 4.0 offloading to nVidia GPU: [1], [2], but implementation has not yet begun.
UPD 2017: GCC 7.1 now supports OpenMP 4.5 offloading to NVidia GPUs [3].
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With