We have been trying unsuccessfully to use the new GCC 5.1 release to offload OpenMP blocks to the Intel MIC (i.e. the Xeon Phi). Following the GCC Offloading page, we've put together the build.sh
script to build the "accel" target compiler for "intelmic" and the host compiler. The compilation appears to complete successfully.
Using the env.sh
script we then attempt to compile the simple hello.c
program listed below. However, this program seems to only run on the host and not the target device.
As we are new to offloading in general, as well as compiling GCC, there are multiple things we could be doing incorrectly. However, we've investigated the resources already mentioned plus the following (I do not have enough rep to post the links):
The biggest problem is they usually reference the Intel compiler. While we plan to purchase a copy, we do NOT currently have a copy. In addition, the majority of our development pipeline is already integrated with GCC and we'd prefer to keep it that way (if possible).
We have installed the latest MPSS 3.5 distribution, making the necessary modifications to work under Ubuntu. We can successfully communicate and check the status of the Xeon Phis in our system.
In our efforts, we never saw any indication that the code was running in the mic emulation mode either.
Thanks!
#!/usr/bin/env bash
set -e -x
unset LIBRARY_PATH
GCC_DIST=$PWD/gcc-5.1.0
# Modify these to control where the compilers are installed
TARGET_PREFIX=$HOME/gcc
HOST_PREFIX=$HOME/gcc
TARGET_BUILD=/tmp/gcc-build-mic
HOST_BUILD=/tmp/gcc-build-host
# i dropped the emul since we are not planning to emulate!
TARGET=x86_64-intelmic-linux-gnu
# should this be a quad (i.e. pc)?? default (Ubuntu) build seems to be x86_64-linux-gnu
HOST=x86_64-pc-linux-gnu
# check for the GCC distribution
if [ ! -d $GCC_DIST ]; then
echo "gcc-5.1.0 distribution should be here $PWD"
exit 0
fi
#sudo apt-get install -y libmpfr-dev libgmp-dev libmpc-dev libisl-dev dejagnu autogen sysvbanner
# prepare and configure the target compiler
mkdir -p $TARGET_BUILD
pushd $TARGET_BUILD
$GCC_DIST/configure \
--prefix=$TARGET_PREFIX \
--enable-languages=c,c++,fortran,lto \
--enable-liboffloadmic=target \
--disable-multilib \
--build=$TARGET \
--host=$TARGET \
--target=$TARGET \
--enable-as-accelerator-for=$HOST \
--program-prefix="${TARGET}-"
#--program-prefix="$HOST-accel-$TARGET-" \
# try adding the program prefix as HINTED in the https://gcc.gnu.org/wiki/Offloading
# do we need to specify a sysroot??? Wiki says we don't need one... but it also says "better to configure as cross compiler....
# build and install
make -j48 && make install
popd
# prepare and build the host compiler
mkdir -p $HOST_BUILD
pushd $HOST_BUILD
$GCC_DIST/configure \
--prefix=$HOST_PREFIX \
--enable-languages=c,c++,fortran,lto \
--enable-liboffloadmic=host \
--disable-multilib \
--build=$HOST \
--host=$HOST \
--target=$HOST \
--enable-offload-targets=$TARGET=$TARGET_PREFIX
make -j48 && make install
popd
#!/usr/bin/env bash
TARGET_PREFIX=$HOME/gcc
HOST_PREFIX=$HOME/gcc
HOST=x86_64-pc-linux-gnu
VERSION=5.1.0
export LD_LIBRARY_PATH=/opt/intel/mic/coi/host-linux-release/lib:/opt/mpss/3.4.3/sysroots/k1om-mpss-linux/usr/lib64:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=$HOST_PREFIX/lib:$HOST_PREFIX/lib64:$HOST_PREFIX/lib/gcc/$HOST/$VERSION:$LD_LIBRARY_PATH
export PATH=$HOST_PREFIX/bin:$PATH
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>
int main (int argc, char *argv[])
{
int nthreads, tid;
/* Fork a team of threads giving them their own copies of variables */
#pragma offload target (mic)
{
#pragma omp parallel private(nthreads,tid)
{
/* Obtain thread number */
tid = omp_get_thread_num();
printf("Hello World from thread = %d\n", tid);
/* Only master thread does this */
if (tid == 0) {
nthreads = omp_get_num_threads();
printf("Number of threads = %d\n", nthreads);
}
#ifdef __MIC__
printf("on target...\n");
#else
printf("on host...\n");
#endif
}
}
}
We compiled this code with:
gcc -fopenmp -foffload=x86_64-intelmic-linux-gnu hello.c -o hello
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>
int main (int argc, char *argv[])
{
int nthreads, tid;
/* Fork a team of threads giving them their own copies of variables */
#pragma omp target device(mic)
{
#pragma omp parallel private(nthreads,tid)
{
/* Obtain thread number */
tid = omp_get_thread_num();
printf("Hello World from thread = %d\n", tid);
/* Only master thread does this */
if (tid == 0) {
nthreads = omp_get_num_threads();
printf("Number of threads = %d\n", nthreads);
}
#ifdef __MIC__
printf("on target...\n");
#else
printf("on host...\n");
#endif
}
}
}
Almost the same thing, but instead we tried the
#pragma omp target device
syntax. In fact, with mic
, it complains, but with any device numbers (i.e. 0) it compiles and runs on the host. This code was compiled in the same manner.
OpenMP GPU offload support in GCC is limited The GCC compiler's OpenMP offload capabilities for GPU code generation is very limited, in terms of both functionality and performance. Users are strongly advised to use LLVM/clang for C/C++ codes, or CCE, which also includes a Fortran compiler with OpenMP offload capability.
Users are strongly advised to use LLVM/clang for C/C++ codes, or CCE, which also includes a Fortran compiler with OpenMP offload capability. Several compilers on the GPU nodes also support GPU offloading with OpenACC directives.
Compiling codes using OpenMP offload capabilities in CCE requires different flags for C and C++ codes than for Fortran codes. The CCE C and C++ compilers are based on clang, and thus use similar flags that one would use for clang to generate OpenMP offload code:
If using clang as a CUDA compiler, one usually will also need to add the -I/path/to/cuda/include and -L/path/to/cuda/lib64 flags manually, since nvcc includes them implicitly. Several compilers have some support for OpenMP offloading to GPUs via the omp target directive. The clang/clang++ LLVM compilers support GPU offloading with OpenMP.
Offloading to Xeon Phi with GCC 5 is possible. In order to get it to work, one must compile liboffloadmic for native MIC target, similarly to how it is done here. The problem of your setup is that it compiles host emulation libraries (libcoi_host.so, libcoi_device.so), and sticks with emulated offloading even though the physical Xeon Phi is present.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With