I'm following the instructions in this SO answer, but when I try to run the resulting PTX file I get the following error from clBuildProgram:
ptxas fatal : Unresolved extern function 'get_group_id'
In the PTX file I have the following declaration for every OpenCL built-in function I call:
.func (.param .b64 func_retval0) get_group_id
(
.param .b32 get_group_id_param_0
)
;
Declarations like this aren't present in the PTX files the OpenCL runtime generates when I give it the .cl source. Those files read the corresponding special register directly instead (e.g. %ctaid.x for get_group_id(0)).
Following these instructions (which link against a different libclc library) instead gives me a segmentation fault during the LLVM IR to PTX compilation, with the following error:
fatal error: error in backend: Cannot cast between two non-generic address spaces
Are those instructions still valid? Is there something else I should be doing?
I'm using the latest version of libclc, Clang 3.7, and NVIDIA driver 352.39.
The problem is that LLVM does not provide an OpenCL device library, so calls such as get_group_id end up as unresolved external functions in the PTX. LLVM does, however, provide intrinsics for reading a GPU thread's IDs. So you have to write your own implementations of get_global_id etc. using clang's NVPTX builtins and compile them to LLVM bitcode for the nvptx target. Before you lower your IR to PTX, use llvm-link to link this device library with your compiled OpenCL module, and that's it.
An example of how you would write such a function:
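/* Global ID = local size * group ID + local ID, i.e. %ntid * %ctaid + %tid.
   size_t and uint are assumed to come from the same OpenCL headers the kernel
   is compiled with; size_t matches the OpenCL declaration size_t get_global_id(uint),
   which is 64-bit on nvptx64 (cf. the .b64 func_retval0 in the question's PTX). */
#define __ptx_mad(a,b,c) ((a)*(b)+(c))
__attribute__((always_inline)) size_t get_global_id(uint dimindx) {
    switch (dimindx) {
    case 0: return __ptx_mad(__nvvm_read_ptx_sreg_ntid_x(), __nvvm_read_ptx_sreg_ctaid_x(), __nvvm_read_ptx_sreg_tid_x());
    case 1: return __ptx_mad(__nvvm_read_ptx_sreg_ntid_y(), __nvvm_read_ptx_sreg_ctaid_y(), __nvvm_read_ptx_sreg_tid_y());
    case 2: return __ptx_mad(__nvvm_read_ptx_sreg_ntid_z(), __nvvm_read_ptx_sreg_ctaid_z(), __nvvm_read_ptx_sreg_tid_z());
    default: return 0; /* invalid dimension: the OpenCL spec requires 0 */
    }
}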
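The unresolved symbol in the question is get_group_id, so the library needs more than get_global_id. Below is a minimal sketch of the other common work-item functions in the same style, assuming the kernel and the library are compiled with the same OpenCL headers (which supply size_t and uint). Each one reads the matching PTX special register (%ctaid, %tid, %ntid, %nctaid), and the values returned for an invalid dimension follow the OpenCL spec. The #ifndef __NVPTX__ section is a hypothetical host-only shim with made-up register values so the file can be smoke-tested with an ordinary C compiler; clang defines __NVPTX__ when targeting nvptx, so the shim drops out of the real device build.

```c
/* More OpenCL work-item functions for an NVPTX device library,
   written in the same style as get_global_id. */

#ifndef __NVPTX__
/* Hypothetical host-only shim for smoke-testing with a plain C compiler.
   The fixed values stand in for the PTX special registers and are NOT
   part of the device library. */
#include <stddef.h>
typedef unsigned int uint;
#define __nvvm_read_ptx_sreg_tid_x()    5u
#define __nvvm_read_ptx_sreg_tid_y()    6u
#define __nvvm_read_ptx_sreg_tid_z()    7u
#define __nvvm_read_ptx_sreg_ntid_x()   32u
#define __nvvm_read_ptx_sreg_ntid_y()   8u
#define __nvvm_read_ptx_sreg_ntid_z()   4u
#define __nvvm_read_ptx_sreg_ctaid_x()  2u
#define __nvvm_read_ptx_sreg_ctaid_y()  1u
#define __nvvm_read_ptx_sreg_ctaid_z()  0u
#define __nvvm_read_ptx_sreg_nctaid_x() 10u
#define __nvvm_read_ptx_sreg_nctaid_y() 3u
#define __nvvm_read_ptx_sreg_nctaid_z() 1u
#endif

/* Work-group ID: %ctaid. Invalid dimension -> 0. */
__attribute__((always_inline)) size_t get_group_id(uint dimindx) {
    switch (dimindx) {
    case 0: return __nvvm_read_ptx_sreg_ctaid_x();
    case 1: return __nvvm_read_ptx_sreg_ctaid_y();
    case 2: return __nvvm_read_ptx_sreg_ctaid_z();
    default: return 0;
    }
}

/* Local (in-group) ID: %tid. Invalid dimension -> 0. */
__attribute__((always_inline)) size_t get_local_id(uint dimindx) {
    switch (dimindx) {
    case 0: return __nvvm_read_ptx_sreg_tid_x();
    case 1: return __nvvm_read_ptx_sreg_tid_y();
    case 2: return __nvvm_read_ptx_sreg_tid_z();
    default: return 0;
    }
}

/* Work-group size: %ntid. Invalid dimension -> 1. */
__attribute__((always_inline)) size_t get_local_size(uint dimindx) {
    switch (dimindx) {
    case 0: return __nvvm_read_ptx_sreg_ntid_x();
    case 1: return __nvvm_read_ptx_sreg_ntid_y();
    case 2: return __nvvm_read_ptx_sreg_ntid_z();
    default: return 1;
    }
}

/* Number of work-groups: %nctaid. Invalid dimension -> 1. */
__attribute__((always_inline)) size_t get_num_groups(uint dimindx) {
    switch (dimindx) {
    case 0: return __nvvm_read_ptx_sreg_nctaid_x();
    case 1: return __nvvm_read_ptx_sreg_nctaid_y();
    case 2: return __nvvm_read_ptx_sreg_nctaid_z();
    default: return 1;
    }
}

/* Global size = local size * number of groups (widened to size_t
   before multiplying). Invalid dimension -> 1. */
__attribute__((always_inline)) size_t get_global_size(uint dimindx) {
    switch (dimindx) {
    case 0: return (size_t)__nvvm_read_ptx_sreg_ntid_x() * __nvvm_read_ptx_sreg_nctaid_x();
    case 1: return (size_t)__nvvm_read_ptx_sreg_ntid_y() * __nvvm_read_ptx_sreg_nctaid_y();
    case 2: return (size_t)__nvvm_read_ptx_sreg_ntid_z() * __nvvm_read_ptx_sreg_nctaid_z();
    default: return 1;
    }
}
```

Compile this to bitcode for nvptx and llvm-link it with your kernel exactly like get_global_id. get_work_dim and get_global_offset are left out of this sketch, since there is no special register that holds them.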