Passing the PTX program to the CUDA driver directly

Tags: c, cuda, ptx

The CUDA driver API can load a file containing PTX code from the filesystem. One usually does the following:

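CUmodule module;
CUfunction function;
CUresult err;  /* compare against CUDA_SUCCESS after each call */

const char* module_file = "my_prg.ptx";
const char* kernel_name = "vector_add";

/* Load the PTX file from disk, then look up the kernel by name. */
err = cuModuleLoad(&module, module_file);
err = cuModuleGetFunction(&function, module, kernel_name);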

If the PTX is generated at runtime (on the fly), going through file I/O seems wasteful: the program writes the file only for the driver to read it straight back in.

Is there a way to pass the PTX program to the CUDA driver directly (e.g. as a C string)?


asked Apr 05 '13 20:04

ritter

1 Answer

Taken from the ptxjit CUDA example:

Define the PTX program as a C string:

char myPtx32[] = "\n\
    .version 1.4\n\
    .target sm_10, map_f64_to_f32\n\
    .entry _Z8myKernelPi (\n\
    .param .u32 __cudaparm__Z8myKernelPi_data)\n\
    {\n\
    .reg .u16 %rh<4>;\n\
    .reg .u32 %r<8>;\n\

    // Other stuff

    .loc    28      18      0\n\
    exit;\n\
    }\n\
 ";

then

 cuModuleLoadDataEx(phModule, myPtx32, 0, 0, 0);

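and finally look up the kernel by its entry name, just as with a module loaded from a file (here phKernel is a CUfunction*, as phModule is a CUmodule*):

 cuModuleGetFunction(phKernel, *phModule, "_Z8myKernelPi");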
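For the runtime-generated case from the question, the same idea can be sketched roughly as follows. This is only an illustration under stated assumptions, not the ptxjit sample itself: build_noop_ptx, load_kernel_from_ptx and the HAVE_CUDA_DRIVER macro are hypothetical names, and the PTX header values (.version 6.0, .target sm_30) are placeholders that may need adjusting for a given toolkit and GPU. The first function builds the PTX text in memory. The second hands it to cuModuleLoadDataEx and asks the JIT to fill an error log through the CU_JIT_ERROR_LOG_BUFFER options, which tends to help when generated PTX fails to compile.

```c
#include <stdio.h>
#include <string.h>

/* HAVE_CUDA_DRIVER is a hypothetical macro: define it when building
   against cuda.h and linking with -lcuda. */
#ifdef HAVE_CUDA_DRIVER
#include <cuda.h>
#endif

/* Writes the PTX text of an empty kernel called `kernel_name` into `buf`.
   Returns the text length, or -1 if `buf` is too small.
   The .version/.target values are assumptions, not requirements. */
static int build_noop_ptx(char *buf, size_t cap, const char *kernel_name)
{
    int n = snprintf(buf, cap,
                     ".version 6.0\n"
                     ".target sm_30\n"
                     ".address_size 64\n"
                     ".visible .entry %s()\n"
                     "{\n"
                     "    ret;\n"
                     "}\n",
                     kernel_name);
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}

#ifdef HAVE_CUDA_DRIVER
/* Loads NUL-terminated PTX text from memory and looks up one kernel.
   Assumes cuInit() has been called and a context is current. */
static CUresult load_kernel_from_ptx(const char *ptx, const char *kernel_name,
                                     CUmodule *module, CUfunction *function)
{
    char jit_log[4096] = "";
    CUjit_option opts[] = { CU_JIT_ERROR_LOG_BUFFER,
                            CU_JIT_ERROR_LOG_BUFFER_SIZE_BYTES };
    /* The size option is passed by value, cast to a pointer. */
    void *vals[] = { jit_log, (void *)(size_t)sizeof jit_log };

    CUresult err = cuModuleLoadDataEx(module, ptx, 2, opts, vals);
    if (err != CUDA_SUCCESS) {
        fprintf(stderr, "PTX JIT failed (%d): %s\n", (int)err, jit_log);
        return err;
    }
    return cuModuleGetFunction(function, *module, kernel_name);
}
#endif
```

A caller would fill a buffer with build_noop_ptx(buf, sizeof buf, "vector_add") and pass buf to load_kernel_from_ptx, with no file on disk involved. If no JIT options are needed, cuModuleLoadData(module, ptx) takes the same in-memory image.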


answered Sep 22 '22 17:09

Vitality
