 

Data sharing between CPU and GPU on modern x86 hardware with OpenCL or other GPGPU framework

Tags:

gpgpu

opencl

The progressing unification of CPU and GPU hardware, as evidenced by AMD Kaveri with hUMA (heterogeneous Uniform Memory Access) and Intel 4th-generation CPUs, should allow copy-free sharing of data between CPU and GPU. I would like to know whether the most recent OpenCL (or other GPGPU framework) implementations allow true copy-free sharing (no explicit or implicit data copying) of large data structures between code running on the CPU and GPU.

Paul Jurczak asked Apr 30 '14 02:04




1 Answer

The ability to share data between host and device without any memory transfers has been available in OpenCL from version 1.0, via the CL_MEM_ALLOC_HOST_PTR flag. This flag allocates a buffer for the device, but ensures that it lies in memory that is also accessible by the host. The workflow for these 'zero-copy' transfers usually takes on this form:

// Allocate a device buffer using host-accessible memory
d_buffer = clCreateBuffer(context, CL_MEM_ALLOC_HOST_PTR, size, NULL, &err);

// Get a host-pointer for the buffer
h_buffer = clEnqueueMapBuffer(queue, d_buffer, CL_TRUE, CL_MAP_WRITE,
                              0, size, 0, NULL, NULL, &err);

// Write data into h_buffer from the host
... 

// Unmap the memory buffer
clEnqueueUnmapMemObject(queue, d_buffer, h_buffer, 0, NULL, NULL);

// Do stuff with the buffer on the device
clSetKernelArg(kernel, 0, sizeof(cl_mem), &d_buffer);
clEnqueueNDRangeKernel(queue, kernel, ...);

This will create a device buffer, write some data into it from the host, and then run a kernel using this buffer on the device. Because of the way that the buffer was allocated, this should not result in a memory transfer if the device and host have a unified memory system.
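To read results back on the host after the kernel has finished, you can map the buffer again, this time for reading. A minimal sketch, following the same pattern as above:

// Map the buffer for reading; the blocking map returns once the data is
// visible to the host (assuming an in-order command queue)
h_buffer = clEnqueueMapBuffer(queue, d_buffer, CL_TRUE, CL_MAP_READ,
                              0, size, 0, NULL, NULL, &err);

// Read results from h_buffer on the host
...

// Unmap the buffer when the host is done with it
clEnqueueUnmapMemObject(queue, d_buffer, h_buffer, 0, NULL, NULL);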


The above approach is limited to simple, flat data structures (1D arrays). If you are interested in working with something a little more complex, such as linked lists, trees, or other pointer-based data structures, you'll need to take advantage of the Shared Virtual Memory (SVM) feature in OpenCL 2.0. At the time of writing, AMD and Intel have both released preview support for OpenCL 2.0 functionality, but I cannot vouch for their implementations of SVM.

The workflow for the SVM approach will be somewhat similar to the code listed above. In short, you will allocate a buffer using clSVMAlloc, which will return a pointer that is valid on both the host and device. You will use clEnqueueSVMMap and clEnqueueSVMUnmap to synchronise the data when you wish to access the buffer from the host, and clSetKernelArgSVMPointer to pass it to the device. The crucial difference between SVM and CL_MEM_ALLOC_HOST_PTR is that an SVM pointer can also be included inside another buffer passed to the device (e.g. inside a struct or pointed to by another pointer). This is what allows you to build complex pointer-based data structures that can be shared between the host and device.
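As a rough illustration, a coarse-grained SVM version of the same workflow might look like the sketch below (variable names such as data and size are hypothetical, error checking is omitted, and the exact behaviour will depend on the vendor's OpenCL 2.0 implementation):

// Allocate a coarse-grained SVM buffer; the returned pointer is valid
// on both the host and the device
float *data = (float *)clSVMAlloc(context, CL_MEM_READ_WRITE, size, 0);

// Map the allocation before touching it on the host
clEnqueueSVMMap(queue, CL_TRUE, CL_MAP_WRITE, data, size, 0, NULL, NULL);

// Write data into the buffer from the host
...

// Unmap it before using it on the device
clEnqueueSVMUnmap(queue, data, 0, NULL, NULL);

// Pass the SVM pointer to the kernel and run it
clSetKernelArgSVMPointer(kernel, 0, data);
clEnqueueNDRangeKernel(queue, kernel, ...);

// Free the SVM allocation when it is no longer needed
clSVMFree(context, data);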

jprice answered Sep 30 '22 15:09