Memory object allocation in OpenCL for a dynamic array in a structure

Tags:

gpgpu

gpu

opencl

I have created the following structure 'data' in C:

typedef struct data
{
  double *dattr;                           
  int d_id;                                
  int bestCent;                            
}Data;

'dattr' is an array in the above structure whose size is dynamic. Suppose I have to create 10 objects of this structure, i.e.

dataNode = (Data *)malloc (sizeof(Data) * 10);

and for every object of this structure I have to allocate memory in C for the array 'dattr' using:

for(i=0; i<10; i++)
   dataNode[i].dattr = (double *)malloc(sizeof(double) * 3);

What should I do to implement the same in OpenCL? How do I allocate the memory for the array 'dattr' once I have allocated the memory for the structure objects?


asked Jan 10 '13 13:01

sandeep.ganage

1 Answer

Memory allocation on OpenCL devices (for example, a GPU) must be performed from the host using clCreateBuffer (or clCreateImage2D/3D if you wish to use texture memory). These functions allow you to automatically copy host data (created with malloc, for example) to the device, but I usually prefer to explicitly use clEnqueueWriteBuffer/clEnqueueMapBuffer (or clEnqueueWriteImage/clEnqueueMapImage if using texture memory), so that I can profile the data transfers. Here's an example:

#define DATA_SIZE 1000

typedef struct data {
    cl_uint id;
    cl_uint x;
    cl_uint y;
} Data;

...

// Allocate data array in host
size_t dataSizeInBytes = DATA_SIZE * sizeof(Data);
Data * dataArrayHost = (Data *) malloc(dataSizeInBytes);

// Initialize data
...

// Create data array in device
cl_mem dataArrayDevice = clCreateBuffer(context, CL_MEM_READ_ONLY, dataSizeInBytes, NULL, &status );

// Copy data array to device (blocking write; pass the host pointer itself, not its address)
status = clEnqueueWriteBuffer(queue, dataArrayDevice, CL_TRUE, 0, dataSizeInBytes, dataArrayHost, 0, NULL, NULL );

// Make sure to pass dataArrayDevice as kernel parameter
// Run kernel
...

What you need to consider is that the memory requirements of an OpenCL kernel must be known before you execute it. Memory allocation can therefore be dynamic only if it is performed before kernel execution (i.e. on the host). Nothing stops you from calling the kernel several times and adjusting (reallocating) its memory before each call.
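One way to apply this to the structure in the question is to replace the per-object 'dattr' pointer with an offset into a single contiguous array, since a host pointer stored inside a buffer has no meaning on the device. The following is only a minimal host-side sketch under that assumption: FlatData, attr_offset, attr_count and flatten_attrs are hypothetical names that do not appear in the question, and plain int/double are used so that it compiles without the OpenCL headers (real host code would normally use cl_int/cl_double).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Host-side layout from the question. */
typedef struct data {
    double *dattr;
    int d_id;
    int bestCent;
} Data;

/* Pointer-free record (hypothetical): attr_offset/attr_count replace dattr. */
typedef struct {
    int d_id;
    int bestCent;
    int attr_offset; /* index of this record's first value in the flat array */
    int attr_count;  /* number of values belonging to this record */
} FlatData;

/* Hypothetical helper: packs every dattr array into one contiguous block.
 * counts[i] is the length of nodes[i].dattr. Fills out[0..n-1] and *total,
 * and returns a malloc'd array the caller must free (NULL on failure). */
double *flatten_attrs(const Data *nodes, const int *counts, int n,
                      FlatData *out, size_t *total)
{
    size_t sum = 0;
    for (int i = 0; i < n; i++) {
        assert(counts[i] >= 0);
        sum += (size_t)counts[i];
    }

    /* Allocate at least one element so an empty input still yields a valid pointer. */
    double *flat = (double *)malloc((sum > 0 ? sum : 1) * sizeof *flat);
    if (flat == NULL)
        return NULL;

    size_t pos = 0;
    for (int i = 0; i < n; i++) {
        out[i].d_id = nodes[i].d_id;
        out[i].bestCent = nodes[i].bestCent;
        out[i].attr_offset = (int)pos;
        out[i].attr_count = counts[i];
        if (counts[i] > 0)
            memcpy(flat + pos, nodes[i].dattr, (size_t)counts[i] * sizeof *flat);
        pos += (size_t)counts[i];
    }
    *total = sum;
    return flat;
}
```

The flat array and the FlatData array would then each get their own clCreateBuffer/clEnqueueWriteBuffer pair, and a kernel could read value k of record r as attrs[r.attr_offset + k]. Both sizes are known on the host before the kernel is launched.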

Taking this into account, I advise you to rethink the way you're approaching the problem. To begin with, it is simpler (but not necessarily more efficient) to work with arrays of structures than with structures of arrays (in which case the arrays would have to have a fixed size anyway).
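If every object really holds the same number of attributes (3 in the question), the simplest array-of-structures form is to embed a fixed-size array in the structure, so that the whole array is one contiguous block of known size. This is a sketch under that assumption only: NUM_ATTRS, DataFixed and alloc_data_fixed are hypothetical names, and plain int/double stand in for cl_int/cl_double so that it compiles without the OpenCL headers.

```c
#include <assert.h>
#include <stdlib.h>

#define NUM_ATTRS 3 /* assumed fixed attribute count, taken from the question's malloc */

/* Pointer-free variant of the question's structure (hypothetical name). */
typedef struct {
    double dattr[NUM_ATTRS];
    int d_id;
    int bestCent;
} DataFixed;

/* Hypothetical helper: allocates and initializes `count` records on the host.
 * Stores the total size in *size_in_bytes, which is the value one would pass
 * to clCreateBuffer and clEnqueueWriteBuffer. Returns NULL on failure. */
DataFixed *alloc_data_fixed(size_t count, size_t *size_in_bytes)
{
    DataFixed *arr = (DataFixed *)calloc(count > 0 ? count : 1, sizeof *arr);
    if (arr == NULL)
        return NULL;

    for (size_t i = 0; i < count; i++) {
        for (int k = 0; k < NUM_ATTRS; k++)
            arr[i].dattr[k] = 0.0;
        arr[i].d_id = (int)i;
        arr[i].bestCent = -1; /* no centroid assigned yet (illustrative value) */
    }
    *size_in_bytes = count * sizeof *arr;
    return arr;
}
```

A single buffer of size_in_bytes then replaces the ten separate malloc calls from the question. Keep in mind that the structure declared in the kernel source has to match this layout, including any padding the host compiler inserts.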

This is just to give you an idea of how OpenCL works. Take a look at the Khronos OpenCL resource page, which has plenty of OpenCL tutorials and examples, and the Khronos OpenCL page, which has the official OpenCL references, man pages and quick reference cards.


answered Nov 15 '22 09:11

faken
