OK, so I have isolated this down to a very specific problem.
I was under the impression you could pass OpenCL any type of data in an array buffer: ints, chars, your own custom structs, as long as it was all just data and didn't contain pointers to heap objects that the GPU won't be able to retrieve.
Now, I've tried this and I think that it works for a big array of ints, but fails for my array of structs. Specifically:
cl_mem log_buffer = clCreateBuffer(context, CL_MEM_READ_WRITE, 
  num_elements * sizeof(int), NULL, NULL);
int* error_codes_in = (int*)malloc(num_elements * sizeof(int));
for (i = 0; i < num_elements; i++) {
  error_codes_in[i] = i;
}
error = clEnqueueWriteBuffer(command_queue, log_buffer, CL_TRUE,
  0, num_elements * sizeof(int), error_codes_in, 0, NULL, NULL);
This works fine: I get an array of numbers on the GPU and can manipulate them successfully, in parallel.
However, when I am using my own custom struct:
typedef struct {
  float position[2];
  float velocity[2];
  float radius;
  float resultant_force[2];
} ocl_element_2d_t;
(also defined in the kernel as)
const char* kernel_string = 
  "typedef struct { float position[2]; float velocity[2]; float radius; float resultant_force[2]; } ocl_element_2d_t;"...
and I use the same/very similar code to write to the GPU version of my struct array:
cl_mem gpu_buffer = clCreateBuffer(context, CL_MEM_READ_WRITE,
  num_elements * sizeof(ocl_element_2d_t), NULL, NULL);
error = clEnqueueWriteBuffer(command_queue, (cl_mem)gpu_buffer, CL_TRUE,
  0, num_elements * sizeof(ocl_element_2d_t), host_buffer, 0, NULL, NULL);
I get blank values on the GPU, and occasionally garbage (three or four values in 350), for all of the float values inside the struct. Both return values are CL_SUCCESS.
Any suggestions as to where I'm going wrong? My only thought is that the GPU compiler produces a struct in memory with different gaps, and since the copy method ignores the internal structure of the items and just copies a contiguous block of RAM, you end up with mismatches and possibly out-of-phase items. Is it possible that my OS is 64-bit (OS X Lion) on an i7 (quad core), and my GPU is running 32-bit, and this is the problem? It's an ATI Radeon HD 5750, which has no double precision support, and claims to have a 128-bit bus (which may or may not be relevant; I don't know precisely what this stuff means).
Is there a correct way to do this? Am I going to have to go all FORTRAN and have 7 different arrays, each with their own kernel argument, for the different properties in the struct?
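A quick host-side way to test the padding theory is to look at the struct's size and member offsets. The following is only a minimal sketch, assuming the struct from the question; the helper names `layout_is_tightly_packed` and `print_layout` are hypothetical and not part of OpenCL. Because every member is a 4-byte float, a typical compiler should insert no padding, so the size would be expected to equal 7 * sizeof(float).

```c
#include <stddef.h>  /* offsetof, size_t */
#include <stdio.h>   /* printf */

typedef struct {
  float position[2];
  float velocity[2];
  float radius;
  float resultant_force[2];
} ocl_element_2d_t;

/* Hypothetical helper: returns 1 if the members sit back to back
   with no padding between or after them, 0 otherwise. */
int layout_is_tightly_packed(void) {
  return offsetof(ocl_element_2d_t, position) == 0
      && offsetof(ocl_element_2d_t, velocity) == 2 * sizeof(float)
      && offsetof(ocl_element_2d_t, radius) == 4 * sizeof(float)
      && offsetof(ocl_element_2d_t, resultant_force) == 5 * sizeof(float)
      && sizeof(ocl_element_2d_t) == 7 * sizeof(float);
}

/* Hypothetical helper: prints the host layout so it can be compared
   by eye with what the kernel-side definition is expected to use. */
void print_layout(void) {
  printf("sizeof           = %zu\n", sizeof(ocl_element_2d_t));
  printf("velocity at      = %zu\n", offsetof(ocl_element_2d_t, velocity));
  printf("radius at        = %zu\n", offsetof(ocl_element_2d_t, radius));
  printf("resultant_force at = %zu\n",
         offsetof(ocl_element_2d_t, resultant_force));
}
```

If the check passes on the host, a layout mismatch becomes a less likely explanation, and the host-side code that fills the buffer is worth a second look.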
All credit to @0A0D for being suspicious of my selective code samples. The problem was indeed in my failure to initialise the structs correctly.
My excuse is simply that I'm used to working with struct pointers, not structs, and so writing
ocl_element_2d_t element = host_buffer[i];
element.position[0] = 1.2;
element.position[1] = 5.7;
was the standard way to add properties to an object. Having had a quick google of structs, I came across a very very basic C tutorial, http://www.asic-world.com/scripting/structs_c.html which pointed out that
struct_instance = other_struct_instance;
performs a copy by value, not a reference copy: the new variable gets its own copy of every member, including the embedded arrays.
Thus, when I tested the output from the local struct variable, the value I was expecting was there, yet it never reached the array in host_buffer.
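The difference can be sketched as follows. This is an illustration rather than the original program: the function names `set_position_on_copy` and `set_position_in_place` are hypothetical, and only the struct and the assigned values come from the post.

```c
typedef struct {
  float position[2];
  float velocity[2];
  float radius;
  float resultant_force[2];
} ocl_element_2d_t;

/* Buggy pattern: `element` is a separate copy of host_buffer[i], so the
   writes below never reach the array. Returns the local copy's value to
   show that the local variable itself looks correct. */
float set_position_on_copy(ocl_element_2d_t* host_buffer, int i) {
  ocl_element_2d_t element = host_buffer[i];
  element.position[0] = 1.2f;
  element.position[1] = 5.7f;
  return element.position[0];
}

/* Fixed pattern: take a pointer to the array element and write through
   it, so the array itself is modified. */
void set_position_in_place(ocl_element_2d_t* host_buffer, int i) {
  ocl_element_2d_t* element = &host_buffer[i];
  element->position[0] = 1.2f;
  element->position[1] = 5.7f;
}
```

Writing directly to `host_buffer[i].position[0]` would work equally well; the pointer version simply stays closest to the struct-pointer habit described above.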
There are probably two lessons here: