
OpenGL-OpenCL interop transfer times + texturing from bitmap

Tags: opengl, opencl

Two part question:

I'm working on a school project using the Game of Life as a vehicle to experiment with GPGPU. I'm using OpenCL for the simulation and OpenGL for realtime visualization, and the goal is to get this thing as big and fast as possible. Upon profiling I find that the frame time is dominated by OpenCL acquiring and releasing the GL buffers, and that the time cost is directly proportional to the size of the buffer.

1) Is this normal? Why should this be? To the best of my understanding, the buffer never leaves device memory, and the CL Acquire/Release acts like a mutex. Does OpenCL lock/unlock each byte individually or something?

To get around this I've shrunk from 32-bit RGBA color mode (OpenGL's preferred color mode as I understand it?) to 8-bit RGB color. This resulted in a major speedup, but after tuning my kernel, the transfer times are dominating again.

In the absence of any ideas on how to eliminate the transfer times entirely (short of porting my kernel from OpenCL to GLSL, which would exceed the original scope of the project), I now figure that my best bet is to write to a bitmap (as opposed to the 8-bit pixmap I'm currently using) and then use that bitmap with a color index to texture a quad.

2) Can I texture a quad directly using a bitmap? I have considered using glBitmap to draw to an auxiliary buffer, and then using this buffer to texture my quad, but I would prefer to use a more direct route if one is available.
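
To make the bitmap idea concrete, here is a minimal host-side sketch in plain C of what "a bitmap plus a color index" could look like: one bit per cell, expanded to RGBA through a two-entry palette. The function names (`packed_size`, `pack_cells`, `expand_rgba`), the LSB-first bit order, and the palette values are all assumptions made for illustration; they are not taken from the project, and in practice the expansion would more likely happen on the device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bytes needed to store n cells at one bit per cell. */
size_t packed_size(size_t n) {
    return (n + 7) / 8;
}

/* Pack one byte per cell (0 = dead, non-zero = alive) into a bitmap.
 * Bit order is LSB first within each byte (an arbitrary choice). */
void pack_cells(const uint8_t *cells, size_t n, uint8_t *bits) {
    assert(cells != NULL && bits != NULL);
    memset(bits, 0, packed_size(n));
    for (size_t i = 0; i < n; ++i) {
        if (cells[i]) {
            bits[i / 8] |= (uint8_t)(1u << (i % 8));
        }
    }
}

/* Expand the bitmap to 32-bit pixels by using each bit as an index
 * into a two-entry palette (palette[0] = dead, palette[1] = alive). */
void expand_rgba(const uint8_t *bits, size_t n,
                 const uint32_t palette[2], uint32_t *rgba) {
    assert(bits != NULL && palette != NULL && rgba != NULL);
    for (size_t i = 0; i < n; ++i) {
        rgba[i] = palette[(bits[i / 8] >> (i % 8)) & 1u];
    }
}
```

For a sense of scale, a 1920x1080 grid takes 8,294,400 bytes at 32 bits per cell, 2,073,600 bytes at 8 bits per cell, and `packed_size(1920 * 1080)` = 259,200 bytes at one bit per cell. That only helps if the acquire/release cost really does scale with buffer size.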

evenex_code asked Dec 05 '12 05:12


1 Answer

The design intent behind the CL/GL interop acquire and release calls was for them to be simple ownership transfers. However, many early implementations copied the images from CL to GL and back, which would explain a cost proportional to buffer size.

Unless you use the sync object extensions in OpenCL 1.1, you need to call glFinish before you acquire and clFinish after you release (before GL touches the objects again). You will see a lot of time spent here because all queued work has to finish before these calls return. On some platforms you can use clFlush instead of clFinish; check the OpenCL documentation from your vendor.

With the latest NVIDIA and AMD drivers on reasonably recent hardware, I'm seeing the acquire and release calls complete quickly for HD-video-sized images.

Dithermaster answered Oct 26 '22 23:10