I'm working on translating a CUDA application (this if you must know) to OpenCL. The original application uses the C-style CUDA API, with a single stream just to avoid the automatic busy-wait when reading the results.
Now I notice that OpenCL command queues look a lot like CUDA streams. But in the device read command, and likewise in the write and kernel execute commands, I notice parameters for events too. So I'm wondering, what does it take to execute a device write, a number of kernels (e.g. one call to one kernel then 100 calls to another kernel), and a device read, all sequentially?
Thanks!
GPU Workloads and the OpenCL™ Command Queue The host program defines the context for the kernels and manages their execution. Objects such as memory, programs and kernel are created within an OpenCL context. The host creates a data structure called a command-queue to coordinate execution of the kernels on the devices.
A command-queue is used to queue commands to a device. Examples of commands include executing kernels, or reading and writing memory objects. OpenCL devices typically correspond to a GPU, a multi-core CPU, and other processors such as DSPs and the Cell/B.E. processor.
That depends on how you create the command queue. in clCreateCommandQueue there's a properties parameter that can contain CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE, which enables non-sequential execution in the command queue.
If that property is set, commands might execute out of order or in parallel, and the only way of synchronize them is using events.
When that property is not set, commands execute sequentially in the queue.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With