I'm new to OpenCL for image processing. I use Win7 + VS2010 + OpenCL 2.0 + OpenCV 2.4.7, and my PC has an Intel i7 CPU and an NVIDIA GTX 760.
Here is what I do:
I use OpenCV to read a 1920*1080 frame from a video, then copy the image data and take a pointer to it.
uchar* input_data=(uchar*)(gray_image->imageData);
Then I want to run convolution and other image processing on the GPU, so I use OpenCL to upload this data (input_data) to a device buffer (cl_input_data) that was created earlier. The upload takes about 0.2 ms, which is fast.
clEnqueueWriteBuffer(queue, cl_input_data, CL_TRUE, 0, ROI_size * sizeof(cl_uchar), (void*)input_data, 0, NULL, NULL);
The main processing is done by several kernels, and each kernel call measures less than 0.1 ms, which seems normal.
clEnqueueNDRangeKernel( queue,kernel_box,2,NULL,global_work_size,local_work_size, 0,NULL, NULL);
After all the processing, I download the GPU buffer (cl_output_data) to the host (output_data), and this step takes over 5.5 ms! That is about 27 times slower than the upload step!
clEnqueueReadBuffer( queue,cl_output_data,CL_TRUE,0,ROI_size * sizeof(char),(void*) output_data,0, NULL, NULL );
So I'm wondering: since I use the same device and the data size is exactly the same, why do the upload and download times differ so much?
By the way, I measure time with the Windows performance counter, something like QueryPerformanceFrequency(&m_Frequency);
Thank you!
As I remember, clEnqueueNDRangeKernel is an asynchronous call: it returns control without synchronizing with the device. So when you time clEnqueueNDRangeKernel, you measure only the launch, not the processing. Your clEnqueueReadBuffer call is blocking (CL_TRUE), so it forces synchronization with the device and waits until all previously enqueued kernels have finished. Thus your 5.5 ms includes the kernels' execution time.
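To make the effect concrete without needing an OpenCL device, here is a minimal host-only sketch of the same timing pattern. It is an analogy, not OpenCL code: simulate and Timings are hypothetical names, std::async stands in for the asynchronous kernel launch, and future::wait stands in for the blocking read. It only shows how the cost of background work gets attributed to whichever call blocks first.

```cpp
#include <chrono>
#include <future>
#include <thread>

using Clock = std::chrono::steady_clock;

// Milliseconds elapsed since `start`.
static double ms_since(Clock::time_point start) {
    return std::chrono::duration<double, std::milli>(Clock::now() - start).count();
}

struct Timings {
    double launch_ms;  // time spent inside the non-blocking "enqueue" call
    double read_ms;    // time spent inside the blocking "read" call
};

// Hypothetical stand-in for a device queue (not an OpenCL API):
// the "kernel" runs in the background for roughly kernel_ms milliseconds.
Timings simulate(int kernel_ms) {
    auto t0 = Clock::now();
    // Analogue of clEnqueueNDRangeKernel: returns as soon as work is queued.
    std::future<void> kernel = std::async(std::launch::async, [kernel_ms] {
        std::this_thread::sleep_for(std::chrono::milliseconds(kernel_ms));
    });
    double launch_ms = ms_since(t0);

    auto t1 = Clock::now();
    // Analogue of a blocking clEnqueueReadBuffer: waits for the queued work.
    kernel.wait();
    double read_ms = ms_since(t1);

    return {launch_ms, read_ms};
}
```

Calling simulate(100) should report a very small launch_ms and a read_ms close to the full 100 ms, which mirrors the 0.1 ms versus 5.5 ms measurements in the question. In the real program, one way to separate the two costs is to call clFinish(queue) after the last clEnqueueNDRangeKernel and before starting the timer for the read, so that the read is timed on its own.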