Why is downloading data from the GPU much slower than uploading it when using OpenCL?

Question

I'm a beginner with OpenCL for image processing. I use Win7 + VS2010 + OpenCL 2.0 + OpenCV 2.4.7. My PC has an Intel i7 CPU and an NVIDIA GTX 760.

Here is my work:

I use OpenCV to read an image (1920*1080) from a video, then copy the image data and get the data pointer.
```
uchar* input_data=(uchar*)(gray_image->imageData);
```
Then I want to do some convolution and other image processing on the GPU, so I use OpenCL to upload this data (input_data) to device memory (cl_input_data), which has been created beforehand. The upload step takes about 0.2 ms, which is fast.
```
clEnqueueWriteBuffer(queue, cl_input_data, CL_TRUE,
    0, ROI_size * sizeof(cl_uchar), (void*)input_data, 0, NULL, NULL);
```
The main processing is done by several kernels, and each of them takes less than 0.1 ms, which seems quite normal.
```
clEnqueueNDRangeKernel(queue, kernel_box, 2, NULL, global_work_size, local_work_size, 0, NULL, NULL);
```
After all the processing, I want to download the GPU memory (cl_output_data) to the host (output_data), and this step takes over 5.5 ms! That is roughly 27 times slower than the upload step!
```
clEnqueueReadBuffer(queue, cl_output_data, CL_TRUE, 0, ROI_size * sizeof(char), (void*)output_data, 0, NULL, NULL);
```

So I'm wondering: since I use the same device and the data size is exactly the same, why are the upload and download times so different?

By the way, I measure the time with the Windows high-resolution timer, something like QueryPerformanceFrequency(&m_Frequency);

Thank you!

jet47 · Accepted Answer

If I remember correctly, clEnqueueNDRangeKernel is an asynchronous call: it returns control to the host without synchronizing with the device. So when you time clEnqueueNDRangeKernel, you measure only the launch, not the processing. A blocking clEnqueueReadBuffer (CL_TRUE) forces device synchronization and waits until all previously enqueued kernels have finished. Thus, your 5.5 ms includes the kernels' execution time.
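To see the effect without a GPU, here is a minimal host-only sketch. It is not OpenCL code: `FakeQueue`, `enqueue_kernel`, `finish`, and `blocking_read` are hypothetical stand-ins, and the "device" is modeled as a completion deadline rather than real parallel hardware. It only illustrates why a timer around a non-blocking enqueue sees almost nothing while the following blocking read absorbs the kernel time.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

using Clock = std::chrono::steady_clock;

// Milliseconds elapsed since t0.
static double ms_since(Clock::time_point t0) {
    return std::chrono::duration<double, std::milli>(Clock::now() - t0).count();
}

// Hypothetical stand-in for an in-order command queue (NOT an OpenCL type).
// The device is assumed to work in parallel with the host, which is modeled
// here by remembering the time at which the enqueued work will be complete.
struct FakeQueue {
    Clock::time_point done_at = Clock::now();

    // Non-blocking enqueue: record the work and return immediately.
    void enqueue_kernel(int work_ms) {
        assert(work_ms >= 0);
        done_at = Clock::now() + std::chrono::milliseconds(work_ms);
    }

    // Plays the role of clFinish: block until all enqueued work is done.
    void finish() { std::this_thread::sleep_until(done_at); }

    // Blocking read: has to wait for earlier work before it can copy.
    // The copy itself is treated as free in this sketch.
    void blocking_read() { finish(); }
};

struct Timings {
    double kernel_ms;  // what the timer around the kernel step reports
    double read_ms;    // what the timer around the read step reports
};

// Timing as in the question: no synchronization after the enqueue.
Timings time_naive(int work_ms) {
    FakeQueue q;
    auto t0 = Clock::now();
    q.enqueue_kernel(work_ms);
    double kernel = ms_since(t0);  // launch overhead only
    auto t1 = Clock::now();
    q.blocking_read();
    double read = ms_since(t1);    // includes the kernel's run time
    return {kernel, read};
}

// Timing with an explicit synchronization point before stopping the timer.
Timings time_synced(int work_ms) {
    FakeQueue q;
    auto t0 = Clock::now();
    q.enqueue_kernel(work_ms);
    q.finish();
    double kernel = ms_since(t0);  // now covers the kernel's run time
    auto t1 = Clock::now();
    q.blocking_read();
    double read = ms_since(t1);    // only the read itself
    return {kernel, read};
}
```

Calling `time_naive(20)` should report a near-zero kernel time and a read time of about 20 ms, while `time_synced(20)` attributes the 20 ms to the kernel step. In real OpenCL code the corresponding fix would be to call `clFinish(queue)` before reading the timer after the kernels, or to create the queue with `CL_QUEUE_PROFILING_ENABLE` and read per-command timestamps with `clGetEventProfilingInfo`.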

Tags: c++, opencv, gpu, opencl

David Ding
