
Surface reference faster than Surface object

Tags: cuda

I recently replaced the surface reference in my algorithm with a surface object, and noticed that the program now runs slower.

Here is a comparison for a simple example in which I fill a 3D float array (400 x 400 x 400) with a constant value.

Surface reference API

Time: 9.068928 ms

surface<void, cudaSurfaceType3D> s_volumeSurf;
...
surf3Dwrite(value, s_volumeSurf, px*sizeof(float), py, pz, cudaBoundaryModeTrap);

Surface object API

Time: 14.960256 ms

cudaSurfaceObject_t l_volSurfObj;
...
surf3Dwrite(value, l_volSurfObj, px*sizeof(float), py, pz, cudaBoundaryModeTrap);

This was tested on a GTX 680 with Compute Capability 3.0 and CUDA 5.0.
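A minimal sketch of how the surface object variant could be set up and timed is given here. The names `fillKernel` and `fillVolume`, the 8x8x8 block size, and the use of CUDA events for timing are assumptions made for illustration, not the code behind the measurements above. Error checking is omitted to keep it short.

```cuda
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

// Writes `value` into every voxel of an n*n*n float volume through a surface object.
__global__ void fillKernel(cudaSurfaceObject_t volSurfObj, int n, float value)
{
    int px = blockIdx.x * blockDim.x + threadIdx.x;
    int py = blockIdx.y * blockDim.y + threadIdx.y;
    int pz = blockIdx.z * blockDim.z + threadIdx.z;
    if (px < n && py < n && pz < n) {
        // The x coordinate is a byte offset, hence the sizeof(float) factor.
        surf3Dwrite(value, volSurfObj, px * sizeof(float), py, pz, cudaBoundaryModeTrap);
    }
}

// Hypothetical helper: fills the volume, stores the kernel time (ms) in *elapsedMs,
// and returns the voxels copied back to the host. Return codes are not checked.
std::vector<float> fillVolume(int n, float value, float* elapsedMs)
{
    // 3D CUDA array that allows surface loads and stores.
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaExtent extent = make_cudaExtent(n, n, n);
    cudaArray_t volArray = nullptr;
    cudaMalloc3DArray(&volArray, &desc, extent, cudaArraySurfaceLoadStore);

    // Bind the array to a surface object.
    cudaResourceDesc resDesc = {};
    resDesc.resType = cudaResourceTypeArray;
    resDesc.res.array.array = volArray;
    cudaSurfaceObject_t volSurfObj = 0;
    cudaCreateSurfaceObject(&volSurfObj, &resDesc);

    dim3 block(8, 8, 8);
    dim3 grid((n + block.x - 1) / block.x,
              (n + block.y - 1) / block.y,
              (n + block.z - 1) / block.z);

    // Time only the kernel with CUDA events.
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    fillKernel<<<grid, block>>>(volSurfObj, n, value);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(elapsedMs, start, stop);

    // Copy the array back to the host so the result can be checked.
    std::vector<float> host(static_cast<size_t>(n) * n * n);
    cudaMemcpy3DParms copy = {};
    copy.srcArray = volArray;
    copy.dstPtr = make_cudaPitchedPtr(host.data(), n * sizeof(float), n, n);
    copy.extent = extent;  // in elements, because a CUDA array is involved
    copy.kind = cudaMemcpyDeviceToHost;
    cudaMemcpy3D(&copy);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaDestroySurfaceObject(volSurfObj);
    cudaFreeArray(volArray);
    return host;
}
```

Calling `fillVolume(400, 1.0f, &ms)` from `main` and printing `ms` would time a fill of the same size as the test above. The first launch typically includes one-time startup cost, so a warm-up call before the measured one is advisable.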

Does anyone have an explanation for this difference?

Arnaud asked May 27 '13 07:05




1 Answer

In the surface object case, the surface descriptor is fetched from global memory at run time. In the surface reference case, the descriptor is compiled into constant memory. Reading the descriptor from constant memory can be much faster than a global memory access. If your kernel does so little work that the descriptor fetch is a noticeable share of its run time, or if the L1 cache is disabled, you could observe a significant performance loss.

You can diff the SASS code to see the difference.
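One way to produce the two SASS listings is sketched here, assuming a single source file (hypothetically named `fill_variants.cu`) that is compiled twice. The macro `USE_SURFACE_REFERENCE`, the kernel names, and the output file names are made up for illustration. The surface reference variant only builds with toolkits that still support surface references (they were removed in CUDA 12), and `-arch=sm_30` needs a toolkit that still targets Compute Capability 3.0.

```cuda
// fill_variants.cu (hypothetical file name)
//
// Build each variant to a cubin, dump the SASS, then diff the listings:
//   nvcc -cubin -arch=sm_30 -o fill_obj.cubin fill_variants.cu
//   nvcc -cubin -arch=sm_30 -DUSE_SURFACE_REFERENCE -o fill_ref.cubin fill_variants.cu
//   cuobjdump -sass fill_obj.cubin > fill_obj.sass
//   cuobjdump -sass fill_ref.cubin > fill_ref.sass
//   diff fill_obj.sass fill_ref.sass
#include <cuda_runtime.h>

#ifdef USE_SURFACE_REFERENCE

// Surface reference: a file-scope variable known at compile time.
surface<void, cudaSurfaceType3D> s_volumeSurf;

__global__ void fillViaReference(int n, float value)
{
    int px = blockIdx.x * blockDim.x + threadIdx.x;
    int py = blockIdx.y * blockDim.y + threadIdx.y;
    int pz = blockIdx.z * blockDim.z + threadIdx.z;
    if (px < n && py < n && pz < n)
        surf3Dwrite(value, s_volumeSurf, px * sizeof(float), py, pz, cudaBoundaryModeTrap);
}

#else

// Surface object: an opaque handle passed in as a kernel argument.
__global__ void fillViaObject(cudaSurfaceObject_t volSurfObj, int n, float value)
{
    int px = blockIdx.x * blockDim.x + threadIdx.x;
    int py = blockIdx.y * blockDim.y + threadIdx.y;
    int pz = blockIdx.z * blockDim.z + threadIdx.z;
    if (px < n && py < n && pz < n)
        surf3Dwrite(value, volSurfObj, px * sizeof(float), py, pz, cudaBoundaryModeTrap);
}

#endif
```

Keeping the two kernel bodies identical apart from the surface argument means the diff should mostly show how each variant obtains its surface descriptor.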

longlee answered Oct 03 '22 00:10