Surface reference faster than Surface object

Question

I recently replaced the surface reference in my algorithm with a surface object. Since then, the program runs slower.

Here is a comparison for a simple example in which I fill a 3D float array (400 x 400 x 400) with a constant value.

Surface reference API

Time: 9.068928 ms

surface<void, cudaSurfaceType3D> s_volumeSurf;
...
surf3Dwrite(value, s_volumeSurf, px*sizeof(float), py, pz, cudaBoundaryModeTrap);

Surface object API

Time: 14.960256 ms

cudaSurfaceObject_t l_volSurfObj;
...
surf3Dwrite(value, l_volSurfObj, px*sizeof(float), py, pz, cudaBoundaryModeTrap);

This was tested on a GTX 680 (compute capability 3.0) with CUDA 5.0.

Does anyone have an explanation for this difference?
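For anyone who wants to reproduce the surface object measurement, here is a minimal sketch of the fill test. It is not the original benchmark: the helper name `fillAndVerify`, the 8x8x8 block size, the warm-up launch, and the fill value are all assumptions, and timing uses CUDA events around a single kernel launch.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// One thread per voxel. The x coordinate of surf3Dwrite is given in bytes.
__global__ void fillKernel(cudaSurfaceObject_t surf, int n, float value)
{
    int px = blockIdx.x * blockDim.x + threadIdx.x;
    int py = blockIdx.y * blockDim.y + threadIdx.y;
    int pz = blockIdx.z * blockDim.z + threadIdx.z;
    if (px < n && py < n && pz < n)
        surf3Dwrite(value, surf, px * sizeof(float), py, pz, cudaBoundaryModeTrap);
}

// Hypothetical helper: fills an n*n*n volume through a surface object,
// stores the kernel time in *elapsedMs, and returns the number of voxels
// that do not hold `value` afterwards (0 means the fill succeeded).
int fillAndVerify(int n, float value, float *elapsedMs)
{
    // 3D CUDA array that permits surface load/store.
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaExtent extent = make_cudaExtent(n, n, n);
    cudaArray_t volume;
    CHECK(cudaMalloc3DArray(&volume, &desc, extent, cudaArraySurfaceLoadStore));

    // Wrap the array in a surface object.
    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeArray;
    res.res.array.array = volume;
    cudaSurfaceObject_t surf = 0;
    CHECK(cudaCreateSurfaceObject(&surf, &res));

    dim3 block(8, 8, 8);  // assumed launch configuration
    dim3 grid((n + block.x - 1) / block.x,
              (n + block.y - 1) / block.y,
              (n + block.z - 1) / block.z);

    // Warm-up launch so one-time setup cost is not included in the timing.
    fillKernel<<<grid, block>>>(surf, n, value);
    CHECK(cudaDeviceSynchronize());

    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));
    CHECK(cudaEventRecord(start));
    fillKernel<<<grid, block>>>(surf, n, value);
    CHECK(cudaEventRecord(stop));
    CHECK(cudaEventSynchronize(stop));
    CHECK(cudaGetLastError());
    CHECK(cudaEventElapsedTime(elapsedMs, start, stop));

    // Copy the volume back and count voxels that were not written correctly.
    std::vector<float> host((size_t)n * n * n);
    cudaMemcpy3DParms copy = {};
    copy.srcArray = volume;
    copy.dstPtr = make_cudaPitchedPtr(host.data(), n * sizeof(float), n, n);
    copy.extent = extent;
    copy.kind = cudaMemcpyDeviceToHost;
    CHECK(cudaMemcpy3D(&copy));
    int mismatches = 0;
    for (size_t i = 0; i < host.size(); ++i)
        if (host[i] != value) ++mismatches;

    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    CHECK(cudaDestroySurfaceObject(surf));
    CHECK(cudaFreeArray(volume));
    return mismatches;
}

int main()
{
    float ms = 0.0f;
    int mismatches = fillAndVerify(400, 1.0f, &ms);  // 400^3 as in the question
    printf("Time: %f ms, mismatches: %d\n", ms, mismatches);
    return mismatches == 0 ? 0 : 1;
}
```

A single timed launch is noisy, so averaging over several launches would give a fairer number. The surface reference variant is left out here because that API has been removed from recent CUDA toolkits.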

longlee · Accepted Answer

With the surface object API, the surface descriptor is fetched from global memory at run time. With the surface reference API, the descriptor is compiled into constant memory. Fetching a descriptor from constant memory can be much faster than a global memory access. If your kernel is small, so that the descriptor fetch is a large share of its work, or if the L1 cache is disabled, you can observe a significant performance loss with surface objects.

You can diff the SASS code generated for the two versions to see the difference.
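One way to do that comparison, as a rough sketch: keep each variant of the kernel in its own small source file, compile it to a cubin, and dump the SASS with `cuobjdump`. The file names, the kernel name `fillObj`, and the `sm_30` target below are assumptions chosen to match the GTX 680 from the question. The surface reference variant would need a toolkit that still ships that API.

```cuda
// fill_obj.cu (hypothetical file name): surface object variant of the kernel.
//
// Assumed workflow, run once per variant, then compare the two dumps:
//   nvcc -arch=sm_30 -cubin -o fill_obj.cubin fill_obj.cu
//   cuobjdump -sass fill_obj.cubin > fill_obj.sass
//   diff fill_ref.sass fill_obj.sass   (fill_ref.sass: the reference variant)
#include <cuda_runtime.h>

// extern "C" keeps the kernel name unmangled, which makes it easier to find
// in the SASS listing.
extern "C" __global__ void fillObj(cudaSurfaceObject_t surf, int n, float value)
{
    int px = blockIdx.x * blockDim.x + threadIdx.x;
    int py = blockIdx.y * blockDim.y + threadIdx.y;
    int pz = blockIdx.z * blockDim.z + threadIdx.z;
    if (px < n && py < n && pz < n)
        surf3Dwrite(value, surf, px * sizeof(float), py, pz, cudaBoundaryModeTrap);
}
```

In the two listings, the instructions leading up to the surface store are the part to look at, since that is where the descriptor is obtained.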

Asked by Arnaud