
CUDA Unable to see shared memory values in Nsight debugging

I've been struggling for some time with a problem I can't seem to find a solution to. When I debug my CUDA code using NVIDIA Nsight under Visual Studio 2008, I get strange results when using shared memory.

My code is:

template<typename T>
__device__
T integrate()
{
   extern __shared__ T s_test[]; // Dynamically allocated shared memory
   /**** Breakpoint (1) here ****/
   int index = threadIdx.x + threadIdx.y * blockDim.x; // Local index in block. Column major ordering
   if(index < 64 && blockIdx.x==0) { // Only work on a few values. Just testing
      s_test[index] = (T)index;
      /* Some other irrelevant code here */
   }
   return v;
}

When I reach breakpoint 1 and inspect the shared memory in the Visual Studio Watch window, only the first 8 values of the array change; the others remain null. I would expect all of the first 64 to change.

[Image: Watch window from Visual Studio]

I thought it might have something to do with all warps not executing simultaneously, so I tried synchronizing them. I added this code inside integrate():

template<typename T>
__device__
T integrate()
{
   /* Old code is still here */

   __syncthreads();
   /**** Breakpoint (2) here ****/
   if(index < 64 && blockIdx.x==0) {
      T tmp = s_test[index]; // Write to tmp variable so I can inspect it inside Nsight Watch window
      v = tmp + index; // Use `tmp` and `index` somehow so that the compiler doesn't optimize it out of existence
   }
   return v;
}

But the problem is still there. Furthermore, the rest of the values inside tmp are not 0, as the Watch window from VS is indicating.

[Image: Watch window from Nsight]

I must mention that it takes a lot of steps to step over __syncthreads(), so when I reach it I just jump to breakpoint 2. What the heck is going on!?


EDIT: Information about the system/launch configuration

System

  • Name Intel(R) Core(TM)2 Duo CPU E7300 @ 2.66GHz
  • Architecture x86
  • Frequency 2.666 MHz
  • Number of Cores 2
  • Page Size 4.096
  • Total Physical Memory 3.582,00 MB
  • Available Physical Memory 1.983,00 MB
  • Version Name Windows 7 Ultimate
  • Version Number 6.1.7600

Device GeForce 9500 GT

  • Driver Version 301.42
  • Driver Model WDDM
  • CUDA Device Index 0
  • GPU Family G96
  • Compute Capability 1.1
  • Number of SMs 4
  • Frame Buffer Physical Size (MB) 512
  • Frame Buffer Bandwidth (GB/s) 16
  • Frame Buffer Bus Width (bits) 128
  • Frame Buffer Location Dedicated
  • Graphics Clock (Mhz) 812
  • Memory Clock (Mhz) 500
  • Processor Clock (Mhz) 1625
  • RAM Type DDR2

IDE

  • Microsoft Visual Studio Team System 2008
  • NVIDIA Nsight Visual Studio Edition, Version 2.2 Build No. 2.2.0.12255

Compiler commands

1> "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.2\bin\nvcc.exe" -G -gencode=arch=compute_10,code=\"sm_10,compute_10\" --machine 32 -ccbin "C:\Program Files\Microsoft Visual Studio 9.0\VC\bin" -D_NEXUS_DEBUG -g -D_DEBUG -Xcompiler "/EHsc /W3 /nologo /Od /Zi /RTC1 /MDd " -I"inc" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.2\include" -maxrregcount=0 --compile -o "Debug/process_f2f.cu.obj" process_f2f.cu

Launch configuration. The shared memory size doesn't seem to matter. I've tried several versions. The one I've worked with the most is:

  • Shared memory 2048 Bytes
  • Grid/block sizes : {101, 101, 1} , {16, 16, 1}
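
For reference, the shared memory figure in a launch configuration is the third kernel launch parameter, and it is what sizes an unsized `extern __shared__` array. Below is a minimal sketch of how that value relates to the 64 elements being written. The names `sharedBytesFor`, `touch_shared` and `launchWithSharedBytes` are hypothetical and not part of the original code, and the element type is assumed to be `float`.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical helper: bytes of dynamic shared memory needed for `count` elements of T.
template <typename T>
size_t sharedBytesFor(int count)
{
    return static_cast<size_t>(count) * sizeof(T);
}

// Hypothetical kernel: the unsized extern array is backed by the third launch parameter.
__global__ void touch_shared(int count)
{
    extern __shared__ float s_buf[];
    int index = threadIdx.x + threadIdx.y * blockDim.x; // same indexing as in the question
    if (index < count) {
        s_buf[index] = (float)index;
    }
}

// Hypothetical wrapper: returns true only if the request fits and the launch succeeds.
bool launchWithSharedBytes(size_t bytes, int count)
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return false;
    if (bytes > prop.sharedMemPerBlock) return false;        // more than the device offers per block
    if (bytes < sharedBytesFor<float>(count)) return false;  // array too small for the writes

    dim3 grid(101, 101, 1), block(16, 16, 1);                // sizes taken from the question
    touch_shared<<<grid, block, bytes>>>(count);
    if (cudaGetLastError() != cudaSuccess) return false;     // launch configuration errors
    return cudaDeviceSynchronize() == cudaSuccess;           // errors raised while running
}
```

With these assumptions, 64 `float` values need 256 bytes, so a 2048-byte request such as `launchWithSharedBytes(2048, 64)` leaves plenty of room and the array size itself should not be the limiting factor.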
Iam asked Nov 12 '22 21:11



1 Answer

Have you tried putting __syncthreads() after assigning the values?

template<typename T>
__device__
T integrate()
{
   extern __shared__ T s_test[]; // Dynamically allocated shared memory
   int index = threadIdx.x + threadIdx.y * blockDim.x; // Local index in block. Column major ordering
   if(index < 64 && blockIdx.x==0) { // Only work on a few values. Just testing
      s_test[index] = (T)index;
      /* Some other irrelevant code here */
   }
   __syncthreads();
   /**** Breakpoint (1) here ****/
   return v;
}

And try to see the values at this breakpoint.
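
If the debugger view stays confusing, one way to cross-check is to copy the shared memory contents out to global memory and inspect them on the host. This is only a sketch of that idea, not the asker's actual kernel: `fill_and_copy` and `runFillAndCopy` are hypothetical names, the element type is assumed to be `float`, and a single 16x16 block is launched so `count` must not exceed 256.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel following the write / barrier / read pattern suggested above.
__global__ void fill_and_copy(float* out, int count)
{
    extern __shared__ float s_test[];                    // sized by the third launch parameter
    int index = threadIdx.x + threadIdx.y * blockDim.x;  // same indexing as in the question
    if (index < count) {
        s_test[index] = (float)index;
    }
    __syncthreads();                                     // outside the branch, reached by every thread
    if (index < count) {
        // Read a slot written by a different thread, so the barrier actually matters.
        out[index] = s_test[count - 1 - index];
    }
}

// Hypothetical host wrapper: runs one 16x16 block and returns what the kernel copied out.
// Entries stay at -1 if the allocation or the copy back fails.
std::vector<float> runFillAndCopy(int count)
{
    std::vector<float> host(count, -1.0f);
    float* dev = 0;
    size_t bytes = count * sizeof(float);
    if (cudaMalloc((void**)&dev, bytes) != cudaSuccess) return host;

    dim3 block(16, 16, 1);
    fill_and_copy<<<1, block, bytes>>>(dev, count);      // third parameter backs s_test[]
    cudaMemcpy(&host[0], dev, bytes, cudaMemcpyDeviceToHost); // also waits for the kernel
    cudaFree(dev);
    return host;
}
```

Calling `runFillAndCopy(64)` and printing the result on the host shows what was actually stored in shared memory, independent of what the Watch window displays for the warp currently in focus.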

Younes Nj answered Dec 14 '22 23:12
