CUDA Unable to see shared memory values in Nsight debugging

Question

I've been struggling for some time with a problem I can't seem to find a solution to. When I debug my CUDA code using NVIDIA Nsight under Visual Studio 2008, I get strange results when using shared memory.

My code is:

template<typename T>
__device__
T integrate()
{
   extern __shared__ T s_test[]; // Dynamically allocated shared memory
   /**** Breakpoint (1) here ****/
   int index = threadIdx.x + threadIdx.y * blockDim.x; // Local index in block. Column major ordering
   if(index < 64 && blockIdx.x==0) { // Only work on a few values. Just testing
      s_test[index] = (T)index;
      /* Some other irrelevant code here */
   }
   return v;
}

When I reach breakpoint 1 and inspect the shared memory in the Visual Studio Watch window, only the first 8 values of the array change and the others remain null. I would expect all of the first 64 to change. (Screenshot: Watch window from Visual Studio)

I thought it might have something to do with not all warps executing simultaneously, so I tried synchronizing them. I added this code inside integrate():

template<typename T>
__device__
T integrate()
{
   /* Old code is still here */

   __syncthreads();
   /**** Breakpoint (2) here ****/
   if(index < 64 && blockIdx.x==0) {
      T tmp = s_test[index]; // Write to tmp variable so I can inspect it inside Nsight Watch window
      v = tmp + index; // Use `tmp` and `index` somehow so that the compiler doesn't optimize it out of existence
   }
   return v;
}

But the problem is still there. Furthermore, the rest of the values inside tmp are not 0, as the Watch window from VS indicates. (Screenshot: Watch window from Nsight)

I must mention that it takes a lot of steps to step over __syncthreads(), so when I reach it I just jump to breakpoint 2. What the heck is going on!?

EDIT: Information about the system/launch configuration

System

Name Intel(R) Core(TM)2 Duo CPU E7300 @ 2.66GHz
Architecture x86
Frequency 2.666 MHz
Number of Cores 2
Page Size 4.096
Total Physical Memory 3.582,00 MB
Available Physical Memory 1.983,00 MB
Version Name Windows 7 Ultimate
Version Number 6.1.7600

Device GeForce 9500 GT

Driver Version 301.42
Driver Model WDDM
CUDA Device Index 0
GPU Family G96
Compute Capability 1.1
Number of SMs 4
Frame Buffer Physical Size (MB) 512
Frame Buffer Bandwidth (GB/s) 16
Frame Buffer Bus Width (bits) 128
Frame Buffer Location Dedicated
Graphics Clock (Mhz) 812
Memory Clock (Mhz) 500
Processor Clock (Mhz) 1625
RAM Type DDR2

IDE

Microsoft Visual Studio Team System 2008
NVIDIA Nsight Visual Studio Edition, Version 2.2 Build No. 2.2.0.12255

Compiler commands

1> "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.2\bin\nvcc.exe" -G -gencode=arch=compute_10,code=\"sm_10,compute_10\" --machine 32 -ccbin "C:\Program Files\Microsoft Visual Studio 9.0\VC\bin" -D_NEXUS_DEBUG -g -D_DEBUG -Xcompiler "/EHsc /W3 /nologo /Od /Zi /RTC1 /MDd " -I"inc" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.2\include" -maxrregcount=0 --compile -o "Debug/process_f2f.cu.obj" process_f2f.cu

Launch configuration. The shared memory size doesn't seem to matter; I've tried several versions. The one I've worked with the most is:

Shared memory 2048 Bytes
Grid/block sizes : {101, 101, 1} , {16, 16, 1}
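
For readers unfamiliar with how these numbers reach the kernel: the size of an extern __shared__ array is not set in device code but by the third launch parameter. Below is a minimal sketch of a launch with the configuration above. The kernel touch_shared and the helper launch_with_question_config are hypothetical stand-ins (the real kernel that calls integrate() is not shown in the question), and float is assumed for T.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in kernel; the question's real kernel is not shown.
__global__ void touch_shared()
{
    extern __shared__ float s_test[];  // size comes from the launch, not from here
    int index = threadIdx.x + threadIdx.y * blockDim.x;
    s_test[index] = (float)index;      // 256 threads * 4 bytes = 1024 bytes <= 2048
}

// Launches with the grid, block, and shared memory size listed in the question
// and returns the first error the runtime reports, if any.
cudaError_t launch_with_question_config()
{
    dim3 grid(101, 101, 1);
    dim3 block(16, 16, 1);
    size_t shared_bytes = 2048;  // dynamic shared memory per block, in bytes

    touch_shared<<<grid, block, shared_bytes>>>();

    cudaError_t err = cudaGetLastError();  // errors detected at launch time
    if (err == cudaSuccess) {
        err = cudaDeviceSynchronize();     // errors raised while the kernel ran
    }
    return err;
}

int main()
{
    cudaError_t err = launch_with_question_config();
    printf("launch status: %s\n", cudaGetErrorString(err));
    return err == cudaSuccess ? 0 : 1;
}
```

Checking the returned status before opening the debugger rules out a failed launch as the cause of unexpected Watch window contents.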

Younes Nj · Accepted Answer

Have you tried putting __syncthreads() after assigning the values?

template<typename T>
__device__
T integrate()
{
   extern __shared__ T s_test[]; // Dynamically allocated shared memory
   int index = threadIdx.x + threadIdx.y * blockDim.x; // Local index in block. Column major ordering
   if(index < 64 && blockIdx.x==0) { // Only work on a few values. Just testing
      s_test[index] = (T)index;
      /* Some other irrelevant code here */
   }
   __syncthreads();
   /**** Breakpoint (1) here ****/
   return v;
}

And try to see the values at this breakpoint.
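
The reasoning behind this is that __syncthreads() makes every thread in the block wait until all of them have reached the barrier, so by the time any thread gets past it, all the writes to s_test before the barrier have completed. If the debugger view is still in doubt, one way to cross-check is to copy the shared array out to global memory and print it on the host. The following is only a sketch under assumptions: fill_and_dump and dump_shared are hypothetical names, T is taken to be float, and a single block is launched to keep the example small.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: fill shared memory, synchronize, then copy it out.
__global__ void fill_and_dump(float *out, int n)
{
    extern __shared__ float s_test[];  // dynamically allocated shared memory
    int index = threadIdx.x + threadIdx.y * blockDim.x;

    if (index < n) {
        s_test[index] = (float)index;
    }
    // The barrier sits outside the conditional so every thread reaches it.
    __syncthreads();

    if (index < n) {
        out[index] = s_test[index];    // make the values visible to the host
    }
}

// Runs one 16x16 block and returns the first n shared values (n <= 256).
// Entries stay at -1 if the allocation fails.
std::vector<float> dump_shared(int n)
{
    std::vector<float> host(n, -1.0f);
    float *dev = nullptr;
    if (cudaMalloc(&dev, n * sizeof(float)) != cudaSuccess) {
        return host;
    }

    dim3 block(16, 16, 1);
    fill_and_dump<<<1, block, n * sizeof(float)>>>(dev, n);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(host.data(), dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return host;
}

int main()
{
    std::vector<float> values = dump_shared(64);
    for (size_t i = 0; i < values.size(); ++i) {
        printf("s_test[%zu] = %.1f\n", i, values[i]);
    }
    return 0;
}
```

If the host printout shows all 64 values while the Watch window does not, the kernel itself is fine and the discrepancy lies in what the debugger displays at that breakpoint.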
