I have always used cuda-memcheck under Windows 7.
Unfortunately, on my laptop I'm now getting the following error message:
========= Internal Memcheck Error: Memcheck failed initialization as profiler is attached. Try unsetting CUDA_PROFILE or disabling the profiler.
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:C:\windows\system32\nvcuda.dll (cuD3D11CtxCreate + 0x103dbd) [0x11fe1d]
=========     Host Frame:C:\Users\user\Documents\Project\StackOverflow\Debug\cudart32_55.dll (_cudaRegisterDeviceFunction + 0x5eb2) [0xdaf2]
=========     Host Frame:C:\Users\user\Documents\Project\StackOverflow\Debug\cudart32_55.dll (_cudaRegisterDeviceFunction + 0x600d) [0xdc4d]
=========     Host Frame:C:\Users\user\Documents\Project\StackOverflow\Debug\cudart32_55.dll (_cudaRegisterDeviceFunction + 0x6576) [0xe1b6]
=========     Host Frame:C:\Users\user\Documents\Project\StackOverflow\Debug\cudart32_55.dll (_cudaRegisterDeviceFunction + 0x3609) [0xb249]
=========     Host Frame:C:\Users\user\Documents\Project\StackOverflow\Debug\cudart32_55.dll [0x3137]
=========     Host Frame:C:\Users\user\Documents\Project\StackOverflow\Debug\cudart32_55.dll (cudaMalloc + 0xb5) [0x152d5]
=========     Host Frame:C:\Users\user\Documents\Project\StackOverflow\Debug\StackOverflow.exe (main + 0x59) [0x2289]
=========     Host Frame:C:\Users\user\Documents\Project\StackOverflow\Debug\StackOverflow.exe (__tmainCRTStartup + 0x1bf) [0xa3ef]
=========     Host Frame:C:\Users\user\Documents\Project\StackOverflow\Debug\StackOverflow.exe (mainCRTStartup + 0xf) [0xa21f]
=========     Host Frame:C:\windows\syswow64\KERNEL32.dll (BaseThreadInitThunk + 0x12) [0x1336a]
=========     Host Frame:C:\windows\SysWOW64\ntdll.dll (RtlInitializeExceptionChain + 0x63) [0x39f72]
=========     Host Frame:C:\windows\SysWOW64\ntdll.dll (RtlInitializeExceptionChain + 0x36) [0x39f45]
=========
========= ERROR SUMMARY: 1 error
I have checked for a CUDA_PROFILE environment variable, but it is defined neither as a system variable nor as a user variable. I nevertheless set
Set @CUDA_PROFILE = 0
but with no effect. I'm using CUDA 5.5.
I have also tried cuda-memcheck on two other systems: one with four NVIDIA K20c GPUs and one with a single Tesla C2050 card. On the former I get the same problem; on the latter cuda-memcheck works fine.
The fact that the error says the profiler is attached makes me think the problem could be due to Visual Studio attach-to-process sessions that I previously ran on the two machines where cuda-memcheck is not working. The machine where cuda-memcheck works was freshly installed. However, I have checked that the NSIGHT_CUDA_DEBUGGER environment variable, which is used for this kind of attachment, is set to 0. Also, I couldn't spot any apparent process that might still be attached to the debugger.
Could anyone suggest any hint to solve the problem?
I ran into a similar problem with CUDA 6.5 and 7.0. The error message was slightly more general (which may be due to the different version; I'm not sure about that). It said
Internal Memcheck Error: Memcheck failed initialization as some other tools is currently attached. Please make sure that nvprof and Nsight Visual Studio Edition are not being run simultaneously
(Of course, there was no other tool running at this time).
Setting the COMPUTE_PROFILE environment variable to 0 did not help. (Actually, it was not set at all for me in the first place).
Finally, I figured out that the odd behavior was caused by other environment variables. The toolkit/profiler apparently sets two additional environment variables during installation:
CUDA_INJECTION32_PATH=C:\Program Files (x86)\NVIDIA Corporation\Nsight Visual Studio Edition 4.1\Monitor\Common\Injection32\Nvda.Cuda.Injection.dll
CUDA_INJECTION64_PATH=C:\Program Files (x86)\NVIDIA Corporation\Nsight Visual Studio Edition 4.1\Monitor\Common\Injection64\Nvda.Cuda.Injection.dll
I'm not sure what they do (most likely, they establish some "hook" that is required for profiling). In any case, removing these environment variables (or setting them to be empty by executing
set CUDA_INJECTION32_PATH=
set CUDA_INJECTION64_PATH=
at the command prompt where cuda-memcheck should be launched) caused cuda-memcheck to work properly again. 
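If you would rather not touch the variables of your shell at all, the same idea can be sketched in Python: copy the environment, drop the two injection variables, and start cuda-memcheck with that copy. This is only a minimal sketch. The helper names `without_injection_vars` and `run_memcheck` are made up for illustration, and it assumes that cuda-memcheck is on the PATH.

```python
import os
import subprocess

# The two variables that interfered with cuda-memcheck in the case described above.
INJECTION_VARS = ("CUDA_INJECTION32_PATH", "CUDA_INJECTION64_PATH")


def without_injection_vars(env):
    """Return a copy of env with the injection variables removed."""
    # Windows variable names are case-insensitive, so compare in upper case.
    return {k: v for k, v in env.items() if k.upper() not in INJECTION_VARS}


def run_memcheck(app_args, env=None):
    """Launch cuda-memcheck on app_args with a cleaned copy of the environment."""
    base = os.environ if env is None else env
    clean = without_injection_vars(dict(base))
    # Assumption: cuda-memcheck is on PATH. The parent environment is not modified.
    return subprocess.run(["cuda-memcheck", *app_args], env=clean)
```

A call such as `run_memcheck(["StackOverflow.exe"])` would then correspond to the two `set` commands followed by the usual cuda-memcheck invocation.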
UPDATE
I was getting the same error on my Windows 2008 R2 and Windows 7 machines with GeForce GTX 780 GPUs. Though the procedure explained above worked for me, I found out that the CUDA_INJECTION32_PATH and CUDA_INJECTION64_PATH environment variables are added and set by Nsight Monitor when the user sets its option "CUDA->Use this monitor for CUDA attach" to true.
In order to fix the initialization problem of cuda-memcheck, I simply turned off the setting "CUDA->Use this monitor for CUDA attach" in Nsight Monitor. This deleted the CUDA_INJECTION32_PATH and CUDA_INJECTION64_PATH environment variables. Thereafter I opened a new command prompt session, so that the updated environment variables were loaded, and verified that cuda-memcheck worked properly.
When I was experiencing the problem with cuda-memcheck, my system environment variable COMPUTE_PROFILE was set to 1. I just had to set it to 0 to have cuda-memcheck work correctly. Incidentally, I have to thank @Vjas for suggesting that I check nvprof --profile-all-processes, which complained about the setting of CUDA_PROFILE. I have solved the problem on my laptop and on the Kepler system by setting COMPUTE_PROFILE=0.
Today I have no access to the Tesla system, on which cuda-memcheck was working properly, to check the setting of COMPUTE_PROFILE. I will update this answer as soon as I have that information.
EDIT
I have checked that the environmental variable COMPUTE_PROFILE was not defined on the system where cuda-memcheck was originally working.
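Since the cause differed between machines in this thread (COMPUTE_PROFILE in my case, the CUDA_INJECTION32_PATH and CUDA_INJECTION64_PATH variables in the answer above), a quick check of all of them may save some time. The following is a rough sketch, not an official diagnostic. The function name `suspect_settings` is hypothetical, and treating every value other than empty or 0 as suspicious is my assumption.

```python
import os

# Variables that this thread associates with the "profiler is attached" error.
SUSPECT_VARS = (
    "COMPUTE_PROFILE",
    "CUDA_PROFILE",
    "CUDA_INJECTION32_PATH",
    "CUDA_INJECTION64_PATH",
)


def suspect_settings(env):
    """Return {name: value} for suspect variables that are set and not '0'."""
    # Windows variable names are case-insensitive, so normalize to upper case.
    upper = {k.upper(): v for k, v in env.items()}
    return {
        name: upper[name]
        for name in SUSPECT_VARS
        if upper.get(name, "").strip() not in ("", "0")
    }


if __name__ == "__main__":
    # Print every suspect variable found in the current environment.
    for name, value in suspect_settings(os.environ).items():
        print(f"{name}={value}")
```

Run it from the same command prompt in which cuda-memcheck fails, since that is the environment cuda-memcheck inherits.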