
no CUDA-capable device is detected (using ubuntu 12.04.4 server) [closed]

Tags: linux, cuda

I recently installed the CUDA toolkit 5.5 with driver 331.67 (I have a GeForce GTX 680). For some reason, I cannot run any of the test scripts:

$./NVIDIA_CUDA-5.5_Samples/1_Utilities/deviceQuery/deviceQuery 
./NVIDIA_CUDA-5.5_Samples/1_Utilities/deviceQuery/deviceQuery Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 38
-> no CUDA-capable device is detected
Result = FAIL

I followed the steps in the "getting started guide" here:

http://docs.nvidia.com/cuda/cuda-getting-started-guide-for-linux/

and made a script to create the character device files at startup (I am running the server edition of Ubuntu, so these device files aren't created by default):

$ls -l /dev/nvidia*
crw-rw-rw- 1 root root 195,   0 Apr 11 17:29 /dev/nvidia0
crw-rw-rw- 1 root root 195, 255 Apr 11 17:29 /dev/nvidiactl
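The startup script itself isn't shown in the question. As a rough sketch of what such a script might do, the function below prints (rather than runs) the `mknod` commands for the nodes listed above. The major number 195, the minors 0 and 255, and mode 666 are taken from the `ls -l` output; the function name `nvidia_mknod_commands` is hypothetical.

```shell
#!/bin/sh
# Print the mknod commands for COUNT GPU device nodes plus the control node.
# Nothing is created here: this is a dry run, so no root access is needed.
nvidia_mknod_commands() {
    count="$1"
    i=0
    while [ "$i" -lt "$count" ]; do
        # One character device per GPU: major 195, minor = GPU index
        echo "mknod -m 666 /dev/nvidia$i c 195 $i"
        i=$((i + 1))
    done
    # The control device uses minor 255
    echo "mknod -m 666 /dev/nvidiactl c 195 255"
}

# Dry run for a single GPU, as on the system in the question
nvidia_mknod_commands 1
```

To actually create the nodes, the printed commands would have to be run as root, and only after the nvidia kernel module has been loaded.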

The output of the command nvidia-smi -a (as both a normal user and root) is:

Failed to initialize NVML: Unknown Error

Here is some info on the nvidia module:

$ lsmod | grep nvidia
nvidia              11335080  0 
$ modinfo nvidia
filename:       /lib/modules/3.11.0-17-generic/updates/dkms/nvidia.ko
alias:          char-major-195-*
version:        331.67
supported:      external
license:        NVIDIA
...
...

Any suggestions? Thanks.

EDIT #1: I tried downgrading to driver 319.76:

$ modinfo nvidia
filename:       /lib/modules/3.11.0-17-generic/updates/dkms/nvidia.ko
alias:          char-major-195-*
version:        319.76
supported:      external
...

Now when I run nvidia-smi -a I get the following:

NVIDIA: API mismatch: the NVIDIA kernel module has version 304.116,
but this NVIDIA driver component has version 319.76.  Please make
sure that the kernel module and all NVIDIA driver components
have the same version.
Failed to initialize NVML: Unknown Error
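The mismatch message names both versions, so they can be pulled out mechanically. This is only a sketch that assumes the message wording shown above. The function name `parse_api_mismatch` is hypothetical, and the sample text is copied from the output in this post.

```shell
#!/bin/sh
# Read nvidia-smi output on stdin and print
# "<kernel module version> <driver component version>".
# Prints nothing if the mismatch message is not present.
parse_api_mismatch() {
    # Join the lines first, because the message wraps across several lines
    { tr '\n' ' '; echo; } |
        sed -n 's/.*kernel module has version \([0-9.]*[0-9]\).*driver component has version \([0-9.]*[0-9]\).*/\1 \2/p'
}

# Sample input: the message from this post
parse_api_mismatch <<'EOF'
NVIDIA: API mismatch: the NVIDIA kernel module has version 304.116,
but this NVIDIA driver component has version 319.76.  Please make
sure that the kernel module and all NVIDIA driver components
have the same version.
Failed to initialize NVML: Unknown Error
EOF
```

For the sample message this prints `304.116 319.76`: the loaded kernel module is 304.116, while the installed driver component is 319.76.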

I installed the nvidia-current-updates and nvidia-settings-updates packages from the repos before installing the driver file, and I guess that's where the conflict arose. I have not found a solution, but I think this is one step closer. Here is the result of modprobe -l | grep nvidia:

kernel/drivers/video/nvidia/nvidiafb.ko
kernel/drivers/net/ethernet/nvidia/forcedeth.ko
updates/dkms/nvidia.ko
updates/dkms/nvidia_304_updates.k
dermen asked Apr 14 '14 18:04


1 Answer

It turns out the main error was caused by a version mismatch between the nvidia kernel module and the driver component. Here are the steps that led me to a resolution.

1) Downgrading the driver made nvidia-smi -a complain about a driver component mismatch. I hadn't realized this could be a problem, because the CUDA toolkit setup guide I was following didn't mention it.

2) Having installed the kernel modules from the repos, I picked the driver component whose version matches the installed kernel module. If you don't know the version of your installed kernel module, you can find it with modprobe and modinfo. For example, on my system:

$ modprobe -l | grep nvidia
kernel/drivers/video/nvidia/nvidiafb.ko
kernel/drivers/net/ethernet/nvidia/forcedeth.ko
updates/dkms/nvidia.ko
updates/dkms/nvidia_304_updates.ko
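Two nvidia modules sit under updates/dkms in this listing, which is the visible sign of the conflict. As a small sketch, the filter below keeps only those entries from a listing like the one above. The function name `dkms_nvidia_modules` is hypothetical, and the sample input is the listing from this post.

```shell
#!/bin/sh
# Read module paths on stdin (one per line) and print only the
# DKMS-built modules whose file name starts with "nvidia".
dkms_nvidia_modules() {
    grep -E '^updates/dkms/nvidia[^/]*\.ko$'
}

# Sample input: the modprobe -l listing from this post
dkms_nvidia_modules <<'EOF'
kernel/drivers/video/nvidia/nvidiafb.ko
kernel/drivers/net/ethernet/nvidia/forcedeth.ko
updates/dkms/nvidia.ko
updates/dkms/nvidia_304_updates.ko
EOF
```

More than one line of output suggests that drivers from two different sources are installed side by side, so it is worth checking each module's version with modinfo.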

The module nvidia_304_updates was installed from the repos (package nvidia-current-updates). Its exact version is found with modinfo:

$ modinfo /lib/modules/3.11.0-17-generic/updates/dkms/nvidia_304_updates.ko 
filename:       /lib/modules/3.11.0-17-generic/updates/dkms/nvidia_304_updates.ko
alias:          char-major-195-*
version:        304.116
supported:      external
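The version check can also be scripted. The sketch below reads modinfo output on stdin, extracts the version field, and compares it with the version of the driver component you intend to install. The helper names `modinfo_version` and `versions_match` are hypothetical, and the sample values are the ones from this post.

```shell
#!/bin/sh
# Read modinfo output on stdin and print the value of its "version:" field.
modinfo_version() {
    awk '$1 == "version:" { print $2; exit }'
}

# Succeed only when the two version strings are identical.
versions_match() {
    [ "$1" = "$2" ]
}

# Sample input: the modinfo output shown above
module_version=$(modinfo_version <<'EOF'
filename:       /lib/modules/3.11.0-17-generic/updates/dkms/nvidia_304_updates.ko
alias:          char-major-195-*
version:        304.116
supported:      external
EOF
)

# 319.76 is the driver component version from the question's EDIT #1
if versions_match "$module_version" "319.76"; then
    echo "versions match: $module_version"
else
    echo "mismatch: kernel module $module_version, driver component 319.76"
fi
```

On a live system, `modinfo -F version` followed by the path to the module should print the same field directly, without the awk step.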

After downloading and installing the corresponding driver component (version 304.116) from the archive on the nvidia website,

http://www.nvidia.com/Download/Find.aspx?lang=en-us

I was able to run the command:

$ nvidia-smi -a

==============NVSMI LOG==============

Timestamp                       : Mon Apr 14 15:17:44 2014
Driver Version                  : 304.116

Attached GPUs                   : 1
GPU 0000:04:00.0
    Product Name                : GeForce GTX 680
...
...

The original script I was trying to execute now works as well:

$ ./deviceQuery 
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX 680"
  CUDA Driver Version / Runtime Version          5.0 / 5.0
  CUDA Capability Major/Minor version number:    3.0
  Total amount of global memory:                 2047 MBytes (2146762752 bytes)
  ( 8) Multiprocessors x (192) CUDA Cores/MP:    1536 CUDA Cores
  ...
  ...
dermen answered Nov 04 '22 14:11