
drm.ko missing for CUDA 6.5 / Ubuntu 14.04 / AWS EC2 GPU instance g2.2xlarge

I am trying to install CUDA 6.5 on Ubuntu 14.04.1 LTS on an AWS EC2 g2.2xlarge instance. Whether I install via the .deb file or the .run file, for example

sudo ./cuda_6.5.14_linux_64.run --kernel-source-path=/usr/src/linux-headers-3.13.0-34-generic

I always get the same error about a missing drm.ko, even though the kernel module compilation appears to succeed. The log is below. (I rebooted before installing.)

Kernel module compilation complete.

Unable to determine if Secure Boot is enabled: No such file or directory

Kernel module load error: No such file or directory

Kernel messages:

[ 3.595939] type=1400 audit(1408809902.911:5): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/lib/NetworkManager/nm-dhcp-client.action" pid=492 comm="apparmor_parser"

[ 3.595942] type=1400 audit(1408809902.911:6): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/lib/connman/scripts/dhclient-script" pid=492 comm="apparmor_parser"

[ 3.596140] type=1400 audit(1408809902.915:7): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/lib/connman/scripts/dhclient-script" pid=492 comm="apparmor_parser"

[ 4.696067] init: failsafe main process (833) killed by TERM signal

[ 4.793261] type=1400 audit(1408809904.107:8): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/sbin/dhclient" pid=952 comm="apparmor_parser"

[ 4.793267] type=1400 audit(1408809904.107:9): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/lib/NetworkManager/nm-dhcp-client.action" pid=952 comm="apparmor_parser"

[ 5.036249] init: plymouth-upstart-bridge main process ended, respawning

[ 6.589233] init: udev-fallback-graphics main process (1203) terminated with status 1

[ 136.367014] nvidia: module license 'NVIDIA' taints kernel.

[ 136.367019] Disabling lock debugging due to kernel taint

[ 136.370281] nvidia: module verification failed: signature and/or required key missing - tainting kernel

[ 136.370383] nvidia: Unknown symbol drm_open (err 0)

[ 136.370393] nvidia: Unknown symbol drm_poll (err 0)

[ 136.370404] nvidia: Unknown symbol drm_pci_init (err 0)

[ 136.370449] nvidia: Unknown symbol drm_gem_prime_handle_to_fd (err 0)

[ 136.370462] nvidia: Unknown symbol drm_gem_private_object_init (err 0)

[ 136.370474] nvidia: Unknown symbol drm_gem_mmap (err 0)

[ 136.370478] nvidia: Unknown symbol drm_ioctl (err 0)

[ 136.370486] nvidia: Unknown symbol drm_gem_object_free (err 0)

[ 136.370496] nvidia: Unknown symbol drm_read (err 0)

[ 136.370509] nvidia: Unknown symbol drm_gem_handle_create (err 0)

[ 136.370515] nvidia: Unknown symbol drm_prime_pages_to_sg (err 0)

[ 136.370550] nvidia: Unknown symbol drm_pci_exit (err 0)

[ 136.370563] nvidia: Unknown symbol drm_release (err 0)

[ 136.370565] nvidia: Unknown symbol drm_gem_prime_export (err 0)

The driver installation is unable to locate the kernel source. Please make sure that the kernel source packages are installed and set up correctly.
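
Every "Unknown symbol" line in the log above names a drm_* function, which is what points at a missing drm module rather than missing kernel sources. As a minimal sketch (not part of the original post), the symbol names can be pulled out of kernel log text like this; `unknown_symbols` is a hypothetical helper name, and the two sample lines are copied from the log above:

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and print the unique
# symbol names found in "Unknown symbol NAME (err N)" lines.
unknown_symbols() {
    sed -n 's/.*Unknown symbol \([A-Za-z0-9_]*\).*/\1/p' | sort -u
}

# On a live system this would be: dmesg | unknown_symbols
# (dmesg may need root on some configurations).
printf '%s\n' \
    '[ 136.370383] nvidia: Unknown symbol drm_open (err 0)' \
    '[ 136.370393] nvidia: Unknown symbol drm_poll (err 0)' \
    | unknown_symbols
```

If every name printed starts with drm_, the module to look for is drm.ko.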

Jen asked Aug 23 '14 16:08




2 Answers

The error is caused by a missing drm module, which the NVIDIA driver requires. By default, the Ubuntu AMI installs a minimal generic Linux kernel (linux-image-virtual) that does not include the drm module. To fix it, install the complete generic kernel, linux-image-generic. Installing linux-image-extra-virtual also works, as it is merely a transitional package for linux-image-generic. I would suggest installing linux-generic, which pulls in both the headers and the image. To summarize:

sudo apt-get install linux-generic
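
After installing the package (and rebooting, if a new kernel was installed), it may be worth confirming that a drm module file is actually present before rerunning the CUDA installer. The following is a minimal sketch rather than part of the original answer: `has_drm_module` is a hypothetical helper name, and it assumes modules live under /lib/modules/<release>, the usual Ubuntu layout.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a drm module file exists for a kernel release.
# $1 = kernel release (defaults to the running kernel)
# $2 = modules root (defaults to /lib/modules; an assumption about the layout)
has_drm_module() {
    release="${1:-$(uname -r)}"
    root="${2:-/lib/modules}"
    # Match drm.ko as well as compressed variants such as drm.ko.xz.
    [ -n "$(find "$root/$release" -name 'drm.ko*' 2>/dev/null | head -n 1)" ]
}

if has_drm_module; then
    echo "drm module found for $(uname -r)"
else
    echo "drm module missing for $(uname -r); the nvidia module will likely fail to load"
fi
```

The check only looks for the file on disk; it does not load the module or confirm that it matches the NVIDIA driver build.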

A similar question was asked on the AWS forum.

linleno answered Oct 13 '22 23:10


Right after a fresh launch of the GPU instance, apt-get upgrade held back 4 packages, including linux-virtual and linux-image-virtual. I installed them anyway so that nothing was left to upgrade. (The fresh setup has no previous nvidia or nouveau drivers.)

The problem is that linux-image-virtual is a lean build that ships without drm.ko. Install the extra modules package, which provides it:

sudo apt-get install linux-image-extra-virtual

Then go on installing CUDA with either the .deb or .run file.
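
When using the .run file, note that the `--kernel-source-path` value in the question is tied to one kernel release (3.13.0-34-generic). If the running release differs, for instance after a new kernel package is installed and the instance is rebooted, the path has to change with it. A minimal sketch, assuming the headers are installed under /usr/src/linux-headers-<release> as in the question; `kernel_source_path` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: print the headers directory for a kernel release.
# $1 = kernel release (defaults to the running kernel)
kernel_source_path() {
    echo "/usr/src/linux-headers-${1:-$(uname -r)}"
}

src="$(kernel_source_path)"
if [ -d "$src" ]; then
    # Print the command instead of running it, so it can be reviewed first.
    echo "sudo ./cuda_6.5.14_linux_64.run --kernel-source-path=$src"
else
    echo "headers not found at $src; install the matching linux-headers package first"
fi
```

Printing the command rather than executing it keeps the sketch safe to run on any machine.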

Jen answered Oct 13 '22 23:10