When installing CUDA 6.5 on Ubuntu 14.04.1 LTS on an AWS EC2 g2.2xlarge instance, whether I install via the .deb file or the .run file:
sudo ./cuda_6.5.14_linux_64.run --kernel-source-path=/usr/src/linux-headers-3.13.0-34-generic
I always get the same error about a missing drm.ko. The kernel module compilation itself seems to succeed. Below is the log. (I rebooted before installing.)
Kernel module compilation complete.
Unable to determine if Secure Boot is enabled: No such file or directory
Kernel module load error: No such file or directory
Kernel messages:
[ 3.595939] type=1400 audit(1408809902.911:5): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/lib/NetworkManager/nm-dhcp-client.action" pid=492 comm="apparmor_parser"
[ 3.595942] type=1400 audit(1408809902.911:6): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/lib/connman/scripts/dhclient-script" pid=492 comm="apparmor_parser"
[ 3.596140] type=1400 audit(1408809902.915:7): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/lib/connman/scripts/dhclient-script" pid=492 comm="apparmor_parser"
[ 4.696067] init: failsafe main process (833) killed by TERM signal
[ 4.793261] type=1400 audit(1408809904.107:8): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/sbin/dhclient" pid=952 comm="apparmor_parser"
[ 4.793267] type=1400 audit(1408809904.107:9): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/lib/NetworkManager/nm-dhcp-client.action" pid=952 comm="apparmor_parser"
[ 5.036249] init: plymouth-upstart-bridge main process ended, respawning
[ 6.589233] init: udev-fallback-graphics main process (1203) terminated with status 1
[ 136.367014] nvidia: module license 'NVIDIA' taints kernel.
[ 136.367019] Disabling lock debugging due to kernel taint
[ 136.370281] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[ 136.370383] nvidia: Unknown symbol drm_open (err 0)
[ 136.370393] nvidia: Unknown symbol drm_poll (err 0)
[ 136.370404] nvidia: Unknown symbol drm_pci_init (err 0)
[ 136.370449] nvidia: Unknown symbol drm_gem_prime_handle_to_fd (err 0)
[ 136.370462] nvidia: Unknown symbol drm_gem_private_object_init (err 0)
[ 136.370474] nvidia: Unknown symbol drm_gem_mmap (err 0)
[ 136.370478] nvidia: Unknown symbol drm_ioctl (err 0)
[ 136.370486] nvidia: Unknown symbol drm_gem_object_free (err 0)
[ 136.370496] nvidia: Unknown symbol drm_read (err 0)
[ 136.370509] nvidia: Unknown symbol drm_gem_handle_create (err 0)
[ 136.370515] nvidia: Unknown symbol drm_prime_pages_to_sg (err 0)
[ 136.370550] nvidia: Unknown symbol drm_pci_exit (err 0)
[ 136.370563] nvidia: Unknown symbol drm_release (err 0)
[ 136.370565] nvidia: Unknown symbol drm_gem_prime_export (err 0)
The driver installation is unable to locate the kernel source. Please make sure that the kernel source packages are installed and set up correctly.
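The "Unknown symbol drm_*" lines above are the telltale: the nvidia module cannot resolve symbols exported by the kernel's drm module, which is not present in this kernel build. A quick way to check (illustrative commands, not from the original post):
sudo modprobe drm
find /lib/modules/$(uname -r) -name 'drm.ko'
If modprobe reports that the module cannot be found, the running kernel image lacks drm.ko, which the answers below address. The separate "unable to locate the kernel source" message usually means the matching headers package is missing; on Ubuntu it can be installed with sudo apt-get install linux-headers-$(uname -r).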
After you install an NVIDIA graphics driver on your instance, you can install a version of CUDA other than the version that is bundled with the graphics driver. The following procedure demonstrates how to configure multiple versions of CUDA on the instance. Connect to your Linux instance, then open the NVIDIA website and select the version of CUDA that you need. Select the architecture, distribution, and version for the operating system on your instance. For Installer Type, select runfile (local). Follow the instructions to download the install script.
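For example, once the runfile is downloaded (the file name below is the one from the question; the actual URL comes from the NVIDIA download page):
chmod +x cuda_6.5.14_linux_64.run
sudo ./cuda_6.5.14_linux_64.run
The installer prompts for which components to install and for the toolkit path; running it with --help lists the non-interactive flags available for your version.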
The reason for these spontaneous NVIDIA CUDA driver failures is Ubuntu's automated security updates. When an update rebuilds the kernel, it breaks the CUDA drivers and nvidia-smi can no longer communicate with the driver. A simple solution is to disable automated security updates:
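One way to do that, assuming the stock unattended-upgrades setup on Ubuntu (file names can vary by release), is to turn the periodic upgrade flag off in /etc/apt/apt.conf.d/20auto-upgrades:
APT::Periodic::Update-Package-Lists "1";
APT::Periodic::Unattended-Upgrade "0";
Note this trades convenience for stability: security updates then have to be applied manually, at a time when rebuilding the NVIDIA modules is acceptable.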
An instance with an attached NVIDIA GPU, such as a P3 or G4dn instance, must have the appropriate NVIDIA driver installed. Depending on the instance type, you can either download a public NVIDIA driver, download a driver from Amazon S3 that is available only to AWS customers, or use an AMI with the driver pre-installed.
The error was caused by the missing drm module required by the NVIDIA driver. By default, the Ubuntu AMI installs a minimal generic Linux kernel (linux-image-virtual), which doesn't include the drm module. To fix it, install the complete generic kernel, linux-image-generic. Installing linux-image-extra-virtual would also work, as it is merely a transitional package to linux-image-generic. I would suggest installing linux-generic to include both the headers and the image. To summarize:
sudo apt-get install linux-generic
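After installing, reboot into the new kernel and verify that drm.ko is now present for the running kernel (a quick sanity check, not from the original answer; find should print a path, and lsmod shows whether the module is loaded):
find /lib/modules/$(uname -r) -name 'drm.ko'
lsmod | grep drm
Then rerun the CUDA installer.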
There is a similar question asked on the AWS forum.
Actually, right after the fresh launch of the GPU instance, apt-get upgrade wanted to keep back 4 packages, such as linux-virtual and linux-image-virtual. I still installed them so that I had strictly nothing more to upgrade. (The fresh setup doesn't have previous NVIDIA or any nouveau drivers.)
The thing is that linux-image-virtual is a lean build with no drm.ko. Just do
sudo apt-get install linux-image-extra-virtual
which contains drm.ko.
Then go on installing CUDA with either the .deb or .run file.
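Putting that together with the original command from the question (a sketch; it assumes the matching linux-headers package is installed, and uses $(uname -r) so the path tracks the running kernel):
sudo apt-get update
sudo apt-get install linux-image-extra-virtual
sudo reboot
sudo ./cuda_6.5.14_linux_64.run --kernel-source-path=/usr/src/linux-headers-$(uname -r)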