I'm trying to run Pytorch on a laptop that I have. It's an older model but it does have an Nvidia graphics card. I realize it is probably not going to be sufficient for real machine learning but I am trying to do it so I can learn the process of getting CUDA installed.
I have followed the steps on the installation guide for Ubuntu 18.04 (my specific distribution is Xubuntu).
My graphics card is a GeForce 845M, verified by lspci | grep nvidia
:
01:00.0 3D controller: NVIDIA Corporation GM107M [GeForce 845M] (rev a2) 01:00.1 Audio device: NVIDIA Corporation Device 0fbc (rev a1)
I also have gcc 7.5 installed, verified by gcc --version
gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0 Copyright (C) 2017 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
And I have the correct headers installed, verified by trying to install them with sudo apt-get install linux-headers-$(uname -r)
:
Reading package lists... Done Building dependency tree Reading state information... Done linux-headers-4.15.0-106-generic is already the newest version (4.15.0-106.107).
I then followed the installation instructions using a local .deb for version 10.1.
Npw, when I run nvidia-smi
, I get:
+-----------------------------------------------------------------------------+ | NVIDIA-SMI 418.87.00 Driver Version: 418.87.00 CUDA Version: 10.1 | |-------------------------------+----------------------+----------------------+ | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC | | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. | |===============================+======================+======================| | 0 GeForce 845M On | 00000000:01:00.0 Off | N/A | | N/A 40C P0 N/A / N/A | 88MiB / 2004MiB | 1% Default | +-------------------------------+----------------------+----------------------+ +-----------------------------------------------------------------------------+ | Processes: GPU Memory | | GPU PID Type Process name Usage | |=============================================================================| | 0 982 G /usr/lib/xorg/Xorg 87MiB | +-----------------------------------------------------------------------------+
and I run nvcc -V
I get:
nvcc: NVIDIA (R) Cuda compiler driver Copyright (c) 2005-2019 NVIDIA Corporation Built on Sun_Jul_28_19:07:16_PDT_2019 Cuda compilation tools, release 10.1, V10.1.243
I then performed the post-installation instructions from section 6.1, and so as a result, echo $PATH
looks like this:
/home/isaek/anaconda3/envs/stylegan2_pytorch/bin:/home/isaek/anaconda3/bin:/home/isaek/anaconda3/condabin:/usr/local/cuda-10.1/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
echo $LD_LIBRARY_PATH
looks like this:
/usr/local/cuda-10.1/lib64
and my /etc/udev/rules.d/40-vm-hotadd.rules
file looks like this:
# On Hyper-V and Xen Virtual Machines we want to add memory and cpus as soon as they appear ATTR{[dmi/id]sys_vendor}=="Microsoft Corporation", ATTR{[dmi/id]product_name}=="Virtual Machine", GOTO="vm_hotadd_apply" ATTR{[dmi/id]sys_vendor}=="Xen", GOTO="vm_hotadd_apply" GOTO="vm_hotadd_end" LABEL="vm_hotadd_apply" # Memory hotadd request # CPU hotadd request SUBSYSTEM=="cpu", ACTION=="add", DEVPATH=="/devices/system/cpu/cpu[0-9]*", TEST=="online", ATTR{online}="1" LABEL="vm_hotadd_end"
After all of this, I even compiled and ran the samples. ./deviceQuery
returns:
./deviceQuery Starting... CUDA Device Query (Runtime API) version (CUDART static linking) Detected 1 CUDA Capable device(s) Device 0: "GeForce 845M" CUDA Driver Version / Runtime Version 10.1 / 10.1 CUDA Capability Major/Minor version number: 5.0 Total amount of global memory: 2004 MBytes (2101870592 bytes) ( 4) Multiprocessors, (128) CUDA Cores/MP: 512 CUDA Cores GPU Max Clock rate: 863 MHz (0.86 GHz) Memory Clock rate: 1001 Mhz Memory Bus Width: 64-bit L2 Cache Size: 1048576 bytes Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096) Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers Total amount of constant memory: 65536 bytes Total amount of shared memory per block: 49152 bytes Total number of registers available per block: 65536 Warp size: 32 Maximum number of threads per multiprocessor: 2048 Maximum number of threads per block: 1024 Max dimension size of a thread block (x,y,z): (1024, 1024, 64) Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535) Maximum memory pitch: 2147483647 bytes Texture alignment: 512 bytes Concurrent copy and kernel execution: Yes with 1 copy engine(s) Run time limit on kernels: Yes Integrated GPU sharing Host Memory: No Support host page-locked memory mapping: Yes Alignment requirement for Surfaces: Yes Device has ECC support: Disabled Device supports Unified Addressing (UVA): Yes Device supports Compute Preemption: No Supports Cooperative Kernel Launch: No Supports MultiDevice Co-op Kernel Launch: No Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0 Compute Mode: < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) > deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.1, CUDA Runtime Version = 10.1, NumDevs = 1 Result = PASS
and ./bandwidthTest
returns:
[CUDA Bandwidth Test] - Starting... Running on... Device 0: GeForce 845M Quick Mode Host to Device Bandwidth, 1 Device(s) PINNED Memory Transfers Transfer Size (Bytes) Bandwidth(GB/s) 32000000 11.7 Device to Host Bandwidth, 1 Device(s) PINNED Memory Transfers Transfer Size (Bytes) Bandwidth(GB/s) 32000000 11.8 Device to Device Bandwidth, 1 Device(s) PINNED Memory Transfers Transfer Size (Bytes) Bandwidth(GB/s) 32000000 14.5 Result = PASS NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
But after all of this, this Python snippet (in a conda environment with all dependencies installed):
import torch torch.cuda.is_available()
returns False
Does anybody have any idea about how to resolve this? I've tried to add /usr/local/cuda-10.1/bin
to etc/environment
like this:
PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games" PATH=$PATH:/usr/local/cuda-10.1/bin
And restarting the terminal, but that didn't fix it. I really don't know what else to try.
Collecting environment information... PyTorch version: 1.5.0 Is debug build: No CUDA used to build PyTorch: 10.2 OS: Ubuntu 18.04.4 LTS GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0 CMake version: Could not collect Python version: 3.6 Is CUDA available: No CUDA runtime version: 10.1.243 GPU models and configuration: GPU 0: GeForce 845M Nvidia driver version: 418.87.00 cuDNN version: Could not collect Versions of relevant libraries: [pip] numpy==1.18.5 [pip] pytorch-ranger==0.1.1 [pip] stylegan2-pytorch==0.12.0 [pip] torch==1.5.0 [pip] torch-optimizer==0.0.1a12 [pip] torchvision==0.6.0 [pip] vector-quantize-pytorch==0.0.2 [conda] numpy 1.18.5 pypi_0 pypi [conda] pytorch-ranger 0.1.1 pypi_0 pypi [conda] stylegan2-pytorch 0.12.0 pypi_0 pypi [conda] torch 1.5.0 pypi_0 pypi [conda] torch-optimizer 0.0.1a12 pypi_0 pypi [conda] torchvision 0.6.0 pypi_0 pypi [conda] vector-quantize-pytorch 0.0.2 pypi_0 pypi
You installed the CPU only version of PyTorch. In this case PyTorch wasn't compiled with CUDA support so it didn't support CUDA. You installed the CUDA 10.2 version of PyTorch. In this case the problem is that your graphics card currently uses the 418.87 drivers, which only support up to CUDA 10.1.
Enable CUDA optimization by going to the system menu, and select Edit > Preferences. Click on the Editing tab and then select the "Enable NVIDIA CUDA /ATI Stream technology to speed up video effect preview/render" check box within the GPU acceleration area. Click on the OK button to save your changes.
PyTorch doesn't use the system's CUDA library. When you install PyTorch using the precompiled binaries using either pip
or conda
it is shipped with a copy of the specified version of the CUDA library which is installed locally. In fact, you don't even need to install CUDA on your system to use PyTorch with CUDA support.
There are two scenarios which could have caused your issue.
You installed the CPU only version of PyTorch. In this case PyTorch wasn't compiled with CUDA support so it didn't support CUDA.
You installed the CUDA 10.2 version of PyTorch. In this case the problem is that your graphics card currently uses the 418.87 drivers, which only support up to CUDA 10.1. The two potential fixes in this case would be to either install updated drivers (version >= 440.33 according to Table 2) or to install a version of PyTorch compiled against CUDA 10.1.
To determine the appropriate command to use when installing PyTorch you can use the handy widget in the "Install PyTorch" section at pytorch.org. Just select the appropriate operating system, package manager, and CUDA version then run the recommended command.
In your case one solution was to use
conda install pytorch torchvision cudatoolkit=10.1 -c pytorch
which explicitly specifies to conda that you want to install the version of PyTorch compiled against CUDA 10.1.
For more information about PyTorch CUDA compatibility with respect drivers and hardware see this answer.
Edit After you added the output of collect_env
we can see that the problem was that you had the CUDA 10.2 version of PyTorch installed. Based on that an alternative solution would have been to update the graphics driver as elaborated in item 2 and the linked answer.
First install NVIDIA CUDA Toolkit provided by Canonical:
sudo apt install -y nvidia-cuda-toolkit
or follow NVIDIA developers instructions:
# ENVARS ADDED **ONLY FOR READABILITY** NVIDIA_CUDA_PPA=https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/ NVIDIA_CUDA_PREFERENCES=https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin NVIDIA_CUDA_PUBKEY=https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/7fa2af80.pub # Add NVIDIA Developers 3rd-Party PPA sudo wget ${NVIDIA_CUDA_PREFERENCES} -O /etc/apt/preferences.d/nvidia-cuda sudo apt-key adv --fetch-keys ${NVIDIA_CUDA_PUBKEY} echo "deb ${NVIDIA_CUDA_PPA} /" | sudo tee /etc/apt/sources.list.d/nvidia-cuda.list # Install development tools sudo apt update sudo apt install -y cuda
then reboot the OS load the kernel with the NVIDIA drivers
Create an environment using your favorite manager (conda
, venv
, etc)
conda create -n stack-overflow pytorch torchvision conda activate stack-overflow
or reinstall pytorch
and torchvision
into the existing one:
conda activate stack-overflow conda install --force-reinstall pytorch torchvision
otherwise NVIDIA CUDA C/C++ bindings may not be correctly detected.
Finally ensure CUDA is correctly detected:
(stack-overflow)$ python3 -c 'import torch; print(torch.cuda.is_available())' True
20.04.x
22.04
(prior official release)If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With