 

Anaconda reading wrong CUDA version

I have a conda environment with PyTorch and TensorFlow, both of which require CUDA 9.0 (cudatoolkit 9.0 from conda). After installing PyTorch with torchvision and the cudatoolkit (as provided on their website), I wanted to install TensorFlow, but I get this error:

Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: / 
Found conflicts! Looking for incompatible packages.
This can take several minutes.  Press CTRL-C to abort.
failed                                                                                                   

UnsatisfiableError: The following specifications were found
to be incompatible with the existing python installation in your environment:

Specifications:

  - tensorflow==1.12.0 -> python[version='2.7.*|3.6.*']
  - tensorflow==1.12.0 -> python[version='>=2.7,<2.8.0a0|>=3.6,<3.7.0a0']

Your python: python=3.5

If python is on the left-most side of the chain, that's the version you've asked for.
When python appears to the right, that indicates that the thing on the left is somehow
not available for the python version you are constrained to. Note that conda will not
change your python version to a different minor version unless you explicitly specify
that.

The following specifications were found to be incompatible with your system:

  - feature:/linux-64::__cuda==10.2=0
  - feature:|@/linux-64::__cuda==10.2=0

Your installed version is: 10.2

If I run nvcc or nvidia-smi on my host or in the activated conda environment, both report CUDA 10.2, even though conda list shows that cudatoolkit 9.0 is installed. Any solution to this?

EDIT:

When running this code sample:

import torch

# setting device on GPU if available, else CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print('Using device:', device)
print()

#Additional Info when using cuda
if device.type == 'cuda':
    print(torch.cuda.get_device_name(0))
    print('Memory Usage:')
    print('Allocated:', round(torch.cuda.memory_allocated(0)/1024**3,1), 'GB')
    print('Cached:   ', round(torch.cuda.memory_cached(0)/1024**3,1), 'GB')


print(torch.version.cuda)

I get this output:

GeForce GTX 1050
Memory Usage:
Allocated: 0.0 GB
Cached:    0.0 GB
9.0.176

So PyTorch does pick up the correct CUDA version; I just can't get tensorflow-gpu installed.

asked Sep 11 '20 07:09 by filip




1 Answer

If I run nvcc or nvidia-smi on my host or the activated conda environment, I get that I have installed CUDA 10.2, even though conda list shows me that cudatoolkit 9.0 is installed. Any solution to this?

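cudatoolkit doesn't ship with the compiler (nvcc), so when you run nvcc you start the compiler from the system-wide installation. That's why it prints 10.2 instead of 9.0, while PyTorch sees the cudatoolkit inside the environment. nvidia-smi, for its part, reports the highest CUDA version supported by the installed driver, not the toolkit version of the environment.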

From the anaconda / packages / cudatoolkit page:

This CUDA Toolkit includes GPU-accelerated libraries, and the CUDA runtime for the Conda ecosystem. For the full CUDA Toolkit with a compiler and development tools visit https://developer.nvidia.com/cuda-downloads
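To see which CUDA version each tool is actually reporting, a small diagnostic script can help. This is a minimal sketch, not part of the original answer: it assumes nvcc and nvidia-smi are on PATH when installed, and that their output contains the usual "release X.Y" and "CUDA Version: X.Y" strings. It prints the system compiler's release, the driver's maximum supported CUDA version, and the CUDA version PyTorch was built against (torch.version.cuda, as used in the question). The helper names (parse_nvcc_release, parse_smi_cuda, run, report) are hypothetical.

```python
import re
import shutil
import subprocess
from typing import List, Optional


def parse_nvcc_release(output: str) -> Optional[str]:
    """Extract 'X.Y' from `nvcc --version` text such as 'release 10.2, V10.2.89'."""
    match = re.search(r"release (\d+\.\d+)", output)
    return match.group(1) if match else None


def parse_smi_cuda(output: str) -> Optional[str]:
    """Extract the driver's supported CUDA version from the nvidia-smi header."""
    match = re.search(r"CUDA Version:\s*(\d+\.\d+)", output)
    return match.group(1) if match else None


def run(cmd: List[str]) -> Optional[str]:
    """Run cmd if its executable is on PATH; return combined output, else None."""
    if shutil.which(cmd[0]) is None:
        return None
    result = subprocess.run(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, universal_newlines=True)
    return result.stdout


def report() -> None:
    # System-wide compiler: this is what `nvcc` resolves to, not conda's cudatoolkit.
    print("nvcc on PATH:", shutil.which("nvcc"))
    print("nvcc release (system toolkit):", parse_nvcc_release(run(["nvcc", "--version"]) or ""))
    # Driver capability, independent of any toolkit installed in the env.
    print("max CUDA supported by driver:", parse_smi_cuda(run(["nvidia-smi"]) or ""))
    try:
        import torch
        # CUDA runtime PyTorch was built against (comes from the env's cudatoolkit).
        print("PyTorch built against CUDA:", torch.version.cuda)
    except ImportError:
        print("PyTorch is not installed in this environment")


if __name__ == "__main__":
    report()
```

Run it inside the activated environment; for the setup described in the question, the first two version lines would be expected to show 10.2 and the PyTorch line 9.0.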

From your comment above, I understand that you are using python=3.5.6. So first of all, search for the available tensorflow py35 builds:

conda search tensorflow | grep py35

I get the following output:

tensorflow                     1.9.0 eigen_py35h8c89287_1  pkgs/main           
tensorflow                     1.9.0 gpu_py35h42d5ad8_1  pkgs/main           
tensorflow                     1.9.0 gpu_py35h60c0932_1  pkgs/main           
tensorflow                     1.9.0 gpu_py35hb39db67_1  pkgs/main           
tensorflow                     1.9.0 mkl_py35h5be851a_1  pkgs/main           
tensorflow                    1.10.0 eigen_py35h5ed898b_0  pkgs/main           
tensorflow                    1.10.0 gpu_py35h566a776_0  pkgs/main           
tensorflow                    1.10.0 gpu_py35ha6119f3_0  pkgs/main           
tensorflow                    1.10.0 gpu_py35hd9c640d_0  pkgs/main           
tensorflow                    1.10.0 mkl_py35heddcb22_0  pkgs/main

As you can see, there are no tensorflow 1.12.0 builds for py35, and that's why you are getting that error. You can try to inspect other conda channels, for example conda-forge:

conda search tensorflow -c conda-forge | grep py35

But that wasn't helpful:

tensorflow                     0.9.0          py35_0  conda-forge         
tensorflow                    0.10.0          py35_0  conda-forge         
tensorflow                 0.11.0rc0          py35_0  conda-forge         
tensorflow                 0.11.0rc2          py35_0  conda-forge         
tensorflow                    0.11.0          py35_0  conda-forge         
tensorflow                    0.12.1          py35_0  conda-forge         
tensorflow                    0.12.1          py35_1  conda-forge         
tensorflow                    0.12.1          py35_2  conda-forge         
tensorflow                     1.0.0          py35_0  conda-forge         
tensorflow                     1.1.0          py35_0  conda-forge         
tensorflow                     1.2.0          py35_0  conda-forge         
tensorflow                     1.2.1          py35_0  conda-forge         
tensorflow                     1.3.0          py35_0  conda-forge         
tensorflow                     1.4.0          py35_0  conda-forge         
tensorflow                     1.5.0          py35_0  conda-forge         
tensorflow                     1.5.1          py35_0  conda-forge         
tensorflow                     1.6.0          py35_0  conda-forge         
tensorflow                     1.8.0          py35_0  conda-forge         
tensorflow                     1.8.0          py35_1  conda-forge         
tensorflow                     1.9.0 eigen_py35h8c89287_1  pkgs/main           
tensorflow                     1.9.0 gpu_py35h42d5ad8_1  pkgs/main           
tensorflow                     1.9.0 gpu_py35h60c0932_1  pkgs/main           
tensorflow                     1.9.0 gpu_py35hb39db67_1  pkgs/main           
tensorflow                     1.9.0 mkl_py35h5be851a_1  pkgs/main           
tensorflow                     1.9.0          py35_0  conda-forge         
tensorflow                    1.10.0 eigen_py35h5ed898b_0  pkgs/main           
tensorflow                    1.10.0 gpu_py35h566a776_0  pkgs/main           
tensorflow                    1.10.0 gpu_py35ha6119f3_0  pkgs/main           
tensorflow                    1.10.0 gpu_py35hd9c640d_0  pkgs/main           
tensorflow                    1.10.0 mkl_py35heddcb22_0  pkgs/main           
tensorflow                    1.10.0          py35_0  conda-forge
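Rather than eyeballing grep output, the same check can be scripted. The sketch below is an illustration, not the answer's method: it assumes the plain-text table that conda search prints (name, version, build, channel on each row) and uses the hypothetical helpers current_py_tag, builds_by_version and has_gpu_build. The sample rows are copied from the outputs shown in this answer.

```python
import sys
from collections import defaultdict
from typing import Dict, Iterable, List


def current_py_tag() -> str:
    """Build tag for the running interpreter, e.g. 'py35' for Python 3.5.6."""
    return "py{}{}".format(sys.version_info[0], sys.version_info[1])


def builds_by_version(lines: Iterable[str], py_tag: str) -> Dict[str, List[str]]:
    """Group `conda search` rows by version, keeping builds that contain py_tag."""
    found = defaultdict(list)  # type: Dict[str, List[str]]
    for line in lines:
        parts = line.split()
        # Skip the '# Name Version Build Channel' header, status lines and blanks.
        if len(parts) != 4 or parts[0].startswith("#"):
            continue
        _name, version, build, channel = parts
        if py_tag in build:
            found[version].append("{} [{}]".format(build, channel))
    return dict(found)


def has_gpu_build(found: Dict[str, List[str]], version: str) -> bool:
    """True if any build recorded for `version` is a GPU build (gpu_ prefix)."""
    return any(build.startswith("gpu_") for build in found.get(version, []))


# Rows copied from the outputs above; in practice feed in `conda search tensorflow`.
SAMPLE = """\
tensorflow                     1.9.0 gpu_py35h42d5ad8_1  pkgs/main
tensorflow                     1.9.0 mkl_py35h5be851a_1  pkgs/main
tensorflow                    1.10.0 gpu_py35h566a776_0  pkgs/main
tensorflow                    1.12.0 gpu_py36he68c306_0  pkgs/main
"""

if __name__ == "__main__":
    found = builds_by_version(SAMPLE.splitlines(), "py35")
    for target in ("1.10.0", "1.12.0"):
        print(target, "GPU build for py35:", has_gpu_build(found, target))
```

With these rows it prints True for 1.10.0 and False for 1.12.0, which is the same conclusion as above.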

So, the possible solutions are:

  1. Install one of the older available tensorflow 1.10.0 gpu_py35 builds.
  2. Switch to python 3.6.
conda search tensorflow | grep py36

...
tensorflow                    1.11.0 gpu_py36h4459f94_0  pkgs/main           
tensorflow                    1.11.0 gpu_py36h9c9050a_0  pkgs/main           
...        
tensorflow                    1.12.0 gpu_py36he68c306_0  pkgs/main           
tensorflow                    1.12.0 gpu_py36he74679b_0  pkgs/main
...         

Note that versions >=1.13.1 don't support CUDA 9.

  3. Use pip install inside the conda env to install the missing tensorflow build, because pip hosts more build combinations: Tested build configurations

Here are some best practices from Anaconda on using pip with conda: Using Pip in a Conda Environment

  4. The last option is to build the missing conda package yourself with conda-build.
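Picking the newest build that still fits the CUDA 9 constraint is easy to get wrong by comparing version strings, since "1.9.0" sorts after "1.10.0" as text. A minimal sketch, assuming (version, build) pairs taken from conda search output and encoding the note that versions >=1.13.1 don't support CUDA 9; the function names version_key and newest_gpu_build_below are hypothetical:

```python
import re
from typing import Iterable, Optional, Tuple


def version_key(version: str) -> Tuple[int, ...]:
    """'1.12.0' -> (1, 12, 0) so versions compare numerically (1.9 < 1.10).
    Only the leading digits of each part are kept, so '0rc2' becomes 0."""
    return tuple(int(re.match(r"\d+", part).group()) for part in version.split("."))


def newest_gpu_build_below(rows: Iterable[Tuple[str, str]], py_tag: str,
                           cutoff: str = "1.13.1") -> Optional[Tuple[str, str]]:
    """Newest (version, build) GPU row for py_tag strictly older than cutoff.
    The default cutoff reflects the note that >=1.13.1 dropped CUDA 9."""
    limit = version_key(cutoff)
    candidates = [(v, b) for v, b in rows
                  if b.startswith("gpu_") and py_tag in b and version_key(v) < limit]
    return max(candidates, key=lambda row: version_key(row[0]), default=None)


# (version, build) pairs copied from the conda search outputs above.
ROWS = [
    ("1.10.0", "gpu_py35h566a776_0"),
    ("1.11.0", "gpu_py36h4459f94_0"),
    ("1.12.0", "gpu_py36he68c306_0"),
    ("1.9.0", "gpu_py35h42d5ad8_1"),
]

if __name__ == "__main__":
    print(newest_gpu_build_below(ROWS, "py36"))  # ('1.12.0', 'gpu_py36he68c306_0')
    print(newest_gpu_build_below(ROWS, "py35"))  # ('1.10.0', 'gpu_py35h566a776_0')
```

Comparing tuples of integers rather than strings is what makes 1.10.0 win over 1.9.0 for py35.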
answered Sep 24 '22 03:09 by trsvchn