I have a Dockerfile that installs the PyTorch library from source. Here is the snippet from the Dockerfile that performs the installation:
RUN cd /tmp/ \
&& git clone https://github.com/pytorch/pytorch.git \
&& cd pytorch \
&& git submodule sync && git submodule update --init --recursive \
&& sudo TORCH_CUDA_ARCH_LIST="6.0 6.1 7.0 7.5 8.0" python3 setup.py install
I don't have proper understanding of what's happening here and would appreciate some input from the community:
TL;DR: The architectures you list need to match your hardware, otherwise the code won't run even if it compiles. For example, if you want it to run on an RTX 3090 (compute capability 8.6), you need sm_86 in the list, or sm_80, since cubins are forward compatible within the same major architecture (an sm_80 binary runs on an sm_86 GPU, but not the other way around). Code built specifically for sm_86 can also use features that an sm_80 build cannot, and may run faster.
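To make the compatibility rule concrete, here is a rough sketch (my own hypothetical helper, not PyTorch code) of how you could check whether an arch list covers a given device. It assumes cubins are forward compatible within the same major architecture, and that `+PTX` entries can be JIT-compiled on anything newer:

```python
# Hypothetical helper: check whether a device's compute capability is
# covered by a TORCH_CUDA_ARCH_LIST-style string. Assumes binary (cubin)
# compatibility within the same major architecture, e.g. code built for
# 8.0 runs on an 8.6 GPU, and that "+PTX" entries JIT anywhere newer.

def arch_list_covers(arch_list: str, device_cc: tuple) -> bool:
    major, minor = device_cc
    for entry in arch_list.replace(";", " ").split():
        ptx = entry.endswith("+PTX")
        entry = entry.removesuffix("+PTX")
        try:
            e_major, e_minor = (int(x) for x in entry.split("."))
        except ValueError:
            continue  # named entries like "Ampere" not handled in this sketch
        if ptx and (e_major, e_minor) <= (major, minor):
            return True  # PTX can be JIT-compiled for newer hardware
        if e_major == major and e_minor <= minor:
            return True  # binary compatible within the same major arch
    return False

print(arch_list_covers("6.0 6.1 7.0 7.5 8.0", (8, 6)))  # RTX 3090 -> True
print(arch_list_covers("6.0 6.1 7.0 7.5", (8, 6)))      # -> False
```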
Why does PyTorch need a different installation procedure for different CUDA versions?
New hardware is released all the time, and the compilers and drivers that support new architectures are often not backwards compatible, and (certainly in the case of AMD, possibly also CUDA) not even forwards compatible. So using a compiler with known support for your specific hardware is important.
What is the role of TORCH_CUDA_ARCH_LIST in this context?
I'm guessing here, but I think that PyTorch will compile its kernels for each of these architectures, and can then pick the optimized code at runtime depending on which hardware is present.
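To illustrate that guess: a build system typically expands such a list into one `-gencode` flag per architecture for nvcc. The sketch below is my own simplified approximation of what PyTorch's build does (the real logic lives in `torch.utils.cpp_extension` and also understands named entries like "Ampere"):

```python
# Simplified sketch of turning TORCH_CUDA_ARCH_LIST into nvcc flags.
# This is an approximation for illustration, not PyTorch's actual code.

def gencode_flags(arch_list: str) -> list:
    flags = []
    for entry in arch_list.replace(";", " ").split():
        ptx = entry.endswith("+PTX")
        num = entry.removesuffix("+PTX").replace(".", "")  # "8.6" -> "86"
        # emit a real binary (cubin) for this architecture
        flags.append(f"-gencode=arch=compute_{num},code=sm_{num}")
        if ptx:
            # also embed PTX so newer GPUs can JIT-compile the kernel
            flags.append(f"-gencode=arch=compute_{num},code=compute_{num}")
    return flags

print(gencode_flags("7.5 8.0+PTX"))
```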
If my machine has multiple CUDA setups, does that mean I will have multiple PyTorch versions (specific to each CUDA setup) installed in my Docker container?
I'm guessing again, but I think they will all live in the same container, as multiple compiled variants of the same libraries, each optimized for different hardware.
If my machine has none of the mentioned CUDA setups ("6.0 6.1 7.0 7.5 8.0"), will the PyTorch installation fail?
IIRC even if you can coax the installation into working, code execution might fail for a number of reasons, usually because of hardware incompatibility.
You can refer to the Nvidia compiler documentation at https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#gpu-feature-list to help you pick the right versions for your intended hardware. Here are the hardware generations:
| nvcc tag | TORCH_CUDA_ARCH_LIST | GPU Arch | Year | e.g. GPU |
|---|---|---|---|---|
| sm_50, sm_52 and sm_53 | 5.0 5.2 5.3 | Maxwell | 2014 | GTX 9xx |
| sm_60, sm_61 and sm_62 | 6.0 6.1 6.2 | Pascal | 2016 | GTX 10xx, Pxxx |
| sm_70 and sm_72 | 7.0 7.2 | Volta | 2017 | Titan V |
| sm_75 | 7.5 | Turing | 2018 | most RTX 20xx |
| sm_80, sm_86 and sm_87 | 8.0 8.6 8.7 | Ampere | 2020 | RTX 30xx, Axx[xx] |
| sm_89 | 8.9 | Ada | 2022 | RTX 40xx, L4, L40 |
| sm_90, sm_90a | 9.0 9.0a | Hopper | 2022 | H100 |
Surprisingly, I could not find such a list anywhere and had to compile this myself.
From the table you can see that sm_50 corresponds to 5.0, and so on...
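The conversion is mechanical enough to script. This throwaway helper (my own, not part of any toolchain) just inserts the dot before the last digit:

```python
# Convert an nvcc tag like "sm_86" or "compute_50" to the corresponding
# TORCH_CUDA_ARCH_LIST entry. Suffixed variants such as "sm_90a" would
# need extra handling; this sketch ignores them.

def sm_to_arch(tag: str) -> str:
    digits = tag.removeprefix("sm_").removeprefix("compute_")
    return f"{digits[:-1]}.{digits[-1]}"

print(sm_to_arch("sm_86"))       # 8.6
print(sm_to_arch("compute_50"))  # 5.0
```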
How do you know which nvcc tags to use?
$ locate nvcc
...
$ /usr/local/cuda-11.7/bin/nvcc --help|grep arch
...
--list-gpu-arch (-arch-ls)
List the virtual device architectures (compute_XX) supported by the compiler
and exit. If both --list-gpu-code and --list-gpu-arch are set, the list is
...
$ /usr/local/cuda-11.7/bin/nvcc --list-gpu-arch
compute_35
compute_37
compute_50
compute_52
compute_53
compute_60
compute_61
compute_62
compute_70
compute_72
compute_75
compute_80
compute_86
compute_87
Again, here you can see that CUDA 11.7 still supports Kepler-era GPUs (compute_35, compute_37, e.g. the Tesla K40/K80), which are no longer even listed in the current documentation. Of course, those microarchitectures do not support all the functions PyTorch exposes, so a lot of things won't run on them. In most cases the compiler should warn you if you try to compile for those versions, but in reality not everything is tested by the Nvidia developers, especially if you tread off the beaten track. Still, this is far tamer than the AMD world, where open-source third-party drivers are ahead of the vendor drivers in many respects.
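If you want to turn that nvcc output directly into a TORCH_CUDA_ARCH_LIST value, a few lines of scripting will do. Here is a sketch with sample output hard-coded (in a real build you would capture the output of `nvcc --list-gpu-arch` instead):

```python
# Convert `nvcc --list-gpu-arch` output into a TORCH_CUDA_ARCH_LIST string.
# The sample output is hard-coded here so the snippet runs without nvcc.

nvcc_output = """compute_35
compute_37
compute_50
compute_80
compute_86
compute_87"""

prefix = len("compute_")
arch_list = " ".join(
    f"{line[prefix:-1]}.{line[-1]}"  # "compute_86" -> "8.6"
    for line in nvcc_output.splitlines()
)
print(arch_list)  # 3.5 3.7 5.0 8.0 8.6 8.7
```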
Because of the increasing complexity of hardware and compilers, the future looks less and less like vendor-specific toolchains such as CUDA and ROCm, and more and more like OpenCL (and, fingers crossed, Mojo), so that you no longer have to worry about the magic numbers that make each version perform optimally.