I have a Dockerfile that installs the PyTorch library from source. Here is the snippet from the Dockerfile that performs the installation:
RUN cd /tmp/ \
&& git clone https://github.com/pytorch/pytorch.git \
&& cd pytorch \
&& git submodule sync && git submodule update --init --recursive \
&& sudo TORCH_CUDA_ARCH_LIST="6.0 6.1 7.0 7.5 8.0" python3 setup.py install
I don't have proper understanding of what's happening here and would appreciate some input from the community:
TL;DR: The architectures you list need to match your hardware, otherwise the code won't run even if it compiles. For example, if you want it to run on an RTX 3090 (compute capability 8.6), you need sm_86 in the list, or sm_80, since cubins are forward compatible within the same major architecture (an sm_80 binary runs on an sm_86 GPU, but not the other way around). Code built specifically for sm_86 can also use features that an sm_80 build cannot, and may run faster.
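To make the compatibility rule concrete, here is a rough sketch (my own hypothetical helper, not PyTorch code) of how you could check whether an arch list covers a given device. It assumes cubins are forward compatible within the same major architecture, and that `+PTX` entries can be JIT-compiled on anything newer:

```python
# Hypothetical helper: check whether a device's compute capability is
# covered by a TORCH_CUDA_ARCH_LIST-style string. Assumes binary (cubin)
# compatibility within the same major architecture, e.g. code built for
# 8.0 runs on an 8.6 GPU, and that "+PTX" entries JIT anywhere newer.

def arch_list_covers(arch_list: str, device_cc: tuple) -> bool:
    major, minor = device_cc
    for entry in arch_list.replace(";", " ").split():
        ptx = entry.endswith("+PTX")
        entry = entry.removesuffix("+PTX")
        try:
            e_major, e_minor = (int(x) for x in entry.split("."))
        except ValueError:
            continue  # named entries like "Ampere" not handled in this sketch
        if ptx and (e_major, e_minor) <= (major, minor):
            return True  # PTX can be JIT-compiled for newer hardware
        if e_major == major and e_minor <= minor:
            return True  # binary compatible within the same major arch
    return False

print(arch_list_covers("6.0 6.1 7.0 7.5 8.0", (8, 6)))  # RTX 3090 -> True
print(arch_list_covers("6.0 6.1 7.0 7.5", (8, 6)))      # -> False
```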
Why does PyTorch need a different installation procedure for different CUDA versions?
New hardware is released all the time, and the compilers and drivers that support new architectures are often not backwards compatible, and (certainly in the case of AMD, possibly also CUDA) not even forwards compatible. So using a compiler with known support for your specific hardware is important.
What is the role of TORCH_CUDA_ARCH_LIST in this context?
I'm guessing here, but I think that PyTorch will compile its kernels for each of these architectures, and can then pick the optimized code at runtime depending on which hardware is present.
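To illustrate that guess: a build system typically expands such a list into one `-gencode` flag per architecture for nvcc. The sketch below is my own simplified approximation of what PyTorch's build does (the real logic lives in `torch.utils.cpp_extension` and also understands named entries like "Ampere"):

```python
# Simplified sketch of turning TORCH_CUDA_ARCH_LIST into nvcc flags.
# This is an approximation for illustration, not PyTorch's actual code.

def gencode_flags(arch_list: str) -> list:
    flags = []
    for entry in arch_list.replace(";", " ").split():
        ptx = entry.endswith("+PTX")
        num = entry.removesuffix("+PTX").replace(".", "")  # "8.6" -> "86"
        # emit a real binary (cubin) for this architecture
        flags.append(f"-gencode=arch=compute_{num},code=sm_{num}")
        if ptx:
            # also embed PTX so newer GPUs can JIT-compile the kernel
            flags.append(f"-gencode=arch=compute_{num},code=compute_{num}")
    return flags

print(gencode_flags("7.5 8.0+PTX"))
```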
If my machine has multiple CUDA setups, does that mean I will have multiple PyTorch versions (specific to each CUDA setup) installed in my Docker container?
I'm guessing again, but I think they will all live in the same container, as multiple compiled variants of the same libraries, each optimized for different hardware.
If my machine has none of the mentioned CUDA setups ("6.0 6.1 7.0 7.5 8.0"), will the PyTorch installation fail?
IIRC even if you can coax the installation into working, code execution might fail for a number of reasons, usually because of hardware incompatibility.
You can refer to the Nvidia compiler documentation at https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#gpu-feature-list to help you pick the right versions for your intended hardware. Here are the hardware generations:
| nvcc tag | TORCH_CUDA_ARCH_LIST | GPU Arch | Year | e.g. GPU |
|---|---|---|---|---|
| sm_50, sm_52 and sm_53 | 5.0 5.2 5.3 | Maxwell | 2014 | GTX 9xx |
| sm_60, sm_61 and sm_62 | 6.0 6.1 6.2 | Pascal | 2016 | GTX 10xx, Pxxx |
| sm_70 and sm_72 | 7.0 7.2 | Volta | 2017 | Titan V |
| sm_75 | 7.5 | Turing | 2018 | most RTX 20xx |
| sm_80, sm_86 and sm_87 | 8.0 8.6 8.7 | Ampere | 2020 | RTX 30xx, Axx[xx] |
| sm_89 | 8.9 | Ada | 2022 | RTX 40xx, L4, L40 |
| sm_90, sm_90a | 9.0 9.0a | Hopper | 2022 | H100 |
Surprisingly, I could not find such a list anywhere and had to compile this myself.
From the table you can see that sm_50 corresponds to 5.0, and so on...
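The conversion is mechanical enough to script. This throwaway helper (my own, not part of any toolchain) just inserts the dot before the last digit:

```python
# Convert an nvcc tag like "sm_86" or "compute_50" to the corresponding
# TORCH_CUDA_ARCH_LIST entry. Suffixed variants such as "sm_90a" would
# need extra handling; this sketch ignores them.

def sm_to_arch(tag: str) -> str:
    digits = tag.removeprefix("sm_").removeprefix("compute_")
    return f"{digits[:-1]}.{digits[-1]}"

print(sm_to_arch("sm_86"))       # 8.6
print(sm_to_arch("compute_50"))  # 5.0
```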
How do you know which nvcc tags to use?
$ locate nvcc
...
$ /usr/local/cuda-11.7/bin/nvcc --help|grep arch
...
--list-gpu-arch (-arch-ls)
List the virtual device architectures (compute_XX) supported by the compiler
and exit. If both --list-gpu-code and --list-gpu-arch are set, the list is
...
$ /usr/local/cuda-11.7/bin/nvcc --list-gpu-arch
compute_35
compute_37
compute_50
compute_52
compute_53
compute_60
compute_61
compute_62
compute_70
compute_72
compute_75
compute_80
compute_86
compute_87
Again, here you can see that CUDA 11.7 still supports Kepler-era GPUs (compute_35, compute_37, e.g. the Tesla K40/K80), which are no longer even listed in the current documentation. Of course, those microarchitectures do not support all the functions PyTorch exposes, so a lot of things won't run on them. In most cases the compiler should warn you if you try to compile for those versions, but in reality not everything is tested by the Nvidia developers, especially if you tread off the beaten track. Still, this is far tamer than the AMD world, where open-source third-party drivers are ahead of the vendor drivers in many respects.
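If you want to turn that nvcc output directly into a TORCH_CUDA_ARCH_LIST value, a few lines of scripting will do. Here is a sketch with sample output hard-coded (in a real build you would capture the output of `nvcc --list-gpu-arch` instead):

```python
# Convert `nvcc --list-gpu-arch` output into a TORCH_CUDA_ARCH_LIST string.
# The sample output is hard-coded here so the snippet runs without nvcc.

nvcc_output = """compute_35
compute_37
compute_50
compute_80
compute_86
compute_87"""

prefix = len("compute_")
arch_list = " ".join(
    f"{line[prefix:-1]}.{line[-1]}"  # "compute_86" -> "8.6"
    for line in nvcc_output.splitlines()
)
print(arch_list)  # 3.5 3.7 5.0 8.0 8.6 8.7
```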
Because of the increasing complexity of hardware and compilers, the future looks less and less like vendor-specific toolchains such as CUDA and ROCm, and more and more like OpenCL (and, fingers crossed, Mojo), so that you no longer have to worry about the magic numbers that make each version perform optimally.