
Can you accelerate torch DL training on anything other than "cuda" like "hip" or "OpenCL"?

I've noticed that torch.device can accept a range of arguments, precisely cpu, cuda, mkldnn, opengl, opencl, ideep, hip, msnpu.

However, when training deep learning models, I've only ever seen cuda or cpu being used. Very often the code looks something like this

if torch.cuda.is_available():
    device = torch.device("cuda")
else:
    device = torch.device("cpu")

I've never seen any of the others being used, and was wondering if they can be used and how. I believe the latest MacBooks with an AMD graphics card should be able to use "hip", but is that true? And would the training speed be similar to that of a single CUDA GPU? If not, what is the point of torch.device accepting so many options if they cannot actually be used?
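For context, those device strings are backend identifiers that only work if the installed PyTorch build was compiled with that backend. A minimal sketch of the usual fallback pattern, written so the selection logic is a plain function that can run anywhere (pick_device is illustrative, not a torch API; availability flags are passed in explicitly):

```python
# Hypothetical helper: pick the first backend from a preference list that is
# actually available, falling back to CPU, which always works.
def pick_device(preferences, available):
    """Return the first preferred device name whose availability flag is True."""
    for name in preferences:
        if available.get(name, False):
            return name
    return "cpu"

# Typical usage with real torch (guarded so the sketch runs without torch):
try:
    import torch
    available = {"cuda": torch.cuda.is_available()}
    device = torch.device(pick_device(["cuda"], available))
except ImportError:
    device = pick_device(["cuda"], {})  # no torch: falls back to "cpu"
```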

andrea asked Oct 25 '20 12:10


People also ask

Can I run PyTorch without CUDA?

No CUDA. To install PyTorch via pip on a system that does not have a CUDA-capable or ROCm-capable GPU, or does not require CUDA/ROCm (i.e. GPU support), in the above selector choose OS: Linux, Package: Pip, Language: Python, and Compute Platform: CPU. Then run the command that is presented to you.

Does PyTorch support GPU acceleration?

PyTorch v1.12 introduces GPU-accelerated training on Apple silicon. It comes as a collaborative effort between PyTorch and the Metal engineering team at Apple. It uses Apple's Metal Performance Shaders (MPS) as the backend for PyTorch operations.
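Assuming PyTorch 1.12 or newer, the MPS backend is selected with the device string "mps" and detected via torch.backends.mps.is_available(). A hedged sketch of the fallback logic (the helper function is illustrative, not a torch API; the getattr guards let it run even on builds without MPS):

```python
# Sketch: prefer Apple's MPS backend when present, otherwise fall back to CPU.
# torch.backends.mps exists from PyTorch 1.12 onward; the getattr chain keeps
# the sketch from raising AttributeError on older builds.
def mps_or_cpu(torch_module):
    """Return "mps" if the given torch-like module reports MPS support, else "cpu"."""
    mps = getattr(getattr(torch_module, "backends", None), "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"

# Typical usage (guarded so the sketch runs without torch installed):
try:
    import torch
    device = torch.device(mps_or_cpu(torch))
except ImportError:
    device = "cpu"
```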

Can we run PyTorch on GPU?

PyTorch provides a simple-to-use API to transfer a tensor generated on the CPU to the GPU. Conveniently, new tensors are generated on the same device as the parent tensor. The same logic applies to the model. Thus both the data and the model need to be transferred to the GPU.
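The transfer pattern described above can be sketched as a small helper; to_device is illustrative (not a torch API), and the torch usage is guarded so the sketch runs anywhere:

```python
# Hypothetical helper: call .to(device) on every item so the data and the
# model never end up on different devices before the forward pass.
def to_device(items, device):
    """Move each tensor/module in items to device and return the results."""
    return [item.to(device) for item in items]

# With real PyTorch this would be used roughly as follows:
try:
    import torch
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = torch.nn.Linear(4, 2)
    batch = torch.randn(8, 4)          # created on CPU by default
    model, batch = to_device([model, batch], device)
    out = model(batch)                 # both operands now live on one device
except ImportError:
    pass  # torch not installed; to_device still demonstrates the pattern
```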

Can you use PyTorch on AMD GPU?

Single-Node Server Requirements: Before you can run an AMD machine learning framework container, your Docker environment must support AMD GPUs. Note: the AMD PyTorch framework container assumes that the server contains the required x86-64 CPU(s) and at least one of the listed AMD GPUs.


1 Answer

If you want to use a GPU for deep learning, in practice the selection is between CUDA and CUDA...

The broader answer is: yes, there are AMD's HIP and some OpenCL implementations:

  1. There is HIP by AMD - a CUDA-like interface with ports of PyTorch, hipCaffe, and TensorFlow, but:
    • AMD's hip/rocm is supported only on Linux - no Windows or Mac OS support is provided by ROCm
    • Even if you use Linux with an AMD GPU + ROCm, you have to stick to GCN discrete devices (i.e. cards like the RX 580, Vega 56/64 or Radeon VII); there is no hip/rocm support for RDNA devices (a year since their release) and it does not look to be coming any time soon, and APUs aren't supported by hip either
  2. The only popular frameworks that support OpenCL are Caffe and Keras+PlaidML. But:
    • Caffe's issues:
      • Caffe seems to no longer be actively developed and is somewhat outdated by today's standards
      • The performance of Caffe's OpenCL implementation is about 1/2 of what nVidia's cuDNN and AMD's MIOpen provide, but it works quite OK and I have used it in many cases
      • The latest version had an even greater performance hit https://github.com/BVLC/caffe/issues/6585 but at least you can run a version from several changes back that works
      • Also, while Caffe/OpenCL works, there are still some bugs I fixed manually for OpenCL over AMD: https://github.com/BVLC/caffe/issues/6239
    • Keras/Plaid-ML
      • Keras on its own is a much weaker framework in terms of the ability to access lower-level functionality
      • PlaidML performance is still 1/2 to 1/3 of optimized nVidia cuDNN & AMD MIOpen-ROCm - and slower than Caffe OpenCL in the tests I did
      • The future of non-TF backends for Keras is not clear, since as of 2.4 it requires TF...

Bottom line:

  1. If you have a GCN discrete AMD GPU and you run Linux, you can use ROCm+HIP. Yet it isn't as stable as CUDA
  2. You can try OpenCL Caffe or Keras-PlaidML - it may be slower and not as optimal as other solutions, but you have a higher chance of making it work
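One practical detail about option 1: ROCm builds of PyTorch expose the AMD GPU through the familiar "cuda" device string (with torch.version.hip set instead of torch.version.cuda), so the usual CUDA-style code path runs unchanged - you do not pass "hip" to torch.device. A hedged sketch of how to tell the builds apart; the helper is illustrative, not a torch API:

```python
# Sketch: on a ROCm (HIP) build, torch.version.hip is a version string and
# torch.version.cuda is None; on a CUDA build it is the other way around.
def gpu_flavor(torch_module):
    """Return 'rocm', 'cuda', or 'cpu' for a given torch-like module."""
    version = getattr(torch_module, "version", None)
    if getattr(version, "hip", None):    # non-empty string on ROCm builds
        return "rocm"
    if getattr(version, "cuda", None):   # non-empty string on CUDA builds
        return "cuda"
    return "cpu"

# Typical usage (guarded so the sketch runs without torch installed):
try:
    import torch
    flavor = gpu_flavor(torch)  # which backend this build was compiled for
except ImportError:
    flavor = "cpu"
```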

Edit 2021-09-14: there is a new project dlprimitives:

https://github.com/artyom-beilis/dlprimitives

that has better performance than both Caffe-OpenCL and Keras - around 75% of Keras/TF2 training performance - however it is under early development and at this point has a much more limited set of layers than Caffe/Keras-PlaidML

The connection to PyTorch is a work in progress, with some initial results: https://github.com/artyom-beilis/pytorch_dlprim

Disclaimer: I'm the author of this project

Artyom answered Sep 27 '22 18:09