
Can you accelerate torch DL training on anything other than "cuda" like "hip" or "OpenCL"?

I've noticed that torch.device can accept a range of arguments, precisely cpu, cuda, mkldnn, opengl, opencl, ideep, hip, msnpu.

However, when training deep learning models, I've only ever seen cuda or cpu being used. Very often the code looks something like this

if torch.cuda.is_available():
    device = torch.device("cuda")
else:
    device = torch.device("cpu")

I've never seen any of the others being used, and was wondering if they can be used and how. I believe the latest MacBooks with an AMD graphics card should be able to use "hip", but is that true? And would the training speed be similar to that of a single CUDA GPU? If not, what is the point of torch.device accepting so many options if they cannot actually be used?
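For context, those device strings are backend identifiers that only work if the installed PyTorch build was compiled with that backend. A minimal sketch of the usual fallback pattern, written so the selection logic is a plain function that can run anywhere (pick_device is illustrative, not a torch API; availability flags are passed in explicitly):

```python
# Hypothetical helper: pick the first backend from a preference list that is
# actually available, falling back to CPU, which always works.
def pick_device(preferences, available):
    """Return the first preferred device name whose availability flag is True."""
    for name in preferences:
        if available.get(name, False):
            return name
    return "cpu"

# Typical usage with real torch (guarded so the sketch runs without torch):
try:
    import torch
    available = {"cuda": torch.cuda.is_available()}
    device = torch.device(pick_device(["cuda"], available))
except ImportError:
    device = pick_device(["cuda"], {})  # no torch: falls back to "cpu"
```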

andrea asked Oct 25 '20 12:10


People also ask

Can I run PyTorch without CUDA?

No CUDA. To install PyTorch via pip on a system that does not have a CUDA-capable or ROCm-capable GPU, or does not require CUDA/ROCm (i.e. GPU support), in the above selector choose OS: Linux, Package: Pip, Language: Python, and Compute Platform: CPU. Then run the command that is presented to you.

Does PyTorch support GPU acceleration?

PyTorch v1.12 introduces GPU-accelerated training on Apple silicon. It comes as a collaborative effort between PyTorch and the Metal engineering team at Apple. It uses Apple's Metal Performance Shaders (MPS) as the backend for PyTorch operations.
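Assuming PyTorch 1.12 or newer, the MPS backend is selected with the device string "mps" and detected via torch.backends.mps.is_available(). A hedged sketch of the fallback logic (the helper function is illustrative, not a torch API; the getattr guards let it run even on builds without MPS):

```python
# Sketch: prefer Apple's MPS backend when present, otherwise fall back to CPU.
# torch.backends.mps exists from PyTorch 1.12 onward; the getattr chain keeps
# the sketch from raising AttributeError on older builds.
def mps_or_cpu(torch_module):
    """Return "mps" if the given torch-like module reports MPS support, else "cpu"."""
    mps = getattr(getattr(torch_module, "backends", None), "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"

# Typical usage (guarded so the sketch runs without torch installed):
try:
    import torch
    device = torch.device(mps_or_cpu(torch))
except ImportError:
    device = "cpu"
```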

Can we run PyTorch on GPU?

PyTorch provides a simple-to-use API to transfer a tensor generated on the CPU to the GPU. Conveniently, new tensors are generated on the same device as the parent tensor. The same logic applies to the model. Thus both the data and the model need to be transferred to the GPU.
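The transfer pattern described above can be sketched as a small helper; to_device is illustrative (not a torch API), and the torch usage is guarded so the sketch runs anywhere:

```python
# Hypothetical helper: call .to(device) on every item so the data and the
# model never end up on different devices before the forward pass.
def to_device(items, device):
    """Move each tensor/module in items to device and return the results."""
    return [item.to(device) for item in items]

# With real PyTorch this would be used roughly as follows:
try:
    import torch
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = torch.nn.Linear(4, 2)
    batch = torch.randn(8, 4)          # created on CPU by default
    model, batch = to_device([model, batch], device)
    out = model(batch)                 # both operands now live on one device
except ImportError:
    pass  # torch not installed; to_device still demonstrates the pattern
```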

Can you use PyTorch on AMD GPU?

Single-Node Server Requirements: Before you can run an AMD machine learning framework container, your Docker environment must support AMD GPUs. Note: the AMD PyTorch framework container assumes that the server contains the required x86-64 CPU(s) and at least one of the listed AMD GPUs.


1 Answer

If you want to use a GPU for deep learning, in practice the selection is between CUDA and CUDA...

The broader answer is: yes, there are AMD's HIP and some OpenCL implementations:

  1. There is HIP by AMD - a CUDA-like interface with ports of PyTorch, hipCaffe, and TensorFlow, but:
    • AMD's hip/rocm is supported only on Linux - no Windows or Mac OS support is provided by ROCm
    • Even if you use Linux with an AMD GPU + ROCm, you have to stick to GCN discrete devices (i.e. cards like the RX 580, Vega 56/64 or Radeon VII); there is no hip/rocm support for RDNA devices (a year since their release) and it does not look to be coming any time soon, and APUs aren't supported by hip either
  2. The only popular frameworks that support OpenCL are Caffe and Keras+PlaidML. But:
    • Caffe's issues:
      • Caffe seems to no longer be actively developed and is somewhat outdated by today's standards
      • The performance of Caffe's OpenCL implementation is about 1/2 of what nVidia's cuDNN and AMD's MIOpen provide, but it works quite OK and I have used it in many cases
      • The latest version had an even greater performance hit https://github.com/BVLC/caffe/issues/6585 but at least you can run a version from several changes back that works
      • Also, while Caffe/OpenCL works, there are still some bugs I fixed manually for OpenCL over AMD: https://github.com/BVLC/caffe/issues/6239
    • Keras/Plaid-ML
      • Keras on its own is a much weaker framework in terms of the ability to access lower-level functionality
      • PlaidML performance is still 1/2 to 1/3 of optimized nVidia cuDNN & AMD MIOpen-ROCm - and slower than Caffe OpenCL in the tests I did
      • The future of non-TF backends for Keras is not clear, since as of 2.4 it requires TF...

Bottom line:

  1. If you have a GCN discrete AMD GPU and you run Linux, you can use ROCm+HIP. Yet it isn't as stable as CUDA
  2. You can try OpenCL Caffe or Keras-PlaidML - it may be slower and not as optimal as other solutions, but you have a higher chance of making it work
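One practical detail about option 1: ROCm builds of PyTorch expose the AMD GPU through the familiar "cuda" device string (with torch.version.hip set instead of torch.version.cuda), so the usual CUDA-style code path runs unchanged - you do not pass "hip" to torch.device. A hedged sketch of how to tell the builds apart; the helper is illustrative, not a torch API:

```python
# Sketch: on a ROCm (HIP) build, torch.version.hip is a version string and
# torch.version.cuda is None; on a CUDA build it is the other way around.
def gpu_flavor(torch_module):
    """Return 'rocm', 'cuda', or 'cpu' for a given torch-like module."""
    version = getattr(torch_module, "version", None)
    if getattr(version, "hip", None):    # non-empty string on ROCm builds
        return "rocm"
    if getattr(version, "cuda", None):   # non-empty string on CUDA builds
        return "cuda"
    return "cpu"

# Typical usage (guarded so the sketch runs without torch installed):
try:
    import torch
    flavor = gpu_flavor(torch)  # which backend this build was compiled for
except ImportError:
    flavor = "cpu"
```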

Edit 2021-09-14: there is a new project dlprimitives:

https://github.com/artyom-beilis/dlprimitives

that has better performance than both Caffe-OpenCL and Keras - around 75% of Keras/TF2 training performance - however it is under early development and at this point has a much more limited set of layers than Caffe/Keras-PlaidML

The connection to PyTorch is a work in progress, with some initial results: https://github.com/artyom-beilis/pytorch_dlprim

Disclaimer: I'm the author of this project

Artyom answered Sep 27 '22 18:09