
AMD plaidml vs CPU Tensorflow - Unexpected results

I am currently running a simple script to train a model on the MNIST dataset.

Training on my CPU via TensorFlow gives me 49us/sample and roughly a 3s epoch, using the following code:

# CPU

import tensorflow as tf
mnist = tf.keras.datasets.mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()

x_train = tf.keras.utils.normalize(x_train, axis=1)
x_test = tf.keras.utils.normalize(x_test, axis=1)

model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(128, activation=tf.nn.relu))
model.add(tf.keras.layers.Dense(128, activation=tf.nn.relu))
model.add(tf.keras.layers.Dense(10, activation=tf.nn.softmax))

model.compile(optimizer='adam', loss="sparse_categorical_crossentropy", metrics=["accuracy"])

model.fit(x_train, y_train, epochs=3)

When I run the same training through my AMD Radeon Pro 580 using the opencl_amd_radeon_pro_580_compute_engine device via my PlaidML setup, I get 249us/sample and a 15s epoch, using the following code:

# GPU

import plaidml.keras
plaidml.keras.install_backend()
import keras
from keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()

x_train = keras.utils.normalize(x_train, axis=1)
x_test = keras.utils.normalize(x_test, axis=1)

model = keras.models.Sequential()
model.add(keras.layers.Flatten())
model.add(keras.layers.Dense(128, activation='relu'))
model.add(keras.layers.Dense(128, activation='relu'))
model.add(keras.layers.Dense(10, activation='softmax'))

model.compile(optimizer='adam', loss="sparse_categorical_crossentropy", metrics=["accuracy"])

model.fit(x_train, y_train, epochs=3)

I can see my CPU firing up during the CPU test and my GPU maxing out during the GPU test, but I am very confused as to why the CPU is outperforming the GPU by a factor of five.

Is this the expected result?

Am I doing something wrong in my code?

asked Dec 10 '22 by Web Nexus

2 Answers

It seems I've found the right solution, at least for a macOS/Keras/AMD GPU setup.

TL;DR:

  • Do not use OpenCL; use the metal_* devices instead.
  • Do not use TensorFlow 2.0; use the standalone Keras API.

Here are the details:

Run plaidml-setup and pick the metal device 🤘🏻 (this is important!):

...
Multiple devices detected (You can override by setting PLAIDML_DEVICE_IDS).
Please choose a default device:

   1 : llvm_cpu.0
   2 : metal_intel(r)_uhd_graphics_630.0
   3 : metal_amd_radeon_pro_560x.0

Default device? (1,2,3)[1]:3
...

Make sure you save the settings:

Save settings to /Users/alexanderegorov/.plaidml? (y,n)[y]:y
Success!
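
Alternatively, instead of saving a global default, you can pin the device per process through the PLAIDML_DEVICE_IDS override mentioned in the prompt above. A minimal sketch, assuming the device id from my listing (substitute the id from your own plaidml-setup output):

import os

# Must be set before the backend is installed.
# Hypothetical example id; take it from your own plaidml-setup listing.
os.environ['PLAIDML_DEVICE_IDS'] = 'metal_amd_radeon_pro_560x.0'

import plaidml.keras
plaidml.keras.install_backend()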

Now run the MNIST example; you should see something like:

INFO:plaidml:Opening device "metal_amd_radeon_pro_560x.0"

This is it. I have made a comparison using plaidbench keras mobilenet (a sketch of the commands follows the results below):

metal_amd_radeon_pro_560x.0 FASTEST!

  • Example finished, elapsed: 0.435s (compile), 8.057s (execution)

opencl_amd_amd_radeon_pro_560x_compute_engine.0

  • Example finished, elapsed: 3.197s (compile), 14.620s (execution)

llvm_cpu.0

  • Example finished, elapsed: 3.619s (compile), 47.837s (execution)
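
For reference, a rough sketch of how to reproduce such a comparison, assuming plaidbench is installed from PyPI (re-run plaidml-setup between runs to switch devices):

pip install plaidml-keras plaidbench
plaidml-setup                # choose the device to benchmark
plaidbench keras mobilenet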
answered Jan 07 '23 by sashaegorov

I think there are two aspects to the observed situation:

  1. PlaidML is not that great in my experience, and I've had similar results, sadly.
  2. Moving data to the GPU is slow. In this case the MNIST data is really small, so the time spent moving it to the GPU outweighs the "benefit" of parallelizing the computation. The TF CPU backend probably does parallel matrix multiplication as well, but it is much faster here because the data is small and sits closer to the processing unit. A rough sketch of how to observe this follows the list.
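
To make the overhead point concrete, here is a minimal sketch (my addition, not part of the measurements above) that times one epoch of the same tiny model at two batch sizes on the PlaidML backend. Larger batches do more work per device round-trip, so on a GPU the gap to the CPU usually narrows; exact numbers depend on your machine:

import time

import plaidml.keras
plaidml.keras.install_backend()
import keras
from keras.datasets import mnist

(x_train, y_train), _ = mnist.load_data()
x_train = keras.utils.normalize(x_train, axis=1)

for batch_size in (32, 1024):
    # Rebuild the model so each timing starts from a fresh state.
    model = keras.models.Sequential()
    model.add(keras.layers.Flatten(input_shape=(28, 28)))
    model.add(keras.layers.Dense(128, activation='relu'))
    model.add(keras.layers.Dense(128, activation='relu'))
    model.add(keras.layers.Dense(10, activation='softmax'))
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')

    start = time.time()
    model.fit(x_train, y_train, epochs=1, batch_size=batch_size, verbose=0)
    print('batch_size=%4d: %.1fs per epoch' % (batch_size, time.time() - start))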
answered Jan 07 '23 by stan0