
AMD plaidml vs CPU Tensorflow - Unexpected results

I am currently running a simple script to train a model on the MNIST dataset.

Training on my CPU via TensorFlow gives me 49us/sample and roughly a 3s epoch, using the following code:

# CPU

import tensorflow as tf
mnist = tf.keras.datasets.mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()

x_train = tf.keras.utils.normalize(x_train, axis=1)
x_test = tf.keras.utils.normalize(x_test, axis=1)

model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(128, activation=tf.nn.relu))
model.add(tf.keras.layers.Dense(128, activation=tf.nn.relu))
model.add(tf.keras.layers.Dense(10, activation=tf.nn.softmax))

model.compile(optimizer='adam', loss="sparse_categorical_crossentropy", metrics=["accuracy"])

model.fit(x_train, y_train, epochs=3)

When I run the same training through my AMD Radeon Pro 580 using the opencl_amd_radeon_pro_580_compute_engine device via my PlaidML setup, I get 249us/sample and a 15s epoch, using the following code:

# GPU

import plaidml.keras
plaidml.keras.install_backend()
import keras
from keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()

x_train = keras.utils.normalize(x_train, axis=1)
x_test = keras.utils.normalize(x_test, axis=1)

model = keras.models.Sequential()
model.add(keras.layers.Flatten())
model.add(keras.layers.Dense(128, activation='relu'))
model.add(keras.layers.Dense(128, activation='relu'))
model.add(keras.layers.Dense(10, activation='softmax'))

model.compile(optimizer='adam', loss="sparse_categorical_crossentropy", metrics=["accuracy"])

model.fit(x_train, y_train, epochs=3)

I can see my CPU firing up during the CPU test and my GPU maxing out during the GPU test, but I am very confused as to why the CPU is outperforming the GPU by a factor of five.

Is this the expected result?

Am I doing something wrong in my code?

asked Dec 10 '22 by Web Nexus

2 Answers

It seems I've found the right solution, at least for a macOS/Keras/AMD GPU setup.

TL;DR:

  • Do not use OpenCL; use the metal_* devices instead.
  • Do not use TensorFlow 2.0; use the standalone Keras API.

Here are the details:

Run plaidml-setup and pick the metal device 🤘🏻 (this is important!):

...
Multiple devices detected (You can override by setting PLAIDML_DEVICE_IDS).
Please choose a default device:

   1 : llvm_cpu.0
   2 : metal_intel(r)_uhd_graphics_630.0
   3 : metal_amd_radeon_pro_560x.0

Default device? (1,2,3)[1]:3
...

Make sure you save the settings:

Save settings to /Users/alexanderegorov/.plaidml? (y,n)[y]:y
Success!
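
Alternatively, instead of saving a global default, you can pin the device per process through the PLAIDML_DEVICE_IDS override mentioned in the prompt above. A minimal sketch, assuming the device id from my listing (substitute the id from your own plaidml-setup output):

import os

# Must be set before the backend is installed.
# Hypothetical example id; take it from your own plaidml-setup listing.
os.environ['PLAIDML_DEVICE_IDS'] = 'metal_amd_radeon_pro_560x.0'

import plaidml.keras
plaidml.keras.install_backend()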

Now run the MNIST example; you should see something like:

INFO:plaidml:Opening device "metal_amd_radeon_pro_560x.0"

This is it. I have made a comparison using plaidbench keras mobilenet (a sketch of the commands follows the results below):

metal_amd_radeon_pro_560x.0 FASTEST!

  • Example finished, elapsed: 0.435s (compile), 8.057s (execution)

opencl_amd_amd_radeon_pro_560x_compute_engine.0

  • Example finished, elapsed: 3.197s (compile), 14.620s (execution)

llvm_cpu.0

  • Example finished, elapsed: 3.619s (compile), 47.837s (execution)
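
For reference, a rough sketch of how to reproduce such a comparison, assuming plaidbench is installed from PyPI (re-run plaidml-setup between runs to switch devices):

pip install plaidml-keras plaidbench
plaidml-setup                # choose the device to benchmark
plaidbench keras mobilenet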
answered Jan 07 '23 by sashaegorov

I think there are two aspects to the observed situation:

  1. PlaidML is not that great in my experience, and I've had similar results, sadly.
  2. Moving data to the GPU is slow. In this case the MNIST data is really small, so the time spent moving it to the GPU outweighs the "benefit" of parallelizing the computation. The TF CPU backend probably does parallel matrix multiplication as well, but it is much faster here because the data is small and sits closer to the processing unit. A rough sketch of how to observe this follows the list.
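
To make the overhead point concrete, here is a minimal sketch (my addition, not part of the measurements above) that times one epoch of the same tiny model at two batch sizes on the PlaidML backend. Larger batches do more work per device round-trip, so on a GPU the gap to the CPU usually narrows; exact numbers depend on your machine:

import time

import plaidml.keras
plaidml.keras.install_backend()
import keras
from keras.datasets import mnist

(x_train, y_train), _ = mnist.load_data()
x_train = keras.utils.normalize(x_train, axis=1)

for batch_size in (32, 1024):
    # Rebuild the model so each timing starts from a fresh state.
    model = keras.models.Sequential()
    model.add(keras.layers.Flatten(input_shape=(28, 28)))
    model.add(keras.layers.Dense(128, activation='relu'))
    model.add(keras.layers.Dense(128, activation='relu'))
    model.add(keras.layers.Dense(10, activation='softmax'))
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')

    start = time.time()
    model.fit(x_train, y_train, epochs=1, batch_size=batch_size, verbose=0)
    print('batch_size=%4d: %.1fs per epoch' % (batch_size, time.time() - start))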
answered Jan 07 '23 by stan0