 

Keras (Tensorflow backend) slower on GPU than on CPU when training certain networks

I am having difficulty understanding why GPU and CPU training speeds are similar for small networks (the CPU is sometimes faster), while the GPU is faster for larger networks. The code at the bottom of the question runs in 103.7 seconds on an i7-6700K, but with tensorflow-gpu it runs in 29.5 seconds.

However, when I train a network with 100 hidden neurons instead of 1000 as in the example below, I get ~20 seconds on the GPU and ~15 seconds on the CPU.

I read in another Stack Overflow answer that CPU->GPU transfers take a long time; I assume this refers to loading the data examples onto the GPU.

Can someone explain why this occurs, and suggest a change I can make to the code to maximize speed?

import numpy as np
import tensorflow as tf
import keras
from keras.models import Sequential
from keras.utils import np_utils
from keras.layers.core import Dense, Activation, Flatten, Dropout
from sklearn.preprocessing import normalize

## Importing the MNIST dataset using Keras
from keras.datasets import mnist
(X_train, y_train), (X_test, y_test) = mnist.load_data()

# reshape for vector input
N, x, y = X_train.shape
X_train = normalize(np.reshape(X_train, (N, x * y)))

N, x, y = X_test.shape
X_test = normalize(np.reshape(X_test, (N, x * y)))

# one-hot encoding
y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)

model = Sequential()
model.add(Dense(750, input_dim=784))  # 'output_dim' is the deprecated Keras 1 spelling of 'units'
model.add(Activation('relu'))
model.add(Dropout(0.2))

model.add(Dense(150))
model.add(Activation('relu'))
model.add(Dropout(0.2))

model.add(Dense(50))
model.add(Activation('relu'))
model.add(Dropout(0.2))

model.add(Dense(50))
model.add(Activation('relu'))
model.add(Dropout(0.2))

model.add(Dense(10))
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy', optimizer='Nadam', metrics=['accuracy'])

fit = model.fit(X_train, y_train, batch_size=128, epochs=10, verbose=0)  # 'nb_epoch' in Keras 1

## Printing the accuracy of our model, according to the loss function specified in model.compile above
score = model.evaluate(X_test, y_test, verbose=0)
print('Test score:', score[0])
print('Test accuracy:', score[1])
Asked Feb 07 '17 by Enrico Borba



1 Answer

For tiny networks, batch loading may be the culprit here.

Keras loads each minibatch from RAM to the GPU at the start of each iteration, which creates a bottleneck for tiny networks (where the forward/backward computation is so quick that the transfer overhead dominates).
You can try using model.fit_generator instead of plain fit, so that the CPU thread that loads minibatches works in parallel with the GPU computation, as sketched below.
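
A minimal sketch of the generator approach, assuming the Keras 2 fit_generator signature (steps_per_epoch/epochs; Keras 1 used samples_per_epoch/nb_epoch instead):

def batch_generator(X, y, batch_size=128):
    # Yield shuffled minibatches indefinitely; Keras stops reading
    # after steps_per_epoch batches per epoch.
    n = X.shape[0]
    while True:
        idx = np.random.permutation(n)
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            yield X[batch], y[batch]

fit = model.fit_generator(
    batch_generator(X_train, y_train, batch_size=128),
    steps_per_epoch=X_train.shape[0] // 128,
    epochs=10,
    verbose=0)

Batches are then prepared by a background thread while the GPU is still busy with the previous one.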

Unfortunately, I am not aware of any way to preload the whole dataset onto the GPU for Keras (see my issue).

If you're using the TensorFlow backend, you can use its Timeline profiling tool (which produces a trace viewable in chrome://tracing) to see what causes the slowdowns. For reference, see this issue.
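
A minimal sketch of capturing such a trace, assuming a Keras 2 / TF 1.x setup where compile() forwards extra session keyword arguments (options, run_metadata) to Session.run on the TensorFlow backend:

import tensorflow as tf
from tensorflow.python.client import timeline

run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
run_metadata = tf.RunMetadata()

# Extra kwargs are passed through to Session.run on the TF backend
model.compile(loss='categorical_crossentropy', optimizer='Nadam',
              metrics=['accuracy'],
              options=run_options, run_metadata=run_metadata)
model.fit(X_train, y_train, batch_size=128, epochs=1, verbose=0)

# Write a Chrome trace; open chrome://tracing and load timeline.json
tl = timeline.Timeline(run_metadata.step_stats)
with open('timeline.json', 'w') as f:
    f.write(tl.generate_chrome_trace_format())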

Answered Sep 22 '22 by Alexander Serikov