
Cannot train a Keras convolutional network on GPU

I can train a Keras network with Dense layers on the keras.datasets.fashion_mnist dataset. However, when I try to train a convolutional network, I get an error.

Here is the relevant part of the code:

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import *

# Small CNN for 28x28 single-channel images (Fashion-MNIST)
model = keras.Sequential([
    Convolution2D(16, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    MaxPooling2D(pool_size=(2, 2)),
    Flatten(),
    Dense(16, activation='relu'),
    Dense(10, activation='softmax')
])
model.compile(optimizer=tf.train.AdamOptimizer(),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(train_images, train_labels, epochs=5)

This is the error I get when I call fit:

UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above. [[{{node conv2d/Conv2D}} = Conv2D[T=DT_FLOAT, data_format="NCHW", dilations=[1, 1, 1, 1], padding="VALID", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](training/TFOptimizer/gradients/conv2d/Conv2D_grad/Conv2DBackpropFilter-0-TransposeNHWCToNCHW-LayoutOptimizer, conv2d/Conv2D/ReadVariableOp)]] [[{{node loss/dense_1_loss/broadcast_weights/assert_broadcastable/AssertGuard/Assert/Switch_2/_69}} = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_112_l...t/Switch_2", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"]]

I have cudnn64_7.dll in C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\bin, and that folder is already on my PATH.

wannik asked Nov 07 '18 17:11




1 Answer

I think this link will solve your problem. The error occurs because the cuDNN version you installed is not compatible with the cuDNN version that TensorFlow was compiled against.
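
To check for that kind of mismatch, something like the sketch below may help. It is not from the linked page. It scans the directories on PATH for Windows cuDNN libraries named like cudnn64_7.dll and reads the major version from the file name. The helper names find_cudnn_dlls and tensorflow_build_info are made up for this example. The second helper assumes a TensorFlow release that provides tf.sysconfig.get_build_info (it is missing in older releases such as the 1.x line, in which case the helper returns None and you would have to look up the expected cuDNN version in the TensorFlow install notes instead).

```python
import os
import re
from pathlib import Path

# Matches Windows cuDNN library names such as cudnn64_7.dll (major version 7).
_CUDNN_DLL = re.compile(r"^cudnn64_(\d+)\.dll$", re.IGNORECASE)


def find_cudnn_dlls(path_value=None):
    """Return (major_version, full_path) for every cuDNN DLL found on PATH.

    Hypothetical helper: pass a PATH-style string to inspect something
    other than the current process environment.
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    found = []
    for entry in path_value.split(os.pathsep):
        directory = Path(entry)
        # Skip empty entries and PATH entries that do not exist.
        if not entry or not directory.is_dir():
            continue
        try:
            candidates = sorted(directory.iterdir())
        except OSError:
            # Unreadable directory: ignore it rather than abort the scan.
            continue
        for candidate in candidates:
            match = _CUDNN_DLL.match(candidate.name)
            if match:
                found.append((int(match.group(1)), str(candidate)))
    return found


def tensorflow_build_info():
    """Return TensorFlow's build info as a dict, or None if unavailable.

    Assumption: tf.sysconfig.get_build_info exists only in newer
    TensorFlow releases, so its absence is handled explicitly.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None
    getter = getattr(tf.sysconfig, "get_build_info", None)
    return dict(getter()) if getter else None
```

Calling find_cudnn_dlls() with no arguments lists what the current process would pick up. If tensorflow_build_info() returns a dict, compare the cuDNN version it reports with the major version found on PATH. More than one cuDNN DLL on PATH, or a major version different from the one TensorFlow was built with, would be consistent with the answer above.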

Vincent Tang answered Oct 17 '22 21:10
