
Why is Keras LSTM on CPU three times faster than GPU?

I use this notebook from Kaggle to run an LSTM neural network.

I started training the neural network and saw that it was far too slow on the GPU: almost three times slower than training on the CPU.

  • CPU performance: 8 min per epoch;
  • GPU performance: 26 min per epoch.

After this I looked for an answer and found this question on Stack Overflow, so I switched to a CuDNNLSTM layer (which runs only on the GPU) instead of LSTM.

As a result, GPU performance improved to only 1 min per epoch, but the accuracy of the model dropped by about 3%.
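For reference, this is roughly the swap I made (a minimal sketch assuming the standalone Keras 2.x API with the TensorFlow backend; the layer sizes and shapes are placeholders, not the exact values from the Kaggle notebook):

```python
from keras.models import Sequential
from keras.layers import LSTM, CuDNNLSTM, Dense

def build_model(use_cudnn, timesteps, features):
    # CuDNNLSTM is hard-wired to tanh/sigmoid activations and runs only on
    # the GPU; the classic LSTM layer runs on either device.
    rnn = CuDNNLSTM if use_cudnn else LSTM
    model = Sequential([
        rnn(128, input_shape=(timesteps, features)),
        Dense(1, activation='sigmoid'),
    ])
    model.compile(optimizer='adam',
                  loss='binary_crossentropy',
                  metrics=['accuracy'])
    return model
```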

Questions:

1) Does anyone know why the GPU is slower than the CPU with the classic LSTM layer? I do not understand why this happens.

2) Why does training become much faster when I use CuDNNLSTM instead of LSTM, and why does the accuracy of the model decrease?

P.S.:

My CPU: Intel Core i7-7700 Processor (8M Cache, up to 4.20 GHz)

My GPU: NVIDIA GeForce GTX 1050 Ti (4 GB)

asked Sep 24 '18 by lemon

People also ask

Is LSTM faster on GPU?

GPUs are the de-facto standard for LSTM usage and deliver a 6x speedup during training and 140x higher throughput during inference when compared to CPU implementations. cuDNN is a GPU-accelerated deep neural network library that supports training of LSTM recurrent neural networks for sequence learning.

Is LSTM fast?

However, the LSTM training method, such as backpropagation through time (BPTT), is really slow. In this paper, by separating the LSTM cell into forward and recurrent substructures, we propose a much simpler and faster training method than BPTT.

Why is LSTM slow?

This is mainly due to the sequential computation in the LSTM layer. Remember that LSTM requires sequential input to calculate the hidden layer weights iteratively, in other words, you must wait for the hidden state at time t-1 to calculate the hidden state at time t.
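To make that dependency concrete, here is a purely illustrative pseudo-implementation of the time loop (the lstm_cell callable and the state layout are hypothetical, not the real Keras internals):

```python
def run_lstm(lstm_cell, inputs, h0, c0):
    """Illustration only: each step consumes the previous hidden/cell state."""
    h, c = h0, c0
    outputs = []
    for x_t in inputs:               # the sequence must be walked in order
        h, c = lstm_cell(x_t, h, c)  # step t cannot start before step t-1 finishes
        outputs.append(h)
    return outputs
```

Because each step is a relatively small amount of work that must finish before the next one starts, a GPU can sit underutilized unless the implementation fuses the per-step work, which is what the cuDNN kernel does.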

What is CuDNNLSTM?

According to the Keras documentation, a CuDNNLSTM is a: Fast LSTM implementation backed by CuDNN. Can only be run on GPU, with the TensorFlow backend. It is my belief that Keras automatically uses the GPU wherever possible.
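As a side note (this concerns newer versions, not the setup in the question): in TensorFlow 2.x there is no separate CuDNNLSTM layer; tf.keras.layers.LSTM dispatches to the cuDNN kernel automatically when its defaults are kept and a GPU is visible. A minimal sketch:

```python
import tensorflow as tf

# tf.keras.layers.LSTM uses the fused cuDNN kernel when a GPU is available and
# the layer keeps activation='tanh', recurrent_activation='sigmoid',
# recurrent_dropout=0, unroll=False and use_bias=True; changing any of these
# falls back to the slower generic implementation.
model = tf.keras.Sequential([
    tf.keras.layers.LSTM(128, input_shape=(100, 32)),  # placeholder shape
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
```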


1 Answer

I had a similar problem today and found two things that may be helpful to others (this is a regression problem on a data set with ~2.1 million rows, running on a machine with 4 P100 GPUs):

  1. Using the CuDNNLSTM layer instead of the LSTM layer on a GPU machine reduced the fit time from ~13500 seconds to ~400 seconds per epoch.
  2. Increasing the batch size (from ~500 to ~4700) further reduced the fit time to ~130 seconds per epoch.

Increasing the batch size also increased the loss and val_loss, so you'll need to decide which trade-offs you want to make (a minimal sketch of both points is below).
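The sketch below combines both points; it uses dummy data and placeholder shapes rather than the actual 2.1-million-row data set, and assumes the Keras 2.x CuDNNLSTM layer:

```python
import numpy as np
from keras.models import Sequential
from keras.layers import CuDNNLSTM, Dense

# Dummy regression data purely so the example runs; real shapes will differ.
X = np.random.rand(10000, 50, 8).astype('float32')
y = np.random.rand(10000, 1).astype('float32')

model = Sequential([
    CuDNNLSTM(64, input_shape=(50, 8)),  # point 1: cuDNN-backed layer
    Dense(1),
])
model.compile(optimizer='adam', loss='mse')

# Point 2: a larger batch_size keeps the GPU busier per step, at the possible
# cost of higher loss / val_loss, so treat it as a tunable trade-off.
model.fit(X, y, epochs=2, batch_size=4096, validation_split=0.1)
```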

answered Sep 23 '22 by ericbdevil