
Using Keras, how can I load weights generated from CuDNNLSTM into an LSTM model?

I've developed a neural network model with Keras, based on the LSTM layer. To increase speed on Paperspace (a GPU cloud processing infrastructure), I replaced the LSTM layer with the new CuDNNLSTM layer. However, CuDNNLSTM is usable only on machines with a GPU and cuDNN support. PS: CuDNNLSTM is available only on Keras master, not in the latest release.

So I generated the weights and saved them in HDF5 format in the cloud, and I'd like to use them locally on my MacBook. Since the CuDNNLSTM layer is not available there, I've switched back to LSTM for my local installation only.

After reading this tweet about CuDNN from @fchollet, I thought it would work just fine to simply read the weights back into the LSTM model.

However, when I try to load them, Keras throws this error:

Traceback (most recent call last):
{...}
tensorflow.python.framework.errors_impl.InvalidArgumentError: Dimension 0 in both shapes must be equal, but are 2048 and 4096 for 'Assign_2' (op: 'Assign') with input shapes: [2048], [4096].
{...}
ValueError: Dimension 0 in both shapes must be equal, but are 2048 and 4096 for 'Assign_2' (op: 'Assign') with input shapes: [2048], [4096]

Analyzing the HDF5 files with h5cat, I can see that the two structures are different.

TL;DR

I cannot load weights generated from CuDNNLSTM into an LSTM model. Am I doing something wrong? How can I get them to work seamlessly?

Here is my model:

SelectedLSTM = CuDNNLSTM if is_gpu_enabled() else LSTM
# ...
model = Sequential()
model.add(SelectedLSTM(HIDDEN_DIM, return_sequences=True, input_shape=(SEQ_LENGTH, vocab_size)))
model.add(Dropout(0.2))
model.add(SelectedLSTM(HIDDEN_DIM, return_sequences=False))
model.add(Dense(vocab_size))
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
leonardfactory asked Oct 29 '17 17:10


2 Answers

The reason is that the CuDNNLSTM layer has a bias vector twice as long as that of LSTM. This comes from the underlying implementation of the cuDNN API. You can compare the following equations (copied from the cuDNN user's guide) to the usual LSTM equations:

[Image: cuDNN LSTM equations]

CuDNN uses two bias terms, so the number of bias weights is doubled. To convert it back to what LSTM uses, the two bias terms need to be summed.
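To make the bias arithmetic concrete: as I understand the cuDNN formulation, each gate has the form i_t = sigmoid(W_i x_t + R_i h_(t-1) + b_Wi + b_Ri), so the two biases only ever appear as a sum. Below is a minimal NumPy sketch of that merge, not the code from the Keras PR. The helper name `merge_cudnn_bias` is hypothetical, and it assumes the stored bias is the input bias (4 * units values) followed by the recurrent bias (4 * units values). The sizes in the error message, 2048 and 4096, would be consistent with HIDDEN_DIM = 512, but that is an inference, since the question does not state the value. The sketch covers the bias only; the kernels may also need a layout conversion, which is not shown here.

```python
import numpy as np


def merge_cudnn_bias(cudnn_bias, units):
    """Sum the two cuDNN bias halves into one LSTM-sized bias vector.

    Hypothetical helper. Assumes the layout is
    [input bias (4 * units), recurrent bias (4 * units)].
    """
    cudnn_bias = np.asarray(cudnn_bias)
    if cudnn_bias.shape != (8 * units,):
        raise ValueError(
            "expected a bias of length %d, got shape %s"
            % (8 * units, cudnn_bias.shape)
        )
    input_bias = cudnn_bias[:4 * units]
    recurrent_bias = cudnn_bias[4 * units:]
    # The gate equations only use the sum of the two terms.
    return input_bias + recurrent_bias


units = 512  # assumed value, inferred from the 2048 / 4096 sizes in the error
cudnn_bias = np.arange(8 * units, dtype=np.float32)  # stand-in for real weights
lstm_bias = merge_cudnn_bias(cudnn_bias, units)
print(cudnn_bias.shape, "->", lstm_bias.shape)  # (4096,) -> (2048,)
```

The length check makes the helper fail loudly if it is handed a bias that is already LSTM-sized, rather than silently returning a wrong vector.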

I've submitted a PR to do the conversion, and it has been merged. You can install the latest Keras from GitHub, and the weight-loading problem should be solved.

Yu-Yang answered Sep 21 '22 13:09


Just to add to @Yu-Yang's answer above, the latest Keras will automatically convert the CuDNNLSTM weights to LSTM, but it won't change your .json model architecture for you.

To run inference on LSTM, you'll need to open the JSON file and manually change all instances of CuDNNLSTM to LSTM. Then run model_from_json to load your model, and load_weights to load your weights.

I first tried running load_weights without manually changing the CuDNNLSTM model, and got a bunch of errors.
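The manual JSON edit described above can also be scripted. This is a rough sketch rather than an official Keras utility. The helper `swap_cudnn_layers` and the file names in the usage comments are hypothetical. The optional `recurrent_activation` argument is my own addition: in some Keras versions LSTM defaults to hard_sigmoid, while cuDNN uses sigmoid, so you may want to set it explicitly. Check this against your installed version.

```python
import json


def swap_cudnn_layers(architecture_json, recurrent_activation=None):
    """Return the architecture JSON with every CuDNNLSTM renamed to LSTM.

    Hypothetical helper. If recurrent_activation is given, it is written
    into the config of each renamed layer.
    """
    def visit(node):
        if isinstance(node, dict):
            if node.get("class_name") == "CuDNNLSTM":
                node["class_name"] = "LSTM"
                config = node.get("config")
                if recurrent_activation is not None and isinstance(config, dict):
                    config["recurrent_activation"] = recurrent_activation
            for value in node.values():
                visit(value)
        elif isinstance(node, list):
            for item in node:
                visit(item)

    spec = json.loads(architecture_json)
    visit(spec)
    return json.dumps(spec)


# Small stand-in for a saved architecture (a real file has many more fields).
example = json.dumps({
    "class_name": "Sequential",
    "config": [
        {"class_name": "CuDNNLSTM", "config": {"units": 512}},
        {"class_name": "Dense", "config": {"units": 10}},
    ],
})
converted = swap_cudnn_layers(example, recurrent_activation="sigmoid")
print(converted)

# Usage with Keras (file names are placeholders):
#   from keras.models import model_from_json
#   with open("model.json") as f:
#       model = model_from_json(swap_cudnn_layers(f.read()))
#   model.load_weights("weights.hdf5")
```

Walking the parsed structure instead of doing a plain text replace avoids touching strings that merely contain "CuDNNLSTM", such as layer names.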

Derek Pankaew answered Sep 20 '22 13:09
