
How can I speed up this Keras Attention computation?

I have written a custom Keras layer, an AttentiveLSTMCell and an AttentiveLSTM(RNN), in line with Keras' new approach to RNNs. The attention mechanism is the one described by Bahdanau: in an encoder/decoder model, a "context" vector is created from all of the encoder's outputs and the decoder's current hidden state. I then append the context vector to the input at every timestep.
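
For reference, the score/weight/context computation follows Bahdanau's additive attention; roughly, W_a, U_a and v_a below correspond to kernel_w, the cached projection _uh and kernel_v in the code further down:

e_{t,i} = v_a^\top \tanh(W_a h_{t-1} + U_a a_i), \qquad
\alpha_{t,i} = \frac{\exp(e_{t,i})}{\sum_j \exp(e_{t,j})}, \qquad
c_t = \sum_i \alpha_{t,i} a_i

where a_i are the encoder annotations and h_{t-1} is the decoder's previous hidden state.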

The model is being used to build a dialog agent, but it is very similar to NMT models in architecture (the tasks are similar).

However, adding this attention mechanism has slowed the training of my network down 5-fold, and I would really like to know how to write the part of the code that is slowing it down so much in a more efficient way.

The brunt of the computation is done here:

h_tm1 = states[0]  # previous memory state
c_tm1 = states[1]  # previous carry state

# attention mechanism

# repeat the hidden state to the length of the sequence
_stm = K.repeat(h_tm1, self.annotation_timesteps)

# multiply the repeated (current) hidden state by the weight matrix
_Wxstm = K.dot(_stm, self.kernel_w)

# calculate the attention probabilities
# self._uh is of shape (batch, timestep, self.units)
et = K.dot(activations.tanh(_Wxstm + self._uh), K.expand_dims(self.kernel_v))

# normalize the scores into attention weights (a manual softmax over the timestep axis)
at = K.exp(et)
at_sum = K.sum(at, axis=1)
at_sum_repeated = K.repeat(at_sum, self.annotation_timesteps)
at /= at_sum_repeated  # attention weights, shape (batch, timesteps, 1)

# calculate the context vector
context = K.squeeze(K.batch_dot(at, self.annotations, axes=1), axis=1)

# append the context vector to the inputs
inputs = K.concatenate([inputs, context])

This code runs in the call method of the AttentiveLSTMCell, i.e. once per timestep.
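
For reference, here is an untested sketch of the same computation written with broadcasting instead of K.repeat (assuming the TensorFlow backend, where elementwise ops broadcast over the singleton timestep axis):

# project the previous hidden state once: (batch, units) -> (batch, 1, units)
_Wxstm = K.expand_dims(K.dot(h_tm1, self.kernel_w), axis=1)

# broadcasting adds the projection to self._uh, which is (batch, timesteps, units)
et = K.dot(activations.tanh(_Wxstm + self._uh), K.expand_dims(self.kernel_v))

# numerically stable softmax over the timestep axis, without K.repeat
at = K.exp(et - K.max(et, axis=1, keepdims=True))
at /= K.sum(at, axis=1, keepdims=True)  # shape (batch, timesteps, 1)

context = K.squeeze(K.batch_dot(at, self.annotations, axes=1), axis=1)
inputs = K.concatenate([inputs, context])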

The full code can be found here. If necessary, I can provide some data and ways to interact with the model.

Any ideas? I am, of course, training on a GPU, in case there is something clever that can take advantage of that.

asked Mar 08 '18 by modesitt



2 Answers

I would recommend training your model using relu rather than tanh, as this operation is significantly faster to compute. This will save you computation time on the order of your training examples * average sequence length per example * number of epochs.
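
In the snippet from the question that would be a one-line change to the scoring step, for example (an untested sketch, using the same tensors as the question's call method):

et = K.dot(activations.relu(_Wxstm + self._uh), K.expand_dims(self.kernel_v))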

Also, I would evaluate the performance improvement of appending the context vector, keeping in mind that this will slow your iteration cycle on other parameters. If it's not giving you much improvement, it might be worth trying other approaches.

answered Oct 17 '22 by mr_snuffles


You modified the LSTM class, which is fine for CPU computation, but you mentioned that you're training on a GPU.

I recommend looking into the cudnn-recurrent implementation (e.g. Keras' CuDNNLSTM) or further into the TensorFlow code it uses. Maybe you can extend the code there.
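
For example, the encoder side, which does not need the custom cell, could use the cuDNN-backed layer directly. This is only a sketch: encoder_inputs and the unit count are placeholders, and it assumes Keras 2.x with the TensorFlow backend:

from keras.layers import CuDNNLSTM

# the whole sequence runs in one fused cuDNN kernel instead of a per-timestep loop
encoder_outputs = CuDNNLSTM(256, return_sequences=True)(encoder_inputs)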

answered Oct 17 '22 by Benedikt Fuchs