I would like to understand how an RNN, specifically an LSTM, works with multiple input dimensions using Keras and TensorFlow. I mean the input shape is (batch_size, timesteps, input_dim), where input_dim > 1.
I think the images below illustrate the concept of an LSTM quite well when input_dim = 1.
Does this mean that if input_dim > 1, then x is no longer a single value but an array? And if so, do the weights also become arrays, with the same shape as x, plus the context?
The LSTM layer sees each input sample as a 2D array of shape (timesteps, features) and is unrolled across the timesteps. Setting return_sequences=True makes the layer emit an output at every timestep instead of only at the last one.
You always have to give a three-dimensional array as input to your LSTM network, where the first dimension is the batch size, the second is the number of time steps, and the third is the number of features in one input sequence.
Summary: the input of the LSTM is always a 3D array, (batch_size, time_steps, input_dim). The output of the LSTM can be a 2D or a 3D array, depending on the return_sequences argument.
Per sample, the input of the LSTM layer has a shape of (num_timesteps, num_features); therefore, if each input sample has 69 timesteps, where each timestep consists of 1 feature value, then the input shape is (69, 1).
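For illustration, here is a minimal sketch of both return_sequences cases; the sizes batch_size = 32, timesteps = 69, features = 3, and units = 5 are arbitrary values picked for this example:

import numpy as np
import keras.models as kem
import keras.layers as kel

batch_size, timesteps, features = 32, 69, 3  # arbitrary example sizes
x = np.random.random((batch_size, timesteps, features))  # 3D input array

# return_sequences=True: one output per timestep -> (batch_size, timesteps, units)
seq_model = kem.Sequential()
seq_model.add(kel.LSTM(5, return_sequences=True, input_shape=(timesteps, features)))
print(seq_model.predict(x).shape)  # (32, 69, 5)

# return_sequences=False (the default): only the last output -> (batch_size, units)
last_model = kem.Sequential()
last_model.add(kel.LSTM(5, input_shape=(timesteps, features)))
print(last_model.predict(x).shape)  # (32, 5)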
Keras creates a computational graph that executes the sequence in your bottom picture once per unit (but fed by all features at once). That means the state value C is always a scalar, one per unit; it is not shaped like the input. The features do not each get their own cell: at every timestep they are combined through the kernel weights into each unit, so x becomes a vector and each weight w becomes an array of shape (features, units), while C stays a scalar per unit.
import keras.models as kem
import keras.layers as kel

# set the sizes before building the model; they are varied below
units = 1
timesteps = 1
features = 1

model = kem.Sequential()
lstm = kel.LSTM(units, input_shape=(timesteps, features))
model.add(lstm)
model.summary()

# kernel: 4 weight matrices of shape (features, units)
# recurrent kernel: 4 weight matrices of shape (units, units)
# bias: 4 vectors of shape (units,)
free_params = (4 * features * units) + (4 * units * units) + (4 * units)
print('free_params', free_params)
print('kernel_c', lstm.kernel_c.shape)
print('bias_c', lstm.bias_c.shape)
where the 4 represents one set of weights for each of the f, i, c, and o internal paths in your bottom picture. The first term is the number of weights for the kernel, the second term for the recurrent kernel, and the last one for the bias, if applied. For units = 1, timesteps = 1, features = 1 we see
Layer (type) Output Shape Param #
=================================================================
lstm_1 (LSTM) (None, 1) 12
=================================================================
Total params: 12
Trainable params: 12
Non-trainable params: 0
_________________________________________________________________
free_params 12
kernel_c (1, 1)
bias_c (1,)
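Plugging the values into the formula above: free_params = (4 * 1 * 1) + (4 * 1 * 1) + (4 * 1) = 4 + 4 + 4 = 12, which matches the summary.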
and for units = 1, timesteps = 1, features = 2 we see
Layer (type) Output Shape Param #
=================================================================
lstm_1 (LSTM) (None, 1) 16
=================================================================
Total params: 16
Trainable params: 16
Non-trainable params: 0
_________________________________________________________________
free_params 16
kernel_c (2, 1)
bias_c (1,)
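Again the formula agrees: free_params = (4 * 2 * 1) + (4 * 1 * 1) + (4 * 1) = 8 + 4 + 4 = 16. Note that kernel_c now has shape (2, 1): one weight per input feature feeding the single unit, which is exactly the "x becomes an array and the weights become arrays" behaviour asked about, while bias_c (and hence the state C) keeps shape (1,).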
where bias_c is a proxy for the output shape of the state C. Note that there are different implementations regarding the internal structure of the unit; details are here (http://deeplearning.net/tutorial/lstm.html), and the default implementation uses Eq. 7. Hope this helps.
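If you want to inspect all four internal paths rather than only the c one, here is a minimal sketch that slices them out of the full kernel. It assumes the Keras 2 weight layout, where the four gate matrices are concatenated along the last axis in the order i, f, c, o:

import keras.models as kem
import keras.layers as kel

units, timesteps, features = 1, 1, 2
model = kem.Sequential()
lstm = kel.LSTM(units, input_shape=(timesteps, features))
model.add(lstm)

# The full kernel has shape (features, 4 * units): the four gate
# matrices are stored side by side (assumed Keras 2 layout, order i, f, c, o).
kernel = lstm.get_weights()[0]
print(kernel.shape)                        # (2, 4)
kernel_i = kernel[:, :units]               # input gate weights
kernel_f = kernel[:, units:2 * units]      # forget gate weights
kernel_c = kernel[:, 2 * units:3 * units]  # candidate/cell weights
kernel_o = kernel[:, 3 * units:]           # output gate weights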