I am reading this article (The Unreasonable Effectiveness of Recurrent Neural Networks) and want to understand how to express one-to-one, one-to-many, many-to-one, and many-to-many LSTM neural networks in Keras. I have read a lot about RNNs and understand how LSTM networks work, in particular the vanishing gradient problem, LSTM cells, their outputs and states, sequence output, etc. However, I have trouble expressing all these concepts in Keras.
To start with, I have created the following toy NN using an LSTM layer:
from keras.models import Model
from keras.layers import Input, LSTM
import numpy as np
t1 = Input(shape=(2, 3))
t2 = LSTM(1)(t1)
model = Model(inputs=t1, outputs=t2)
inp = np.array([[[1,2,3],[4,5,6]]])
model.predict(inp)
Output:
array([[ 0.0264638]], dtype=float32)
In my example the input shape is 2 by 3. As far as I understand, this means that the input is a sequence of 2 vectors, each with 3 features, and hence my input must be a 3D tensor of shape (n_examples, 2, 3). In terms of 'sequences', the input is a sequence of length 2, and each element of the sequence is expressed by 3 features (please correct me if I am wrong). When I call predict, it returns a 2D tensor containing a single scalar.
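To sanity-check my understanding of the shapes, I also printed the model's input/output shapes and tried a variant with return_sequences=True (my own experiment, not from the article):

# shapes of the toy model above
print(model.input_shape)   # (None, 2, 3): batch, 2 timesteps, 3 features
print(model.output_shape)  # (None, 1): one scalar per example
# same input, but one output per timestep
t3 = LSTM(1, return_sequences=True)(t1)
model_seq = Model(inputs=t1, outputs=t3)
print(model_seq.output_shape)  # (None, 2, 1)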
Q1: So, is this a one-to-one network or another type of LSTM network?
Q2: When we say "one/many input and one/many output", what do we mean by "one/many"? One/many scalar(s), vector(s), sequence(s)... one/many of what?
Q3: Can someone give a simple working example in Keras for each type of network: 1-1, 1-M, M-1, and M-M?
PS: I am asking multiple questions in a single thread since they are closely related to each other.
The distinction between one-to-one, one-to-many, many-to-one, and many-to-many exists only for RNNs / LSTMs, i.e. networks that work on sequential (temporal) data; CNNs work on spatial data, where this distinction does not exist. So "many" always involves multiple timesteps / a sequence.
The different types describe the shape of the input and output and its classification. For the input, one means a single quantity (e.g. one image, one word) is classified as one unit, and many means a sequence of quantities (e.g. a sequence of images, a sequence of words) is classified as one unit. For the output, one means the output is a scalar (binary classification, i.e. is a bird or is not a bird), 0 or 1; many means the output is a one-hot encoded vector with one dimension per class (multiclass classification, i.e. is a sparrow, is a robin, ...), e.g. for three classes 001, 010, 100:
In the following examples, images and sequences of images are used as the quantity to be classified; alternatively you could use words and sequences of words (sentences), etc.:
one-to-one: a single image (or word, ...) is classified into a single class (binary classification), i.e. is this a bird or not
one-to-many: a single image (or word, ...) is classified into multiple classes
many-to-one: a sequence of images (or words, ...) is classified into a single class (binary classification of a sequence)
many-to-many: a sequence of images (or words, ...) is classified into multiple classes
cf https://www.quora.com/How-can-I-choose-between-one-to-one-one-to-many-many-to-one-many-to-one-and-many-to-many-in-long-short-term-memory-LSTM
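To make the shapes concrete before the full examples below, here is a minimal shape-only sketch (my own addition, not from the linked sources) building one tiny model per type with the functional API; the layer sizes and sequence lengths are arbitrary:

from keras.models import Model
from keras.layers import Input, LSTM, Dense, RepeatVector

# many-to-one: sequence in (5 timesteps, 3 features), single vector out
x = Input(shape=(5, 3))
print(Model(x, LSTM(4)(x)).output_shape)                          # (None, 4)
# many-to-many: sequence in, one output per timestep
print(Model(x, LSTM(4, return_sequences=True)(x)).output_shape)   # (None, 5, 4)
# one-to-many: single vector in, repeated into a sequence, sequence out
v = Input(shape=(3,))
h = LSTM(4, return_sequences=True)(RepeatVector(5)(v))
print(Model(v, h).output_shape)                                   # (None, 5, 4)
# one-to-one: single vector in, single vector out (no recurrence needed)
print(Model(v, Dense(4)(v)).output_shape)                         # (None, 4)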
one-to-one (default layer activations, loss=mean_squared_error)
from numpy import array
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
# prepare sequence
length = 5
seq = array([i/float(length) for i in range(length)])
X = seq.reshape(len(seq), 1, 1)
y = seq.reshape(len(seq), 1)
# define LSTM configuration
n_neurons = length
n_batch = length
n_epoch = 1000
# create LSTM
model = Sequential()
model.add(LSTM(n_neurons, input_shape=(1, 1)))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
print(model.summary())
# train LSTM
model.fit(X, y, epochs=n_epoch, batch_size=n_batch, verbose=2)
# evaluate
result = model.predict(X, batch_size=n_batch, verbose=0)
for value in result:
    print('%.1f' % value)
source : https://machinelearningmastery.com/timedistributed-layer-for-long-short-term-memory-networks-in-python/
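Note that X = seq.reshape(len(seq), 1, 1) turns the data into 5 samples with one timestep and one feature each, so every prediction maps a single input value to a single output value; the recurrence is effectively unused here.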
one-to-many uses RepeatVector() to transform a single quantity into a sequence, which is what is needed for multiclass classification:
# note: this example uses the old Keras 1 API (output_dim, inner_activation);
# simple_model_eval is a helper from the original test suite it was taken from
def test_one_to_many(self):
    params = dict(
        input_dims=[1, 10], activation='tanh',
        return_sequences=False, output_dim=3
    ),
    number_of_times = 4
    model = Sequential()
    model.add(RepeatVector(number_of_times, input_shape=(10,)))
    model.add(LSTM(output_dim=params[0]['output_dim'],
                   activation=params[0]['activation'],
                   inner_activation='sigmoid',
                   return_sequences=True,
                   ))
    relative_error, keras_preds, coreml_preds = simple_model_eval(params, model)
    # print relative_error, '\n', keras_preds, '\n', coreml_preds, '\n'
    for i in range(len(relative_error)):
        self.assertLessEqual(relative_error[i], 0.01)
source: https://www.programcreek.com/python/example/89689/keras.layers.RepeatVector
alternative one-to-many
model.add(RepeatVector(number_of_times, input_shape=input_shape))
model.add(LSTM(output_size, return_sequences=True))
source : Many to one and many to many LSTM examples in Keras
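For reference, a self-contained, runnable variant of this RepeatVector pattern (a minimal sketch with made-up sizes, not from the cited sources), using the current Keras API:

from keras.models import Sequential
from keras.layers import Dense, LSTM, RepeatVector, TimeDistributed

number_of_times = 4   # length of the output sequence (assumed)
input_size = 10       # features of the single input vector (assumed)
model = Sequential()
# repeat the single input vector so the LSTM receives a sequence
model.add(RepeatVector(number_of_times, input_shape=(input_size,)))
# emit one output per repeated timestep
model.add(LSTM(3, return_sequences=True))
model.add(TimeDistributed(Dense(1)))
model.compile(loss='mean_squared_error', optimizer='adam')
print(model.output_shape)  # (None, 4, 1)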
many-to-one, binary classification (loss=binary_crossentropy, activation=sigmoid, dimensionality of the fully-connected output layer is 1 (Dense(1)); outputs a scalar, 0 or 1)
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

model = Sequential()
model.add(Embedding(5000, 32, input_length=500))
model.add(LSTM(100, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.summary())
# X_train/y_train and X_test/y_test: integer-encoded sequences of length 500
# with binary labels (e.g. the IMDB sentiment dataset)
model.fit(X_train, y_train, epochs=3, batch_size=64)
# Final evaluation of the model
scores = model.evaluate(X_test, y_test, verbose=0)
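To run this end to end without a real dataset, you could substitute random dummy data (purely illustrative, just matching the shapes the model expects):

import numpy as np
X_train = np.random.randint(0, 5000, size=(200, 500))  # 200 sequences of 500 word ids
y_train = np.random.randint(0, 2, size=(200, 1))       # binary labels
X_test = np.random.randint(0, 5000, size=(50, 500))
y_test = np.random.randint(0, 2, size=(50, 1))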
many-to-many, multiclass classification (loss=sparse_categorical_crossentropy, which takes integer class labels as targets (use categorical_crossentropy if the ground-truth targets are one-hot encoded); activation=softmax; dimensionality of the fully-connected output layer is 7 (Dense(7)); outputs a 7-dimensional vector of class probabilities, one per class)
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

model = Sequential()
model.add(Embedding(5000, 32, input_length=500))
model.add(LSTM(100, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(7, activation='softmax'))
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
cf Keras LSTM multiclass classification
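A quick illustration of the two target formats (my own sketch): sparse_categorical_crossentropy takes integer labels directly, while categorical_crossentropy needs one-hot vectors.

import numpy as np
from keras.utils import to_categorical

y_sparse = np.array([2, 0, 6])  # integer labels, for sparse_categorical_crossentropy
y_onehot = to_categorical(y_sparse, num_classes=7)  # one-hot, for categorical_crossentropy
print(y_onehot.shape)  # (3, 7)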
Alternative many-to-many using a TimeDistributed layer, cf https://machinelearningmastery.com/timedistributed-layer-for-long-short-term-memory-networks-in-python/ for a description:
from numpy import array
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import TimeDistributed
from keras.layers import LSTM
# prepare sequence
length = 5
seq = array([i/float(length) for i in range(length)])
X = seq.reshape(1, length, 1)
y = seq.reshape(1, length, 1)
# define LSTM configuration
n_neurons = length
n_batch = 1
n_epoch = 1000
# create LSTM
model = Sequential()
model.add(LSTM(n_neurons, input_shape=(length, 1), return_sequences=True))
model.add(TimeDistributed(Dense(1)))
model.compile(loss='mean_squared_error', optimizer='adam')
print(model.summary())
# train LSTM
model.fit(X, y, epochs=n_epoch, batch_size=n_batch, verbose=2)
# evaluate
result = model.predict(X, batch_size=n_batch, verbose=0)
for value in result[0,:,0]:
    print('%.1f' % value)
source : https://machinelearningmastery.com/timedistributed-layer-for-long-short-term-memory-networks-in-python/
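Unlike the one-to-one example above, here the whole sequence is a single sample (X has shape (1, 5, 1)): return_sequences=True makes the LSTM emit an output at every timestep, and TimeDistributed applies the same Dense(1) layer to each of those outputs, yielding one prediction per timestep.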