
GRU/LSTM in Keras with input sequence of varying length

I'm working on a small project to better understand RNNs, in particular LSTM and GRU. I'm not at all an expert, so please bear that in mind.

The data I'm working with has the following form:

>>> import numpy as np
>>> import pandas as pd
>>> pd.DataFrame([[1, 2, 3],[1, 2, 1], [1, 3, 2],[2, 3, 1],[3, 1, 1],[3, 3, 2],[4, 3, 3]], columns=['person', 'interaction', 'group'])
   person  interaction  group
0       1            2      3
1       1            2      1
2       1            3      2
3       2            3      1
4       3            1      1
5       3            3      2
6       4            3      3

This is just for illustration. We have different persons interacting with different groups in different ways. I've already encoded the various features. The last interaction of a person is always a 3, which means selecting a certain group. In the short example above, person 1 chooses group 2, person 2 chooses group 1, and so on.

My whole data set is much bigger, but I would like to understand the conceptual part first before throwing models at it. The task I would like to learn is: given a sequence of interactions, which group is chosen by the person? More concretely, I would like the output to be a list of all groups (there are 3 groups: 1, 2, 3) sorted by likelihood, with the most likely choice first, followed by the second and third most likely group. The loss function is therefore a mean reciprocal rank.

I know that in Keras, GRU/LSTM layers can handle input of varying length. So my three questions are:

The input is of the format:

(samples, timesteps, features)

Writing high-level code:

import keras.layers as L
import keras.models as M
model_input = L.Input(shape=(?, None, 2))

timesteps=None should imply the varying length, and 2 is for the features interaction and group. But what about the samples? How do I define the batches?

For the output, I'm a bit puzzled about what it should look like in this example. I think for each last interaction of a person I would like to have a list of length 3. Assuming I've set up the output

model_output = L.LSTM(3, return_sequences=False)

I then want to compile it. Is there a way of using the mean reciprocal rank?

model.compile('adam', '?')

I know the questions are fairly high level, but I would like to understand the big picture first and start to play around. Any help would therefore be appreciated.

asked Apr 02 '19 20:04 by math



1 Answer

The concept you've outlined in your question is a pretty good start already. I'll add a few things to make it work, as well as a code example below:

  • You can specify LSTM(n_hidden, input_shape=(None, 2)) directly, instead of inserting an extra Input layer; the batch dimension is omitted from the shape definition.
  • Since your model is going to perform some kind of classification (based on time series data), the final layer is what we'd expect from "normal" classification as well: a Dense(num_classes, activation='softmax'). Chaining the LSTM and the Dense layer together will first pass the time series input through the LSTM layer and then feed its output (whose size is determined by the number of hidden units) into the Dense layer. activation='softmax' yields a score for each class (we're going to use one-hot encoding in a data preprocessing step, see the code example below). The class scores are not returned in sorted order, but you can always sort them via np.argsort, or pick the top class via np.argmax.
  • Categorical crossentropy loss is suited for comparing the classification scores against the one-hot targets, so we'll use that one: model.compile(loss='categorical_crossentropy', optimizer='adam').
  • Since the number of interactions, i.e. the length of the model input, varies from sample to sample, we'll use a batch size of 1 and feed in one sample at a time.
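
To illustrate the last point, here is a minimal NumPy-only sketch of why sequences of different lengths end up in batches of size 1. The two arrays `sample_a` and `sample_b` are made-up stand-ins (not taken from the question's data); each row is one time step with the two features (interaction, group).

```python
import numpy as np

# Two hypothetical samples with 3 and 2 time steps; each step is (interaction, group).
sample_a = np.array([[1, 3], [1, 3], [2, 2]])
sample_b = np.array([[2, 1], [1, 2]])

# Sequences of different lengths cannot be stacked into one rectangular batch.
try:
    np.stack([sample_a, sample_b])
    stackable = True
except ValueError:
    stackable = False

# Adding a leading axis turns a single sample into a batch of size 1,
# i.e. the (samples, timesteps, features) layout with samples == 1.
batch_a = sample_a[None, :, :]
batch_b = sample_b[None, :, :]
print(stackable, batch_a.shape, batch_b.shape)
```

The time axis differs between the two batches, which is what the `None` in `input_shape=(None, 2)` allows for.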

The following is a sample implementation based on the above considerations. Note that I modified your sample data a bit, in order to provide more "reasoning" behind group choices. Also, each person needs to perform at least one interaction before choosing a group (i.e. the input sequence cannot be empty); if this is not the case for your data, then introducing an additional no-op interaction (e.g. 0) can help.

import pandas as pd
import tensorflow as tf

model = tf.keras.models.Sequential()
model.add(tf.keras.layers.LSTM(10, input_shape=(None, 2)))  # LSTM for arbitrary length series.
model.add(tf.keras.layers.Dense(3, activation='softmax'))   # Softmax for class probabilities.
model.compile(loss='categorical_crossentropy', optimizer='adam')

# Example interactions:
#   * 1: Likes the group,
#   * 2: Dislikes the group,
#   * 3: Chooses the group.
df = pd.DataFrame([
    [1, 1, 3],
    [1, 1, 3],
    [1, 2, 2],
    [1, 3, 3],
    [2, 2, 1],
    [2, 2, 3],
    [2, 1, 2],
    [2, 3, 2],
    [3, 1, 1],
    [3, 1, 1],
    [3, 1, 1],
    [3, 2, 3],
    [3, 2, 2],
    [3, 3, 1]],
    columns=['person', 'interaction', 'group']
)
data = [person[1][['interaction', 'group']].values for person in df.groupby('person')]
x_train = [x[:-1] for x in data]
y_train = tf.keras.utils.to_categorical([x[-1, 1]-1 for x in data])  # Expects class labels from 0 to n-1 (-> subtract 1).
print(x_train)
print(y_train)

class TrainGenerator(tf.keras.utils.Sequence):
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __len__(self):
        return len(self.x)

    def __getitem__(self, index):
        # Need to expand arrays to have batch size 1.
        return self.x[index][None, :, :], self.y[index][None, :]

# Model.fit accepts a Sequence directly; fit_generator is deprecated in TensorFlow 2.1+.
model.fit(TrainGenerator(x_train, y_train), epochs=1000)
pred = [model.predict(x[None, :, :]).ravel() for x in x_train]
for p, y in zip(pred, y_train):
    print(p, y)

And the corresponding sample output:

[...]
Epoch 1000/1000
3/3 [==============================] - 0s 40ms/step - loss: 0.0037
[0.00213619 0.00241093 0.9954529 ] [0. 0. 1.]
[0.00123938 0.99718493 0.00157572] [0. 1. 0.]
[9.9632275e-01 7.5039308e-04 2.9268670e-03] [1. 0. 0.]
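
Regarding the ranked list and the mean reciprocal rank from the question: one possible approach (a sketch, not the only way) is to train with categorical crossentropy as above and compute the mean reciprocal rank afterwards as an evaluation metric, since a rank-based score does not provide useful gradients for training. The snippet below hard-codes the predictions and targets from the sample output above; the variable names `ranking`, `rank_of_true` and `mrr` are just illustrative.

```python
import numpy as np

# Predicted class scores and one-hot targets, copied from the sample output above.
pred = np.array([
    [0.00213619, 0.00241093, 0.9954529],
    [0.00123938, 0.99718493, 0.00157572],
    [9.9632275e-01, 7.5039308e-04, 2.9268670e-03],
])
y_true = np.array([[0., 0., 1.], [0., 1., 0.], [1., 0., 0.]])

# Sort class indices by descending score; add 1 to get back to group labels 1..3.
ranking = np.argsort(-pred, axis=1) + 1

# 1-based position of the true group within each sorted list.
true_group = y_true.argmax(axis=1) + 1
rank_of_true = (ranking == true_group[:, None]).argmax(axis=1) + 1

# Mean reciprocal rank over all persons.
mrr = np.mean(1.0 / rank_of_true)
print(ranking)
print(mrr)
```

Each row of `ranking` lists the groups from most to least likely for one person. For these three training samples the true group is always ranked first, so the mean reciprocal rank is 1.0.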

Using custom generator expressions: According to the documentation, we can use any generator to yield the data. The generator is expected to yield batches of the data and loop over the whole data set indefinitely. When using tf.keras.utils.Sequence we do not need to specify the parameter steps_per_epoch, as it defaults to the length of the Sequence object. With a custom generator, however, we need to provide this parameter as well:

import itertools as it

# fit_generator is deprecated in TensorFlow 2.1+; Model.fit accepts generators as well.
model.fit(((x_train[i % len(x_train)][None, :, :],
            y_train[i % len(y_train)][None, :]) for i in it.count()),
          epochs=1000,
          steps_per_epoch=len(x_train))
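
If the inline generator expression is hard to read, the same idea can be written as a named generator function. This is only a sketch: `endless_batches`, `x_demo` and `y_demo` are hypothetical names, and the demo arrays are made-up stand-ins with the same layout as x_train and y_train.

```python
import itertools as it
import numpy as np

def endless_batches(x, y):
    """Yield (input, target) batches of size 1, cycling over the data forever."""
    for i in it.count():
        j = i % len(x)
        # Add a leading batch axis to both the sequence and its one-hot target.
        yield x[j][None, :, :], y[j][None, :]

# Hypothetical stand-in data: two sequences of different length, one-hot targets.
x_demo = [np.array([[1, 3], [1, 3], [2, 2]]), np.array([[2, 1], [1, 2]])]
y_demo = np.eye(3)[[2, 1]]

# Take three batches; the third wraps around to the first sample again.
first_three = list(it.islice(endless_batches(x_demo, y_demo), 3))
for xb, yb in first_three:
    print(xb.shape, yb.shape)
```

Such a generator would be passed to the training call in place of the generator expression, again together with steps_per_epoch=len(x_train), because a plain generator has no length.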

answered Oct 04 '22 09:10 by a_guest