 

How to train a LSTM model with different N-dimensions labels?

I am using Keras (v2.0.6 with the TensorFlow backend) for a simple neural network:

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, TimeDistributed, Dense, Activation

model = Sequential()
model.add(LSTM(32, return_sequences=True, input_shape=(100, 5)))
model.add(LSTM(32, return_sequences=True))
model.add(TimeDistributed(Dense(5)))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])

This is only a test for me; I am "training" the model with the following dummy data.

x_train = np.array([
    [[0,0,0,0,1], [0,0,0,1,0], [0,0,1,0,0]],
    [[1,0,0,0,0], [0,1,0,0,0], [0,0,1,0,0]],
    [[0,1,0,0,0], [0,0,1,0,0], [0,0,0,1,0]],
    [[0,0,1,0,0], [1,0,0,0,0], [1,0,0,0,0]],
    [[0,0,0,1,0], [0,0,0,0,1], [0,1,0,0,0]],
    [[0,0,0,0,1], [0,0,0,0,1], [0,0,0,0,1]]
])

y_train = np.array([
    [[0,0,0,0,1], [0,0,0,1,0], [0,0,1,0,0]],
    [[1,0,0,0,0], [0,1,0,0,0], [0,0,1,0,0]],
    [[0,1,0,0,0], [0,0,1,0,0], [0,0,0,1,0]],
    [[1,0,0,0,0], [1,0,0,0,0], [1,0,0,0,0]],
    [[1,0,0,0,0], [0,0,0,0,1], [0,1,0,0,0]],
    [[1,0,0,0,0], [0,0,0,0,1], [0,0,0,0,1]]
])

Then I do:

model.fit(x_train, y_train, batch_size=2, epochs=50, shuffle=False)

print(model.predict(x_train))

The result is:

[[[ 0.11855114  0.13603994  0.21069065  0.28492314  0.24979511]
  [ 0.03013871  0.04114409  0.16499813  0.41659597  0.34712321]
  [ 0.00194826  0.00351031  0.06993906  0.52274817  0.40185428]]

 [[ 0.17915446  0.19629011  0.21316603  0.22450975  0.18687972]
  [ 0.17935558  0.1994358   0.22070852  0.2309722   0.16952793]
  [ 0.18571526  0.20774922  0.22724937  0.23079531  0.14849086]]

 [[ 0.11163659  0.13263632  0.20109797  0.28029731  0.27433187]
  [ 0.02216373  0.03424517  0.13683401  0.38068131  0.42607573]
  [ 0.00105937  0.0023865   0.0521594   0.43946937  0.50492537]]

 [[ 0.13276921  0.15531689  0.21852671  0.25823513  0.23515201]
  [ 0.05750636  0.08210614  0.22636817  0.3303588   0.30366054]
  [ 0.01128351  0.02332032  0.210263    0.3951444   0.35998878]]

 [[ 0.15303896  0.18197381  0.21823004  0.23647803  0.21027911]
  [ 0.10842207  0.15755147  0.23791778  0.26479205  0.23131666]
  [ 0.06472684  0.12843341  0.26680911  0.28923658  0.25079405]]

 [[ 0.19560908  0.20663913  0.21954383  0.21920268  0.15900527]
  [ 0.22829761  0.22907974  0.22933882  0.20822221  0.10506159]
  [ 0.27179539  0.25587022  0.22594844  0.18308094  0.063305  ]]]

OK, it works, but it is just a test; I really do not care about accuracy. I would like to understand how I can work with outputs of a different size.

For example, passing a sequence (a numpy array) like:

[[0,0,0,0,1], [0,0,0,1,0], [0,0,1,0,0]]

I would like to get a prediction with 4 steps:

[[..first..], [..second..], [..third..], [..fourth..]]

Is that possible somehow? The size could vary: I would train the model with labels that can have different numbers of steps.

Thanks

asked Aug 16 '17 by Dail

2 Answers

This answer is for non-varying output dimensions. For truly varying dimensions, the padding idea in Giuseppe's answer seems the way to go, perhaps with the help of the Masking layer proposed in the Keras documentation.


The output shape in Keras depends entirely on the number of "units/neurons/cells" you put in the last layer and, of course, on the type of layer.

I can see that your data does not match the code in your question — it cannot run as posted — but suppose your code is right and forget the data for a while.
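(For reference, a quick shape check makes the mismatch visible; the zeros array below is just a stand-in with the same shape as the question's dummy data:)

import numpy as np

# Stand-in for the question's dummy data:
# 6 examples, 3 timesteps, 5 features per timestep.
x_train = np.zeros((6, 3, 5))
print(x_train.shape)  # (6, 3, 5) -- but input_shape=(100, 5) expects 100 timesteps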

An input shape of (100, 5) in an LSTM layer means a tensor of shape (None, 100, 5), where:

  • None is the batch size. The first dimension of your data is reserved for the number of examples you have (X and Y must have the same number of examples).
  • Each example is a sequence with 100 timesteps.
  • Each timestep is a 5-dimension vector.

And the 32 cells in this same LSTM layer mean that the resulting vectors change from 5-dimension to 32-dimension vectors. With return_sequences=True, all 100 timesteps will appear in the result, so the output shape of the first layer is (None, 100, 32):

  • Same number of examples (this will never change along the model)
  • Still 100 timesteps per example (because of return_sequences=True)
  • Each timestep is a 32-dimension vector (because of the 32 cells)

Now the second LSTM layer does exactly the same thing: it keeps the 100 timesteps, and since it also has 32 cells, it keeps the 32-dimension vectors, so the output is also (None, 100, 32).

Finally, the time-distributed Dense layer also keeps the 100 timesteps (because of TimeDistributed) and changes your vectors back to 5-dimension vectors (because of the 5 units), resulting in (None, 100, 5).
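A minimal sketch to check this walkthrough (assuming Keras 2.x with the TensorFlow backend; layer.output_shape is standard Keras API): rebuild the same stack and print each layer's output shape.

from keras.models import Sequential
from keras.layers import LSTM, TimeDistributed, Dense, Activation

model = Sequential()
model.add(LSTM(32, return_sequences=True, input_shape=(100, 5)))  # -> (None, 100, 32)
model.add(LSTM(32, return_sequences=True))                        # -> (None, 100, 32)
model.add(TimeDistributed(Dense(5)))                              # -> (None, 100, 5)
model.add(Activation('softmax'))                                  # -> (None, 100, 5)

for layer in model.layers:
    print(layer.name, layer.output_shape)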


As you can see, you cannot change the number of timesteps directly with recurrent layers; you need other layers to change that dimension. How you do it is completely up to you, and there are infinite ways of doing it.

But in all of them, you need to get rid of the timesteps and rebuild the data with another shape.


Suggestion

A suggestion from me (just one possibility) is to reshape your result and apply another Dense layer, just to achieve the final shape expected.

Suppose you want a result like (None, 4, 5) (never forget: the first dimension of your data is the number of examples; it can be any number, but you must take it into account when you organize your data). We can achieve this by reshaping the data to a shape containing 4 in the second dimension:

#after the Dense layer (Reshape comes from keras.layers):
from keras.layers import Reshape

model.add(Reshape((4, 125)))  #the batch size doesn't appear here,
    #just make sure you have 500 elements, which is 100*5 = 4*125

model.add(TimeDistributed(Dense(5)))
#this layer could also be model.add(LSTM(5, return_sequences=True)), for instance

#continue to the "Activation" layer

This will give you 4 timesteps (because the shape after the Reshape is (None, 4, 125)), each step being a 5-dimension vector (because of Dense(5)).

Use the model.summary() command to see the shape output by each layer.
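Putting it together, a sketch of the full modified model under the assumptions above (Keras 2.x; the (None, 4, 5) target shape is just this example):

from keras.models import Sequential
from keras.layers import LSTM, TimeDistributed, Dense, Activation, Reshape

model = Sequential()
model.add(LSTM(32, return_sequences=True, input_shape=(100, 5)))  # (None, 100, 32)
model.add(LSTM(32, return_sequences=True))                        # (None, 100, 32)
model.add(TimeDistributed(Dense(5)))                              # (None, 100, 5): 500 elements
model.add(Reshape((4, 125)))                                      # 100*5 = 4*125 -> (None, 4, 125)
model.add(TimeDistributed(Dense(5)))                              # (None, 4, 5)
model.add(Activation('softmax'))
model.summary()                                                   # check the shape after each layer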

answered Nov 16 '22 by Daniel Möller


I don't know Keras, but from a practical and theoretical point of view this is absolutely possible.

The idea is that you have an input sequence and an output sequence. Commonly, the beginning and end of each sequence are delimited by special symbols (e.g. the character sequence "cat" is translated into "^cat#", with a start symbol "^" and an end symbol "#"). Then the sequence is padded with another special symbol up to a maximum sequence length (e.g. "^cat#$$$$$$" with a padding symbol "$").

If the padding symbol corresponds to a zero-vector, it will have no impact on your training.

Your output sequence can now assume any length up to the maximum one, because the real length is the one between the start and end symbol positions.

In other words, you will always have the same input and output sequence length (i.e. the maximum one), but the real length is the one between the start and end symbols.

(Obviously, in the output sequence, anything after the end symbol should not be considered in the loss function.)
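A hedged sketch of this idea in Keras (pad_sequences and Masking are standard Keras utilities; the maximum length of 10 and the zero padding value are assumptions for this example):

import numpy as np
from keras.models import Sequential
from keras.layers import Masking, LSTM, TimeDistributed, Dense
from keras.preprocessing.sequence import pad_sequences

MAXLEN = 10  # assumed maximum sequence length, including start/end symbols

# Two sequences of different lengths, encoded as 5-dimension one-hot vectors.
seqs = [np.eye(5)[[4, 3, 2]],      # length 3
        np.eye(5)[[0, 1, 2, 3]]]   # length 4

# Pad with zero-vectors up to MAXLEN.
x = pad_sequences(seqs, maxlen=MAXLEN, padding='post', dtype='float32')

model = Sequential()
model.add(Masking(mask_value=0.0, input_shape=(MAXLEN, 5)))  # skip all-zero timesteps
model.add(LSTM(32, return_sequences=True))
model.add(TimeDistributed(Dense(5, activation='softmax')))

Layers downstream of Masking then skip the padded timesteps, and Keras should also use the mask to weight them out of the loss.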

answered Nov 16 '22 by Giuseppe Marra