LSTM Model in Keras with Auxiliary Inputs

Tags: keras, keras-layer

I have a dataset with two columns, each containing a set of documents. I have to match each document in Col A with the documents provided in Col B. This is a supervised classification problem, so my training data contains a label column indicating whether the documents match or not.

To solve the problem, I have created a set of features, say f1-f25 (by comparing the two documents), and then trained a binary classifier on these features. This approach works reasonably well, but now I would like to evaluate deep learning models on this problem (specifically, LSTM models).
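The question doesn't say what f1-f25 are. As a purely hypothetical illustration of "features by comparing the two documents", here is a sketch that computes three simple comparison features (token Jaccard overlap, length ratio and token coverage) for each (Col A, Col B) pair, giving one labelled row per pair for the baseline classifier. The feature choices and the `pair_features` helper are assumptions, not the asker's actual features.

```python
import re

def tokens(doc):
    # Lowercase and split on runs of non-word characters, dropping empties
    return [t for t in re.split(r"\W+", doc.lower()) if t]

def pair_features(doc_a, doc_b):
    """Hypothetical comparison features for one (Col A, Col B) pair."""
    ta, tb = tokens(doc_a), tokens(doc_b)
    sa, sb = set(ta), set(tb)
    union = sa | sb
    # Jaccard overlap of the two vocabularies
    jaccard = len(sa & sb) / len(union) if union else 0.0
    # Shorter length divided by longer length (1.0 = same length)
    longer = max(len(ta), len(tb))
    len_ratio = min(len(ta), len(tb)) / longer if longer else 1.0
    # Fraction of doc_b's tokens that also occur in doc_a
    coverage_b = sum(t in sa for t in tb) / len(tb) if tb else 0.0
    return [jaccard, len_ratio, coverage_b]

# One training row per document pair: (features, match label)
rows = [
    (pair_features("the cat sat on the mat", "a cat sat on a mat"), 1),
    (pair_features("quarterly revenue report", "cat pictures"), 0),
]
for feats, label in rows:
    print([round(f, 3) for f in feats], label)
```

Each row can then be fed to any binary classifier, which is the baseline the question describes.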

I am using the Keras library in Python. After going through the Keras documentation and other tutorials available online, I have managed to do the following:

```python
import keras
from keras.layers import Input, Embedding, LSTM, Dense
from keras.models import Model

# Each document contains a series of 200 words
# The necessary text pre-processing steps have been completed to transform
# each doc to a fixed length seq
main_input1 = Input(shape=(200,), dtype='int32', name='main_input1')
main_input2 = Input(shape=(200,), dtype='int32', name='main_input2')

# Next I add a word embedding layer (embed_matrix is separately created
# for each word in my vocabulary by reading from a pre-trained embedding model)
x = Embedding(output_dim=300, input_dim=20000,
              input_length=200, weights=[embed_matrix])(main_input1)
y = Embedding(output_dim=300, input_dim=20000,
              input_length=200, weights=[embed_matrix])(main_input2)

# Next separately pass each layer thru a lstm layer to transform seq of
# vectors into a single vector
lstm_out_x1 = LSTM(32)(x)
lstm_out_x2 = LSTM(32)(y)

# concatenate the 2 layers and stack a dense layer on top
x = keras.layers.concatenate([lstm_out_x1, lstm_out_x2])
x = Dense(64, activation='relu')(x)
# generate intermediate output
auxiliary_output = Dense(1, activation='sigmoid', name='aux_output')(x)

# add auxiliary input - auxiliary inputs contains 25 features for each document pair
auxiliary_input = Input(shape=(25,), name='aux_input')

# merge aux output with aux input and stack dense layer on top
main_input = keras.layers.concatenate([auxiliary_output, auxiliary_input])
x = Dense(64, activation='relu')(main_input)
x = Dense(64, activation='relu')(x)

# finally add the main output layer
main_output = Dense(1, activation='sigmoid', name='main_output')(x)

model = Model(inputs=[main_input1, main_input2, auxiliary_input], outputs=main_output)
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

model.fit([x1, x2, aux_input], y,
          epochs=3, batch_size=32)
```
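The comments above say the documents were already turned into fixed-length sequences of 200 word indices. A minimal NumPy sketch of that step and of the array shapes `model.fit` then expects, assuming post-padding with index 0 (Keras' own `pad_sequences` utility pads and truncates at the front by default). The token lists, `aux` values and the `to_fixed_length` helper are made up for illustration. The targets are called `labels` here because, in the snippet as posted, `y` is reassigned to the second embedding tensor, so `fit` would receive that tensor rather than the match labels.

```python
import numpy as np

MAX_LEN = 200  # sequence length used by the Input layers above

def to_fixed_length(seqs, max_len=MAX_LEN, pad_value=0):
    """Truncate or zero-pad lists of word indices to exactly max_len (post-padding)."""
    out = np.full((len(seqs), max_len), pad_value, dtype="int32")
    for i, seq in enumerate(seqs):
        seq = seq[:max_len]        # keep the first max_len tokens
        out[i, :len(seq)] = seq    # the remainder stays at pad_value
    return out

# Hypothetical tokenised documents (indices must be < input_dim=20000)
col_a = [[12, 5, 873], list(range(1, 251))]
col_b = [[7, 7], [19999]]

x1 = to_fixed_length(col_a)                     # -> main_input1
x2 = to_fixed_length(col_b)                     # -> main_input2
aux = np.random.rand(2, 25).astype("float32")   # stand-in for f1-f25
labels = np.array([1, 0], dtype="float32")      # match / no match

print(x1.shape, x2.shape, aux.shape, labels.shape)
```

With padding index 0, row 0 of `embed_matrix` should correspond to the padding token rather than a real word.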

However, when I score this on the training data, I get the same probability score for all cases. The issue seems to be with the way the auxiliary input is fed in, because the model generates meaningful output when I remove the auxiliary input. I also tried inserting the auxiliary input at different places in the network, but somehow I couldn't get this to work.

Any pointers?
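One possible cause worth ruling out, though it is an assumption since the question doesn't show the feature values: if f1-f25 are on very different scales (ratios in [0, 1] next to raw counts in the thousands), the large-valued columns can dominate the dense layers and drive the units into a regime where every sample gets nearly the same output. A minimal sketch of z-scoring the auxiliary features, with the statistics computed on the training split only and reused for validation and test data; `fit_standardizer` and `standardize` are hypothetical helper names.

```python
import numpy as np

def fit_standardizer(train_feats, eps=1e-8):
    """Per-feature mean and std, computed on the training data only."""
    mean = train_feats.mean(axis=0)
    std = train_feats.std(axis=0)
    # Guard against division by zero for constant features
    return mean, np.maximum(std, eps)

def standardize(feats, mean, std):
    return (feats - mean) / std

# Hypothetical raw features: one column in [0, 1], one in the thousands
train = np.array([[0.1, 1000.0],
                  [0.5, 3000.0],
                  [0.9, 5000.0]])
mean, std = fit_standardizer(train)
train_z = standardize(train, mean, std)
print(train_z.mean(axis=0).round(6), train_z.std(axis=0).round(6))
```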


asked May 07 '17 07:05

Dataminer

1 Answer

Well, this has been open for several months and people are voting it up.
I did something very similar recently using a dataset that can be used to forecast credit card defaults. It contains categorical data about customers (gender, education level, marital status, etc.) as well as payment history as a time series, so I had to merge time-series data with non-series data. My solution was very similar to yours, combining an LSTM with a dense branch, and I have tried to adapt the approach to your problem. What worked for me was putting dense layer(s) on the auxiliary input.
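Merging the two kinds of data mostly comes down to giving each branch the array shape it expects: a Keras `LSTM` layer takes 3-D input `(samples, timesteps, features)`, while `Dense` layers on the non-series data take 2-D input `(samples, features)`. A NumPy sketch of preparing both, assuming hypothetical integer-coded gender and education columns and six months of payment amounts; this is not the actual dataset's column layout.

```python
import numpy as np

# Hypothetical customers: static categorical attributes + 6 months of payments
gender = np.array([0, 1, 1])        # integer codes, 2 levels
education = np.array([2, 0, 3])     # integer codes, 4 levels (0..3)
payments = np.array([[100, 120,  90,   0,  50,  60],
                     [  0,   0,   0,   0,   0,   0],
                     [500, 480, 470, 460, 450, 440]], dtype="float32")

def one_hot(codes, n_levels):
    # Row i of the identity matrix is the one-hot vector for code i
    return np.eye(n_levels, dtype="float32")[codes]

# Static branch input, shape (n_samples, n_static_features): goes to Dense layers
static = np.hstack([one_hot(gender, 2), one_hot(education, 4)])

# Sequence branch input, shape (n_samples, timesteps, 1): goes to an LSTM
series = payments[:, :, np.newaxis]

print(static.shape, series.shape)
```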

Furthermore, in your case a shared layer would make sense, so that the same weights are used to "read" both documents. My proposal for testing on your data:

```python
import keras
from keras.layers import Input, Embedding, LSTM, Dense
from keras.models import Model

# Each document contains a series of 200 words
# The necessary text pre-processing steps have been completed to transform
# each doc to a fixed length seq
main_input1 = Input(shape=(200,), dtype='int32', name='main_input1')
main_input2 = Input(shape=(200,), dtype='int32', name='main_input2')

# Next I add a word embedding layer (embed_matrix is separately created
# for each word in my vocabulary by reading from a pre-trained embedding model)
x1 = Embedding(output_dim=300, input_dim=20000,
               input_length=200, weights=[embed_matrix])(main_input1)
x2 = Embedding(output_dim=300, input_dim=20000,
               input_length=200, weights=[embed_matrix])(main_input2)

# Next separately pass each layer thru a lstm layer to transform seq of
# vectors into a single vector
# Comment Manngo: Here I changed to shared layer
# Also renamed the embedding outputs x and y to x1 and x2, as reusing y was confusing
lstm_reader = LSTM(32)
lstm_out_x1 = lstm_reader(x1)
lstm_out_x2 = lstm_reader(x2)

# concatenate the 2 layers and stack a dense layer on top
x = keras.layers.concatenate([lstm_out_x1, lstm_out_x2])
x = Dense(64, activation='relu')(x)
x = Dense(32, activation='relu')(x)
# generate intermediate output
# Comment Manngo: This is created as a dead-end
# It will not be used as an input of any layers below, only as a second model output
auxiliary_output = Dense(1, activation='sigmoid', name='aux_output')(x)

# add auxiliary input - auxiliary inputs contains 25 features for each document pair
# Comment Manngo: Dense branch on the comparison features
# Keep the Input tensor itself in auxiliary_input; Model() needs it below
auxiliary_input = Input(shape=(25,), name='aux_input')
aux = Dense(64, activation='relu')(auxiliary_input)
aux = Dense(32, activation='relu')(aux)

# OLD: merge aux output with aux input and stack dense layer on top
# Comment Manngo: actually this merges the dense layer that prepares the aux output
# with the dense branch that processes the aux input
main_input = keras.layers.concatenate([x, aux])
main = Dense(64, activation='relu')(main_input)
main = Dense(64, activation='relu')(main)

# finally add the main output layer
main_output = Dense(1, activation='sigmoid', name='main_output')(main)

# Build the model: 3 inputs, 2 outputs
model = Model(inputs=[main_input1, main_input2, auxiliary_input],
              outputs=[main_output, auxiliary_output])

# Compile
# Comment Manngo: also define weighting of outputs, main as 1, auxiliary as 0.5
# Dict keys must match the layer names given above ('main_output', 'aux_output')
model.compile(optimizer='adam',
              loss={'main_output': 'binary_crossentropy', 'aux_output': 'binary_crossentropy'},
              loss_weights={'main_output': 1., 'aux_output': 0.5},
              metrics=['accuracy'])

# Train model on main_output and on aux_output as a support
# Comment Manngo: Unknown information marked with placeholders ____
# We have 3 inputs: x1 and x2: the 2 strings
# aux_in: the 25 features
# We have 2 outputs: main and auxiliary; both have the same targets -> (binary) y
# Input dict keys must match the Input names ('main_input1', 'main_input2', 'aux_input')
model.fit({'main_input1': __x1__, 'main_input2': __x2__, 'aux_input': __aux_in__},
          {'main_output': __y__, 'aux_output': __y__},
          epochs=1000,
          batch_size=__,
          validation_split=0.1,
          callbacks=[____])
```
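Keras trains a multi-output model on the weighted sum of the per-output losses, using the `loss_weights` passed to `compile` (any regularization losses are added on top). A small NumPy sketch of that arithmetic for the two outputs above, with made-up predictions for one batch of four document pairs; it mirrors the idea, not Keras' exact implementation details such as its epsilon clipping.

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    # Mean binary cross-entropy over the batch; clipping avoids log(0)
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return float(np.mean(-(y_true * np.log(y_pred)
                           + (1 - y_true) * np.log(1 - y_pred))))

y = np.array([1.0, 0.0, 1.0, 0.0])          # same target for both outputs
main_pred = np.array([0.9, 0.2, 0.8, 0.1])  # hypothetical main_output values
aux_pred = np.array([0.6, 0.4, 0.7, 0.5])   # hypothetical aux_output values

loss_weights = {"main_output": 1.0, "aux_output": 0.5}
total = (loss_weights["main_output"] * binary_crossentropy(y, main_pred)
         + loss_weights["aux_output"] * binary_crossentropy(y, aux_pred))
print(round(total, 4))
```

Because `aux_output` is trained against the same `y`, its loss term pushes the LSTM branch to be predictive on its own, while the 0.5 weight keeps it secondary to the main output.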

I don't know how much this will help, since I don't have your data and can't try it. Nevertheless, this is my best shot.
For the same reason, I haven't run the code above.
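The point of the shared `lstm_reader` is that both documents are encoded with one set of weights; in Keras, calling the same layer instance on two inputs reuses that layer's weights. A NumPy-only sketch of why that matters, using a toy `reader` (mean-pooling plus a projection, a hypothetical stand-in for the LSTM, with no recurrence) applied to two made-up documents: with shared weights, swapping Col A and Col B only swaps the two halves of the combined representation, whereas separate weights encode the same document differently depending on which branch it enters.

```python
import numpy as np

rng = np.random.default_rng(0)
EMB_DIM, HIDDEN = 4, 3

# Toy "reader": mean-pool the word vectors, then project and squash.
# It stands in for the LSTM only to illustrate weight sharing.
W = rng.normal(size=(EMB_DIM, HIDDEN))

def reader(doc_vectors, weights):
    return np.tanh(doc_vectors.mean(axis=0) @ weights)

doc_a = rng.normal(size=(5, EMB_DIM))   # 5 words, hypothetical embeddings
doc_b = rng.normal(size=(7, EMB_DIM))   # 7 words

# Shared: both documents go through the same weights W
pair_ab = np.concatenate([reader(doc_a, W), reader(doc_b, W)])
pair_ba = np.concatenate([reader(doc_b, W), reader(doc_a, W)])
print(np.allclose(pair_ab[:HIDDEN], pair_ba[HIDDEN:]))

# Not shared: a second, independent weight matrix for the other branch
W2 = rng.normal(size=(EMB_DIM, HIDDEN))
print(np.allclose(reader(doc_a, W), reader(doc_a, W2)))
```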


answered Oct 20 '22 17:10

Manngo
