How do I mask the padding in a BLSTM in Keras?

I'm running a BLSTM based on the IMDB example, but my version is not classification; rather, it is sequence prediction for labels. For simplicity, you can treat it as a POS tagging model: inputs are sentences of words, outputs are tags. The example uses Keras's functional API rather than model.add, so its syntax differs slightly from most other Keras examples, and I can't figure out how to add a masking layer with it.

I've run and tested the model, and it works, but it predicts the 0s (my padding) and includes them when evaluating accuracy. Here's the code:

from __future__ import print_function
import numpy as np
from keras.preprocessing import sequence
from keras.models import Model
from keras.layers.core import Masking
from keras.layers import TimeDistributed, Dense
from keras.layers import Dropout, Embedding, LSTM, Input, merge
from prep_nn import prep_scan
from keras.utils import np_utils, generic_utils


np.random.seed(1337)  # for reproducibility
nb_words = 20000  # max. size of vocab
nb_classes = 10  # number of labels
hidden = 500  # 500 gives best results so far
batch_size = 10  # create and update net after 10 lines
val_split = .1
epochs = 15

# input for X is multi-dimensional numpy array with IDs,
# one line per array. input y is multi-dimensional numpy array with
# binary arrays for each value of each label.
# maxlen is length of longest line
print('Loading data...')
(X_train, y_train), (X_test, y_test) = prep_scan(
    nb_words=nb_words, test_len=75)
maxlen = X_train.shape[1]  # length of the (padded) longest line

print(len(X_train), 'train sequences')
print(int(len(X_train)*val_split), 'validation sequences')
print(len(X_test), 'heldout sequences')

# this is the placeholder tensor for the input sequences
sequence = Input(shape=(maxlen,), dtype='int32')

# this embedding layer will transform the sequences of integers
# into vectors
embedded = Embedding(nb_words, output_dim=hidden,
                     input_length=maxlen)(sequence)

# apply forwards LSTM
forwards = LSTM(output_dim=hidden, return_sequences=True)(embedded)
# apply backwards LSTM
backwards = LSTM(output_dim=hidden, return_sequences=True,
                 go_backwards=True)(embedded)

# concatenate the outputs of the 2 LSTMs
merged = merge([forwards, backwards], mode='concat', concat_axis=-1)
after_dp = Dropout(0.15)(merged)

# TimeDistributed for sequence
# change activation to sigmoid?
output = TimeDistributed(
    Dense(output_dim=nb_classes,
          activation='softmax'))(after_dp)

model = Model(input=sequence, output=output)

# try using different optimizers and different optimizer configs
# loss=binary_crossentropy, optimizer=rmsprop
model.compile(loss='categorical_crossentropy',
              metrics=['accuracy'], optimizer='adam')

print('Train...')
model.fit(X_train, y_train,
          batch_size=batch_size,
          nb_epoch=epochs,
          shuffle=True,
          validation_split=val_split)

UPDATE:

I merged this PR and got it working with mask_zero=True in the embedding layer. But now, after seeing the model's terrible performance, I'm realizing I'd also need masking on the output; others have suggested using sample_weight in the model.fit call instead. How could I do this so that the 0s are ignored?
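
A sketch of how such a per-timestep weight matrix could be built and passed in, assuming 0 appears in X_train only as padding (pad_weights here is illustrative, not the weights array prep_scan returns):

# one weight per timestep: 1.0 for real tokens, 0.0 for padding (ID 0);
# shape (num_samples, maxlen), as required by sample_weight_mode='temporal'
pad_weights = (X_train != 0).astype('float32')

model.compile(loss='categorical_crossentropy',
              metrics=['accuracy'], optimizer='adam',
              sample_weight_mode='temporal')

model.fit(X_train, y_train,
          batch_size=batch_size,
          nb_epoch=epochs,
          shuffle=True,
          validation_split=val_split,
          sample_weight=pad_weights)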

UPDATE 2:

So I read this and figured out sample_weight as a matrix of 1s and 0s. I thought it might be working, but my accuracy stalls around 50%, and I just found out it is still trying to predict the padded parts; it just no longer predicts them as 0, which was the problem before using sample_weight.

Current code:

from __future__ import print_function
import numpy as np
from keras.preprocessing import sequence
from keras.models import Model
from keras.layers.core import Masking
from keras.layers import TimeDistributed, Dense
from keras.layers import Dropout, Embedding, LSTM, Input, merge
from prep_nn import prep_scan
from keras.utils import np_utils, generic_utils
import itertools
from itertools import chain
from sklearn.preprocessing import LabelBinarizer
import sklearn
import pandas as pd


np.random.seed(1337)  # for reproducibility
nb_words = 20000  # max. size of vocab
nb_classes = 10  # number of labels
hidden = 500  # 500 gives best results so far
batch_size = 10  # create and update net after 10 lines
val_split = .1
epochs = 10

# input for X is multi-dimensional numpy array with syll IDs,
# one line per array. input y is multi-dimensional numpy array with
# binary arrays for each value of each label.
# maxlen is length of longest line
print('Loading data...')
(X_train, y_train), (X_test, y_test), maxlen, sylls_ids, tags_ids, weights = prep_scan(nb_words=nb_words, test_len=75)

print(len(X_train), 'train sequences')
print(int(len(X_train) * val_split), 'validation sequences')
print(len(X_test), 'heldout sequences')

# this is the placeholder tensor for the input sequences
sequence = Input(shape=(maxlen,), dtype='int32')

# this embedding layer will transform the sequences of integers
# into vectors of size `hidden`
embedded = Embedding(nb_words, output_dim=hidden,
                     input_length=maxlen, mask_zero=True)(sequence)

# apply forwards LSTM
forwards = LSTM(output_dim=hidden, return_sequences=True)(embedded)
# apply backwards LSTM
backwards = LSTM(output_dim=hidden, return_sequences=True,
                 go_backwards=True)(embedded)

# concatenate the outputs of the 2 LSTMs
merged = merge([forwards, backwards], mode='concat', concat_axis=-1)
# after_dp = Dropout(0.)(merged)

# TimeDistributed for sequence
# change activation to sigmoid?
output = TimeDistributed(
    Dense(output_dim=nb_classes,
          activation='softmax'))(merged)

model = Model(input=sequence, output=output)

# try using different optimizers and different optimizer configs
# loss=binary_crossentropy, optimizer=rmsprop
model.compile(loss='categorical_crossentropy',
              metrics=['accuracy'], optimizer='adam',
              sample_weight_mode='temporal')

print('Train...')
model.fit(X_train, y_train,
          batch_size=batch_size,
          nb_epoch=epochs,
          shuffle=True,
          validation_split=val_split,
          sample_weight=weights)
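
One rough way to check whether the stalled accuracy really comes from the padded timesteps is to score only the non-padded positions after training; a sketch, assuming X_test uses 0 only for padding and y_test is one-hot per timestep:

# predict tag distributions, shape (n_samples, maxlen, nb_classes)
pred = model.predict(X_test, batch_size=batch_size)
pred_tags = pred.argmax(axis=-1)
true_tags = y_test.argmax(axis=-1)

# keep only positions whose input ID is a real token, not padding
token_mask = (X_test != 0)
masked_acc = (pred_tags == true_tags)[token_mask].mean()
print('accuracy over non-padded tokens:', masked_acc)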
1 Answer

Did you solve this issue? It is not very clear to me how your code distinguishes padded values from word indices. What about letting word indices start from 1 and defining

embedded = Embedding(nb_words + 1, output_dim=hidden,
                     input_length=maxlen, mask_zero=True)(sequence)

instead of

embedded = Embedding(nb_words, output_dim=hidden,
                     input_length=maxlen, mask_zero=True)(sequence)

according to https://keras.io/layers/embeddings/?
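
That is, with mask_zero=True index 0 must be reserved exclusively for padding, so real word IDs should start at 1 and the embedding needs nb_words + 1 rows. A sketch of the corresponding preprocessing, where vocab and sentences are placeholders for whatever prep_scan works from:

from keras.preprocessing import sequence

# real words get IDs 1..nb_words; 0 is kept exclusively for padding,
# which is what mask_zero=True looks for
word_to_id = {w: i + 1 for i, w in enumerate(vocab)}
X = [[word_to_id[w] for w in sent] for sent in sentences]
X = sequence.pad_sequences(X, maxlen=maxlen, value=0)

With that convention, Embedding(nb_words + 1, ...) covers IDs 0 through nb_words, and the padding index never collides with a real word.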
