Sequence labeling in Keras

I'm working on a sentence labeling problem. I've done the embedding and padding myself, and my inputs look like:

X_i = [[0,1,1,0,2,3...], [0,1,1,0,2,3...], ..., [0,0,0,0,0...],  [0,0,0,0,0...], ....]

For every word in a sentence I want to predict one of four classes, so my desired output should look like:

Y_i = [[1,0,0,0], [0,0,1,0], [0,1,0,0], ...]
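
To make the shapes concrete, here is a minimal sketch (my own illustration, with a hypothetical maxlen and made-up word indices, not values from the original post) of how such padded inputs and per-word one-hot targets would be laid out as arrays:

import numpy as np

maxlen = 10       # hypothetical padded sentence length
num_classes = 4   # one of four classes per word

# X: integer word indices, zero-padded at the end of each sentence
X = np.array([[4, 7, 2, 9, 5, 3, 0, 0, 0, 0],
              [6, 1, 8, 2, 0, 0, 0, 0, 0, 0]])   # shape: (num_sentences, maxlen)

# Y: one one-hot vector of length 4 for every word position
Y = np.zeros((X.shape[0], maxlen, num_classes))  # shape: (num_sentences, maxlen, 4)
Y[0, 0, 0] = 1   # e.g. the first word of the first sentence belongs to class 0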

My simple network architecture is:

from keras.models import Sequential
from keras.layers.core import Activation, TimeDistributedDense
from keras.layers.recurrent import LSTM

model = Sequential()
model.add(LSTM(input_dim=emb, output_dim=hidden, return_sequences=True))
model.add(TimeDistributedDense(output_dim=4))
model.add(Activation('softmax'))
model.compile(loss='binary_crossentropy', optimizer='adam')

model.fit(X_train, Y_train, batch_size=32, nb_epoch=3, validation_data=(X_test, Y_test), verbose=1, show_accuracy=True)

It shows approximately 95% accuracy while training, but when I try to predict new sentences with the trained model the results are really bad. It looks like the model just learned a few classes for the first words and outputs them every time. I think the problem could be:

  1. The padding I wrote myself (zero vectors at the end of each sentence): could it make learning worse?

  2. Should I train on sentences of different lengths, without padding? (If so, can you show me how to train such a model in Keras?)

  3. A wrong learning objective, although I tried mean squared error, binary cross-entropy and others, and it didn't change anything.

  4. Something with TimeDistributedDense and softmax; I think I understand how it works, but I'm still not 100% sure.

I'd be glad to see any hint or help regarding this problem, thank you!


1 Answer

I personally think that you misunderstand what "sequence labeling" means.

Do you mean:

  1. X is a list of sentences, and each element X[i] is a word sequence of arbitrary length?
  2. Y[i] is the category of X[i], and the one-hot form of Y[i] is an array like [0, 1, 0, 0]?

If it is, then it's not a sequence labeling problem, it's a classification problem.

Don't use TimeDistributedDense. If it is a multi-class classification problem, i.e. len(Y[i]) > 2, use "categorical_crossentropy" instead of "binary_crossentropy".
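
A minimal sketch of that suggestion, assuming the same old Keras 0.x-style API as the question (emb, hidden, X_train and Y_train are the question's own names; the sizes below are placeholders, and Dense replaces TimeDistributedDense while return_sequences is dropped so the LSTM emits one vector per sentence):

from keras.models import Sequential
from keras.layers.core import Dense, Activation
from keras.layers.recurrent import LSTM

emb, hidden = 128, 64   # placeholder sizes; use your own values

model = Sequential()
model.add(LSTM(input_dim=emb, output_dim=hidden))   # one output vector per sentence
model.add(Dense(output_dim=4))                      # one 4-way prediction per sentence, not per word
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')

model.fit(X_train, Y_train, batch_size=32, nb_epoch=3,
          validation_data=(X_test, Y_test), verbose=1, show_accuracy=True)

Here Y_train would have shape (num_sentences, 4), i.e. one one-hot label per sentence rather than one per word.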
