How to structure an LSTM neural network for classification

I have data consisting of various conversations between two people. Each sentence has some type of classification. I am attempting to use an NLP net to classify each sentence of the conversation. I tried a convolutional net and got decent results (not ground-breaking, though). I figured that since this is a back-and-forth conversation, an LSTM net may produce better results, because what was previously said may have a large impact on what follows.

[Image: types of RNN architectures]

If I follow the structure above, I would assume that I am doing many-to-many. My data looks like:

X_train = [[sentence 1],  
           [sentence 2],
           [sentence 3]]
Y_train = [[0],
           [1],
           [0]]

The data has been processed using word2vec. I then design my network as follows:

from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

model = Sequential()
model.add(Embedding(len(vocabulary), embedding_dim,
                    input_length=X_train.shape[1]))
model.add(LSTM(88))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='rmsprop', loss='binary_crossentropy',
              metrics=['accuracy'])
model.fit(X_train, Y_train, verbose=2, nb_epoch=3, batch_size=15)
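For reference, the Embedding layer above expects X_train to be a 2D array of padded integer word indices rather than raw word2vec vectors. A minimal sketch of that preprocessing, assuming Keras' Tokenizer and pad_sequences (the toy sentences, labels, and maxlen here are hypothetical):

import numpy as np
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences

# Hypothetical raw data: one string per sentence, one label per sentence.
sentences = ["hello how are you", "fine thanks and you", "glad to hear it"]
labels = np.array([0, 1, 0])

# Map each word to an integer index and pad every sentence to the same
# length, since Embedding expects a fixed-size (num_samples, input_length)
# array of indices.
tokenizer = Tokenizer()
tokenizer.fit_on_texts(sentences)
X_train = pad_sequences(tokenizer.texts_to_sequences(sentences), maxlen=20)
Y_train = labels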

I assume that this setup feeds one batch of sentences in at a time. However, unless shuffle is set to False in model.fit, the net receives shuffled batches, so why would an LSTM even be useful in this case? From research on the subject, to achieve a many-to-many structure one would need to change the LSTM layer to

model.add(LSTM(88, return_sequences=True))

and the output layer would need to be...

model.add(TimeDistributed(Dense(1, activation='sigmoid')))
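For reference, return_sequences changes what the LSTM emits: by default it outputs only its final hidden state, one vector per sequence, while return_sequences=True outputs a hidden state per timestep, which is what TimeDistributed(Dense(...)) needs. A minimal standalone sketch with made-up dimensions (10 timesteps, 300 features):

from keras.models import Sequential
from keras.layers import LSTM

# With the default return_sequences=False, LSTM(88) would output shape
# (batch_size, 88). With return_sequences=True it outputs the hidden state
# at every timestep: shape (batch_size, 10, 88).
model = Sequential()
model.add(LSTM(88, return_sequences=True, input_shape=(10, 300)))
model.summary()  # output shape: (None, 10, 88)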

When switching to this structure I get an error on the input size. I'm unsure of how to reformat the data to meet this requirement, and also how to edit the embedding layer to receive the new data format.

Any input would be greatly appreciated. Or if you have any suggestions on a better method, I am more than happy to hear them!

asked Feb 19 '17 by DJK


1 Answer

Your first attempt was good. The shuffling takes place between sentences: it only shuffles the training samples so that they don't always come in the same order. The words inside sentences are not shuffled.

Or maybe I didn't understand the question correctly?

EDIT :

After a better understanding of the question, here is my proposal.

Data preparation: slice your corpus into blocks of n sentences (they can overlap). You should then have a shape like (number_blocks_of_sentences, n, number_of_words_per_sentence), i.e. a list of 2D arrays each containing a block of n sentences. n shouldn't be too big, because an LSTM can't handle a huge number of elements in a sequence when training (vanishing gradient). Your targets should be an array of shape (number_blocks_of_sentences, n, 1), i.e. a list of 2D arrays containing the class of each sentence in your block; a sketch of this slicing follows below.
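A minimal sketch of that slicing, with hypothetical names (assuming sentences is already a padded 2D integer array of shape (num_sentences, n_words) and labels is a 1D NumPy array of classes):

import numpy as np

def make_blocks(sentences, labels, n):
    # Slice a conversation into overlapping blocks of n consecutive
    # sentences. Returns X of shape (num_blocks, n, n_words) and
    # Y of shape (num_blocks, n, 1).
    X, Y = [], []
    for i in range(len(sentences) - n + 1):
        X.append(sentences[i:i + n])
        Y.append(labels[i:i + n].reshape(n, 1))
    return np.array(X), np.array(Y)

# e.g. X_train, Y_train = make_blocks(padded_sentences, sentence_labels, n=5)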

Model :

from keras.models import Sequential
from keras.layers import Reshape, Embedding, LSTM, Dense, TimeDistributed

n_sentences = X_train.shape[1]  # number of sentences in a sample (n)
n_words = X_train.shape[2]      # number of words in a sentence

model = Sequential()
# Reshape the input because Embedding only accepts shape
# (batch_size, input_length), so we flatten each block of sentences
# into one long list of words
model.add(Reshape((n_sentences * n_words,), input_shape=(n_sentences, n_words)))
# Embedding layer - output shape will be
# (batch_size, n_sentences * n_words, embedding_dim), so each sample
# in the batch is a big 2D array of embedded words
model.add(Embedding(len(vocabulary), embedding_dim,
                    input_length=n_sentences * n_words))
# Recreate the sentence-shaped array
model.add(Reshape((n_sentences, n_words, embedding_dim)))
# Encode each sentence - output shape is (batch_size, n_sentences, 88)
model.add(TimeDistributed(LSTM(88)))
# Go over the sentences and output a hidden state that carries info about
# the previous sentences - output shape is (batch_size, n_sentences, hidden_dim)
model.add(LSTM(hidden_dim, return_sequences=True))
# Predict the binary class of each sentence - output shape is
# (batch_size, n_sentences, 1)
model.add(TimeDistributed(Dense(1, activation='sigmoid')))
...
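One plausible way to finish and train this model (the compile and fit settings below are assumptions, not part of the original answer; they mirror the binary, per-sentence targets):

model.compile(optimizer='rmsprop', loss='binary_crossentropy',
              metrics=['accuracy'])
# X_train shape: (number_blocks_of_sentences, n_sentences, n_words)
# Y_train shape: (number_blocks_of_sentences, n_sentences, 1)
model.fit(X_train, Y_train, verbose=2, nb_epoch=3, batch_size=15)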

This should be a good start. I hope this helps!

answered Sep 21 '22 by Nassim Ben