How to structure an LSTM neural network for classification

I have data consisting of various conversations between two people. Each sentence has some type of classification. I am attempting to use an NLP net to classify each sentence of the conversation. I tried a convolutional net and got decent results (not ground-breaking, though). I figured that since this is a back-and-forth conversation, an LSTM net may produce better results, because what was previously said may have a large impact on what follows.

[Image: types of RNN architectures]

If I follow the structure above, I would assume that I am doing many-to-many. My data looks like:

X_train = [[sentence 1],  
           [sentence 2],
           [sentence 3]]
Y_train = [[0],
           [1],
           [0]]

The data has been processed using word2vec. I then design my network as follows:

from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

model = Sequential()
model.add(Embedding(len(vocabulary), embedding_dim,
                    input_length=X_train.shape[1]))
model.add(LSTM(88))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='rmsprop', loss='binary_crossentropy',
              metrics=['accuracy'])
model.fit(X_train, Y_train, verbose=2, nb_epoch=3, batch_size=15)
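For reference, the Embedding layer above expects X_train to be a 2D array of padded integer word indices rather than raw word2vec vectors. A minimal sketch of that preprocessing, assuming Keras' Tokenizer and pad_sequences (the toy sentences, labels, and maxlen here are hypothetical):

import numpy as np
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences

# Hypothetical raw data: one string per sentence, one label per sentence.
sentences = ["hello how are you", "fine thanks and you", "glad to hear it"]
labels = np.array([0, 1, 0])

# Map each word to an integer index and pad every sentence to the same
# length, since Embedding expects a fixed-size (num_samples, input_length)
# array of indices.
tokenizer = Tokenizer()
tokenizer.fit_on_texts(sentences)
X_train = pad_sequences(tokenizer.texts_to_sequences(sentences), maxlen=20)
Y_train = labels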

I assume that this setup feeds one batch of sentences in at a time. However, unless shuffle is set to False in model.fit, the net receives shuffled batches, so why would an LSTM even be useful in this case? From research on the subject, to achieve a many-to-many structure one would need to change the LSTM layer to

model.add(LSTM(88, return_sequences=True))

and the output layer would need to be...

model.add(TimeDistributed(Dense(1, activation='sigmoid')))
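For reference, return_sequences changes what the LSTM emits: by default it outputs only its final hidden state, one vector per sequence, while return_sequences=True outputs a hidden state per timestep, which is what TimeDistributed(Dense(...)) needs. A minimal standalone sketch with made-up dimensions (10 timesteps, 300 features):

from keras.models import Sequential
from keras.layers import LSTM

# With the default return_sequences=False, LSTM(88) would output shape
# (batch_size, 88). With return_sequences=True it outputs the hidden state
# at every timestep: shape (batch_size, 10, 88).
model = Sequential()
model.add(LSTM(88, return_sequences=True, input_shape=(10, 300)))
model.summary()  # output shape: (None, 10, 88)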

When switching to this structure I get an error on the input size. I'm unsure of how to reformat the data to meet this requirement, and also how to edit the embedding layer to receive the new data format.

Any input would be greatly appreciated. Or if you have any suggestions on a better method, I am more than happy to hear them!

asked Feb 19 '17 by DJK


1 Answer

Your first attempt was good. The shuffling takes place between sentences: it only shuffles the training samples so that they don't always come in the same order. The words inside sentences are not shuffled.

Or maybe I didn't understand the question correctly?

EDIT :

After a better understanding of the question, here is my proposal.

Data preparation: slice your corpus into blocks of n sentences (they can overlap). You should then have a shape like (number_blocks_of_sentences, n, number_of_words_per_sentence), i.e. a list of 2D arrays each containing a block of n sentences. n shouldn't be too big, because an LSTM can't handle a huge number of elements in a sequence when training (vanishing gradient). Your targets should be an array of shape (number_blocks_of_sentences, n, 1), i.e. a list of 2D arrays containing the class of each sentence in your block; a sketch of this slicing follows below.
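A minimal sketch of that slicing, with hypothetical names (assuming sentences is already a padded 2D integer array of shape (num_sentences, n_words) and labels is a 1D NumPy array of classes):

import numpy as np

def make_blocks(sentences, labels, n):
    # Slice a conversation into overlapping blocks of n consecutive
    # sentences. Returns X of shape (num_blocks, n, n_words) and
    # Y of shape (num_blocks, n, 1).
    X, Y = [], []
    for i in range(len(sentences) - n + 1):
        X.append(sentences[i:i + n])
        Y.append(labels[i:i + n].reshape(n, 1))
    return np.array(X), np.array(Y)

# e.g. X_train, Y_train = make_blocks(padded_sentences, sentence_labels, n=5)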

Model :

from keras.models import Sequential
from keras.layers import Reshape, Embedding, LSTM, Dense, TimeDistributed

n_sentences = X_train.shape[1]  # number of sentences in a sample (n)
n_words = X_train.shape[2]      # number of words in a sentence

model = Sequential()
# Reshape the input because Embedding only accepts shape
# (batch_size, input_length), so we flatten each block of sentences
# into one long list of words
model.add(Reshape((n_sentences * n_words,), input_shape=(n_sentences, n_words)))
# Embedding layer - output shape will be
# (batch_size, n_sentences * n_words, embedding_dim), so each sample
# in the batch is a big 2D array of embedded words
model.add(Embedding(len(vocabulary), embedding_dim,
                    input_length=n_sentences * n_words))
# Recreate the sentence-shaped array
model.add(Reshape((n_sentences, n_words, embedding_dim)))
# Encode each sentence - output shape is (batch_size, n_sentences, 88)
model.add(TimeDistributed(LSTM(88)))
# Go over the sentences and output a hidden state that carries info about
# the previous sentences - output shape is (batch_size, n_sentences, hidden_dim)
model.add(LSTM(hidden_dim, return_sequences=True))
# Predict the binary class of each sentence - output shape is
# (batch_size, n_sentences, 1)
model.add(TimeDistributed(Dense(1, activation='sigmoid')))
...
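One plausible way to finish and train this model (the compile and fit settings below are assumptions, not part of the original answer; they mirror the binary, per-sentence targets):

model.compile(optimizer='rmsprop', loss='binary_crossentropy',
              metrics=['accuracy'])
# X_train shape: (number_blocks_of_sentences, n_sentences, n_words)
# Y_train shape: (number_blocks_of_sentences, n_sentences, 1)
model.fit(X_train, Y_train, verbose=2, nb_epoch=3, batch_size=15)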

This should be a good start. I hope this helps!

answered Sep 21 '22 by Nassim Ben