
How to deal with length variations for text classification using CNN (Keras)

CNNs (convolutional neural networks) have been shown to work well for text/document classification. How should I deal with varying input lengths, given that articles differ in length in most cases? Are there any examples in Keras? Thanks!

Fiong asked Jun 02 '16 01:06



1 Answer

Here are three options:

  1. Crop the longer articles to a fixed maximum length.
  2. Pad the shorter articles up to that same length.
  3. Use a recurrent neural network, which naturally supports variable-length inputs.
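
Options 1 and 2 are usually combined: every article is forced to one fixed length before it reaches the CNN. Below is a minimal plain-Python sketch of that idea, not a definitive implementation. The helper name `pad_or_crop`, the sample token ids, and `MAX_LEN = 4` are all assumptions made up for illustration. In Keras the same job is typically done with `pad_sequences` (arguments `maxlen`, `padding`, `truncating`); note that it pads and truncates at the front by default, while this sketch works at the end.

```python
from typing import List, Sequence


def pad_or_crop(tokens: Sequence[int], max_len: int, pad_id: int = 0) -> List[int]:
    """Return exactly max_len token ids (hypothetical helper, not a Keras API)."""
    cropped = list(tokens[:max_len])                       # option 1: crop long inputs
    return cropped + [pad_id] * (max_len - len(cropped))   # option 2: pad short inputs


# Made-up token-id sequences standing in for three articles of different lengths.
articles = [[5, 8, 2], [7, 1, 9, 4, 3, 6], [12]]
MAX_LEN = 4  # assumed fixed input length for the CNN

batch = [pad_or_crop(a, MAX_LEN) for a in articles]
print(batch)  # [[5, 8, 2, 0], [7, 1, 9, 4], [12, 0, 0, 0]]
```

Reserving id 0 for padding keeps it distinct from real words, provided the vocabulary indices start at 1.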
1'' answered Oct 20 '22 23:10