I would like to classify text documents into four categories. Also I have lot of samples which are already classified that can be used for training. I would like the algorithm to learn on the fly.. please suggest an optimal algorithm that works for this requirement.
If by "on the fly" you mean online learning (where training and classification can be interleaved), I suggest the k-nearest neighbor algorithm. It's available in Weka and in the package TiMBL.
A perceptron will also be able to do this.
"Optimal" isn't a well-defined term in this context.
there are several algorithms which can be learned on fly. Examples: k-nearest neighbors, naive Bayes, neural networks. You can try how appropriate each of these methods are on a sample corpus.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With