
Which classification algorithm to choose?

I would like to classify text documents into four categories. I also have a lot of samples that are already classified and can be used for training. I would like the algorithm to learn on the fly. Please suggest an optimal algorithm for this requirement.

infotiger asked Feb 25 '23 02:02



2 Answers

If by "on the fly" you mean online learning (where training and classification can be interleaved), I suggest the k-nearest neighbor algorithm. It's available in Weka and in the package TiMBL.
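To illustrate why k-NN fits online learning, here is a minimal pure-Python sketch. It is not the Weka or TiMBL API; the class name `OnlineKNN`, the whitespace tokenizer, and the choice of cosine similarity over raw term counts are all assumptions made for illustration. The point is that "training" only means storing a labelled example, so learning and classifying can be interleaved freely.

```python
from collections import Counter
import math


def bag_of_words(text):
    """Lowercased, whitespace-split term counts (a deliberately crude tokenizer)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class OnlineKNN:
    """Hypothetical k-NN text classifier: learning is just storing the example."""

    def __init__(self, k=3):
        self.k = k
        self.examples = []  # list of (term counts, label) pairs

    def learn(self, text, label):
        self.examples.append((bag_of_words(text), label))

    def classify(self, text):
        if not self.examples:
            return None  # nothing learned yet
        query = bag_of_words(text)
        # Rank stored examples by similarity and keep the k closest.
        nearest = sorted(
            self.examples, key=lambda ex: cosine(query, ex[0]), reverse=True
        )[: self.k]
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]


clf = OnlineKNN(k=1)
clf.learn("the team won the football match", "sports")
clf.learn("parliament passed the new tax bill", "politics")
print(clf.classify("football match tonight"))         # sports
clf.learn("new phone with a faster processor", "tech")  # learn after classifying
print(clf.classify("faster processor in the phone"))  # tech
```

Classification cost grows with the number of stored examples, so with a large training set an index or a real library implementation would be the better choice.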

A perceptron can also be trained online, one example at a time.

"Optimal" isn't a well-defined term in this context.

Fred Foo answered Feb 27 '23 16:02



There are several algorithms that can learn on the fly, for example k-nearest neighbors, naive Bayes, and neural networks. You can test how well each of these methods works on a sample corpus.
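As one example, naive Bayes is easy to update incrementally because the model is only a set of counts. Below is a rough sketch of a multinomial naive Bayes classifier in plain Python. The class name `OnlineNaiveBayes`, the whitespace tokenizer, Laplace smoothing with `alpha=1.0`, and skipping words never seen in training are assumptions chosen for illustration, not a reference implementation.

```python
from collections import Counter, defaultdict
import math


class OnlineNaiveBayes:
    """Hypothetical multinomial naive Bayes updated one document at a time."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # Laplace smoothing constant
        self.doc_counts = Counter()              # documents seen per label
        self.term_counts = defaultdict(Counter)  # label -> term -> count
        self.vocabulary = set()

    def learn(self, text, label):
        # Learning only increments counts, so it can happen at any time.
        terms = text.lower().split()
        self.doc_counts[label] += 1
        self.term_counts[label].update(terms)
        self.vocabulary.update(terms)

    def classify(self, text):
        if not self.doc_counts:
            return None  # nothing learned yet
        # Ignore words that never appeared in any training document.
        terms = [t for t in text.lower().split() if t in self.vocabulary]
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.term_counts[label]
            denom = sum(counts.values()) + self.alpha * vocab_size
            # Log prior plus smoothed log likelihood of each known term.
            score = math.log(n_docs / total_docs)
            for term in terms:
                score += math.log((counts[term] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


nb = OnlineNaiveBayes()
nb.learn("the team won the football match", "sports")
nb.learn("parliament passed the new tax bill", "politics")
print(nb.classify("football match tonight"))         # sports
nb.learn("new phone with a faster processor", "tech")  # update after classifying
print(nb.classify("faster processor in the phone"))  # tech
```

Running the same sample corpus through each candidate method and comparing accuracy on held-out documents is a simple way to judge which one suits the data.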

yura answered Feb 27 '23 17:02
