I have a list of books in which each book belongs to a category.
I have a large enough sample set of data, and I need to categorize my newer books using some algorithm. I know it will never be 100% accurate, but a good guess is good enough for me.
What should I use to implement something like this? Should I go with Classifier4J and its Vector Classifier?
Are there other tools that I should look at like Weka? It would be great if someone could point me to some articles/examples to get me started.
Thanks
There's a course on Coursera called Machine Learning: https://www.coursera.org/course/ml. If you look at your problem as classification, you should train N One-vs-All classifiers, where N is the number of your classes (= categories). To train a classifier, use one of the algorithms described in the Natural Language Processing class https://www.coursera.org/course/nlp. Normally it will score a new book by its similarity to the existing classes, as in Naive Bayes text classification: http://nlp.stanford.edu/IR-book/html/htmledition/text-classification-and-naive-bayes-1.html. All this could be done in Apache Mahout with its Bayesian classifier: https://cwiki.apache.org/confluence/display/MAHOUT/Bayesian.
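To make the Naive Bayes idea concrete, here is a minimal sketch in plain Python. It is not the Mahout, Weka, or Classifier4J API; the class name `NaiveBayes`, the `tokenize` helper, and the tiny training set are all made up for illustration, and it assumes each book is represented by a short piece of text such as its title or blurb.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word frequencies
        self.doc_counts = Counter()              # category -> number of documents
        self.vocab = set()                       # all words seen during training

    def train(self, text, category):
        tokens = tokenize(text)
        self.word_counts[category].update(tokens)
        self.doc_counts[category] += 1
        self.vocab.update(tokens)

    def classify(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_category, best_score = None, float("-inf")
        for category, n_docs in self.doc_counts.items():
            # Log prior: how common this category is in the training data.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[category].values())
            for token in tokens:
                # Smoothed log likelihood of the word given the category.
                count = self.word_counts[category][token]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_category, best_score = category, score
        return best_category


# Made-up sample data: (text describing the book, category).
clf = NaiveBayes()
clf.train("a wizard and a dragon fight for the magic kingdom", "fantasy")
clf.train("the dragon guards an enchanted sword", "fantasy")
clf.train("recipes for quick pasta and fresh bread", "cooking")
clf.train("baking bread and cakes at home", "cooking")

print(clf.classify("the young wizard finds a magic sword"))
print(clf.classify("fresh bread and pasta recipes"))
```

Because Naive Bayes handles any number of categories directly by picking the highest-scoring one, this sketch does not need separate One-vs-All classifiers; that scheme matters more when the underlying classifier is binary. With a real data set you would likely want better tokenization (stop words, stemming) before trusting the guesses.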
LingPipe seems to be a good solution and works well. The demo included with LingPipe is a good place to begin:
http://alias-i.com/lingpipe/demos/tutorial/classify/read-me.html