Categorizing records in Java

Question

I have a list of books in which each book belongs to a category.

Flying a Plane - Aviation
Painting a picture - Art
1001 Recipes - Cooking

I have a large enough sample set of data. I need to categorize my newer books using some algorithm. I know it'll never be 100% accurate, but a good guess is good enough for me.

What should I use to implement something like this? Should I go with Classifier4J and its Vector Classifier?

Are there other tools I should look at, such as Weka? It would be great if someone could point me to some articles or examples to get me started.

Thanks

Artem Oboturov · Accepted Answer

There's a Coursera course called Machine Learning: https://www.coursera.org/course/ml. If you treat your problem as classification, you should train N one-vs-all classifiers, where N is the number of your classes (= categories). To train a classifier, use one of the algorithms described in the Natural Language Processing class, https://www.coursera.org/course/nlp. Normally the classifier scores how similar a new text is to each existing class, as in Naive Bayes text classification: http://nlp.stanford.edu/IR-book/html/htmledition/text-classification-and-naive-bayes-1.html. All of this can be done in Apache Mahout with its Bayesian classifier: https://cwiki.apache.org/confluence/display/MAHOUT/Bayesian.
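To make the idea concrete, here is a minimal sketch of Naive Bayes text classification over book titles in plain Java. It is an illustration only, not Mahout's implementation: the class name `TitleClassifier` and its methods are hypothetical, the tokenizer is a crude assumption, and it uses a single multiclass Naive Bayes model with add-one smoothing rather than N separate one-vs-all classifiers.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: multinomial Naive Bayes over the words of a title. */
class TitleClassifier {
    // category -> (word -> occurrences in that category's training titles)
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>();
    // category -> total number of words seen for that category
    private final Map<String, Integer> totalWords = new HashMap<>();
    // category -> number of training titles
    private final Map<String, Integer> docCounts = new HashMap<>();
    // every distinct word seen during training
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    // Assumption: lowercase and split on anything that is not a letter or digit.
    private static String[] tokenize(String text) {
        return text.toLowerCase().split("[^a-z0-9]+");
    }

    /** Adds one labeled title to the model. */
    public void train(String category, String title) {
        totalDocs++;
        docCounts.merge(category, 1, Integer::sum);
        Map<String, Integer> counts =
                wordCounts.computeIfAbsent(category, k -> new HashMap<>());
        for (String word : tokenize(title)) {
            if (word.isEmpty()) continue;
            counts.merge(word, 1, Integer::sum);
            totalWords.merge(category, 1, Integer::sum);
            vocabulary.add(word);
        }
    }

    /** Returns the most probable category, or null if nothing was trained. */
    public String classify(String title) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String category : docCounts.keySet()) {
            // log prior: fraction of training titles in this category
            double score = Math.log(docCounts.get(category) / (double) totalDocs);
            Map<String, Integer> counts = wordCounts.get(category);
            int total = totalWords.getOrDefault(category, 0);
            for (String word : tokenize(title)) {
                if (word.isEmpty()) continue;
                int count = counts.getOrDefault(word, 0);
                // add-one (Laplace) smoothing so unseen words do not zero the score
                score += Math.log((count + 1.0) / (total + vocabulary.size()));
            }
            if (score > bestScore) {
                bestScore = score;
                best = category;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        TitleClassifier classifier = new TitleClassifier();
        classifier.train("Aviation", "Flying a Plane");
        classifier.train("Art", "Painting a picture");
        classifier.train("Cooking", "1001 Recipes");
        System.out.println(classifier.classify("Painting a landscape")); // prints "Art"
    }
}
```

With only three training titles the result is driven by a single shared word, so a real data set and a proper tokenizer (stop words, stemming) would matter far more than the model itself. A library such as Mahout, Weka or LingPipe would replace this hand-rolled class in practice.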

Mridang Agarwalla · Answer

LingPipe seems to be a good solution and appears to work well. The demo included with LingPipe is a good place to begin:

http://alias-i.com/lingpipe/demos/tutorial/classify/read-me.html
