
Categorizing records in Java

Tags:

java

I have a list of books in which each book belongs to a category.

  • Flying a Plane - Aviation
  • Painting a picture - Art
  • 1001 Recipes - Cooking

I have a large enough sample set of data. I need to categorize my newer books using some algorithm. I know it will never be 100% accurate, but a good guess is good enough for me.

What should I use to implement something like this? Should I go with Classifier4J and its Vector Classifier?

Are there other tools I should look at, such as Weka? It would be great if someone could point me to some articles/examples to get me started.

Thanks

Mridang Agarwalla asked Nov 03 '22 22:11


2 Answers

There's a Machine Learning course on Coursera: https://www.coursera.org/course/ml. If you treat your problem as classification, you should train N one-vs-all classifiers, where N is the number of your classes (that is, categories). To train a classifier, use one of the algorithms described in the Natural Language Processing class, https://www.coursera.org/course/nlp. Normally it will be based on similarity to the existing classes; see http://nlp.stanford.edu/IR-book/html/htmledition/text-classification-and-naive-bayes-1.html. All of this can be done in Apache Mahout with its Bayesian classifier: https://cwiki.apache.org/confluence/display/MAHOUT/Bayesian.
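To give a feel for what such a classifier does before reaching for a library, here is a minimal sketch of multinomial Naive Bayes with add-one smoothing over book titles. It is not Mahout's or Classifier4J's API; the class name `TitleClassifier` and its `train`/`classify` methods are made up for illustration, and it assumes the title text alone is used as the feature source. Naive Bayes handles several categories directly, so this sketch does not build separate one-vs-all models.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Illustrative multinomial Naive Bayes over titles (hypothetical class, not a library API). */
class TitleClassifier {
    // category -> number of training titles seen for it
    private final Map<String, Integer> docCounts = new HashMap<>();
    // category -> (word -> number of occurrences in that category)
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>();
    // category -> total number of words seen in that category
    private final Map<String, Integer> totalWords = new HashMap<>();
    // every distinct word seen during training
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    /** Lower-cases the text and splits it on anything that is not a letter or digit. */
    private static String[] tokenize(String text) {
        return text.toLowerCase().split("[^a-z0-9]+");
    }

    /** Adds one labelled title to the model. */
    public void train(String title, String category) {
        docCounts.merge(category, 1, Integer::sum);
        totalDocs++;
        Map<String, Integer> counts =
                wordCounts.computeIfAbsent(category, c -> new HashMap<>());
        for (String word : tokenize(title)) {
            if (word.isEmpty()) {
                continue; // split() can yield an empty leading token
            }
            counts.merge(word, 1, Integer::sum);
            totalWords.merge(category, 1, Integer::sum);
            vocabulary.add(word);
        }
    }

    /** Returns the most probable category, or null if nothing has been trained. */
    public String classify(String title) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String category : docCounts.keySet()) {
            // log prior: share of training titles that belong to this category
            double score = Math.log(docCounts.get(category) / (double) totalDocs);
            Map<String, Integer> counts = wordCounts.get(category);
            int total = totalWords.getOrDefault(category, 0);
            for (String word : tokenize(title)) {
                if (word.isEmpty()) {
                    continue;
                }
                int count = counts.getOrDefault(word, 0);
                // add-one (Laplace) smoothing so unseen words do not zero out the score
                score += Math.log((count + 1.0) / (total + vocabulary.size()));
            }
            if (score > bestScore) {
                bestScore = score;
                best = category;
            }
        }
        return best;
    }
}
```

You would call `train` once per existing book (for example `train("Flying a Plane", "Aviation")`) and then `classify` on each new title. With a real data set you would probably also want to feed in descriptions or blurbs, drop stop words, and hold back part of the labelled data to measure accuracy.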

Artem Oboturov answered Nov 12 '22 11:11


LingPipe seems to be a good solution and appears to work well. The classification demo included with LingPipe is a good place to begin:

http://alias-i.com/lingpipe/demos/tutorial/classify/read-me.html

Mridang Agarwalla answered Nov 12 '22 11:11