
Mahout Classifier vs. OpenNLP Document Classifier

Tags:

mahout

opennlp

I'm at a crossroads. I've been using Mahout to classify some documents, and I have stumbled across the OpenNLP document classifier.

They seem to do very similar things, and I can't figure out whether it's worth converting what I have currently written in Mahout into an OpenNLP implementation instead.

Are there some blatantly obvious advantages Mahout has over OpenNLP for document classification?

My situation is that I have several hundred thousand news articles, and I only want to extract a subset of them. Mahout does this reasonably well; I'm using Naive Bayes for term counting, and then TF-IDF to determine which category the documents fall into. The model is updated as and when new articles are found, so it keeps improving over time.
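To make the setup concrete, here is a minimal, self-contained Python sketch of the general idea: TF-IDF-weighted term counts feeding a multinomial Naive Bayes classifier, with new labelled articles added over time. It is not my Mahout code and does not use Mahout's or OpenNLP's API. The class name `TfidfNaiveBayes`, the whitespace tokenizer, the smoothed IDF formula, and the Laplace smoothing are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


class TfidfNaiveBayes:
    """Toy multinomial Naive Bayes over TF-IDF-weighted term counts (illustrative only)."""

    def __init__(self):
        # (label, term counts) for every labelled article seen so far
        self.docs = []

    def add(self, label, text):
        # Incremental update: store the new article; weights are recomputed on classify.
        self.docs.append((label, Counter(text.lower().split())))

    def _idf(self):
        # Smoothed inverse document frequency (an assumed variant, always positive).
        n = len(self.docs)
        df = Counter(term for _, counts in self.docs for term in counts)
        return {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}

    def classify(self, text):
        idf = self._idf()
        vocab_size = len(idf)
        weights = defaultdict(Counter)  # label -> term -> summed TF-IDF weight
        label_docs = Counter()          # label -> number of training articles
        for label, counts in self.docs:
            label_docs[label] += 1
            for term, tf in counts.items():
                weights[label][term] += tf * idf[term]

        query = Counter(text.lower().split())
        scores = {}
        for label, term_weights in weights.items():
            total = sum(term_weights.values())
            score = math.log(label_docs[label] / len(self.docs))  # log prior
            for term, tf in query.items():
                if term not in idf:
                    continue  # skip terms never seen in training
                # Laplace-smoothed log likelihood of the term under this label
                score += tf * math.log((term_weights[term] + 1.0) / (total + vocab_size))
            scores[label] = score
        return max(scores, key=scores.get)


clf = TfidfNaiveBayes()
clf.add("sport", "the team won the football match")
clf.add("sport", "a late goal won the match")
clf.add("finance", "the bank raised interest rates")
clf.add("finance", "shares fell as the market closed")
print(clf.classify("football match goal"))  # sport
```

Recomputing the weights on every call keeps the sketch short; a real pipeline over hundreds of thousands of articles would cache them or retrain in batches.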

It seems the OpenNLP document classifier does something very similar (although I have not tested how accurate it is). Does anyone have experience using both who can say definitively why one would be used over the other?

andrew.butkus asked Mar 21 '26 10:03



1 Answer

I don't have experience with these two, but while trying to figure out if one of them would make a difference in a personal project, I stumbled upon this blog, and I quote:

Data categorization with OpenNLP is another approach with more accuracy and performance rate as compared to mahout.

You can check the blog post here.

Coz answered Mar 24 '26 22:03



