Which product (Mallet or Weka) is better for a text classification task?
I'm new to this problem, so any comments would be great.
MALLET is much easier to use and does most of its job invisibly. You don't have to convert the format of anything either; you just give it text files and it gives you back results.
Weka requires converting the text into a particular format (the Weka script for doing so is so slow and inefficient that I would recommend you write your own).
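For anyone who does want to write their own converter, here is a minimal sketch of what that could look like. It assumes the target is Weka's ARFF format, that the corpus is laid out as one sub-directory per class containing `.txt` files, and that backslash escaping inside single-quoted strings is acceptable to your Weka version. The function names `arff_quote` and `corpus_to_arff` are made up for illustration, not part of Weka.

```python
from pathlib import Path


def arff_quote(text: str) -> str:
    """Single-quote a value for ARFF, escaping backslashes, quotes and line breaks."""
    escaped = (
        text.replace("\\", "\\\\")  # escape backslashes first so later escapes survive
        .replace("'", "\\'")
        .replace("\n", "\\n")
        .replace("\r", "\\r")
        .replace("\t", "\\t")
    )
    return "'" + escaped + "'"


def corpus_to_arff(root: str, relation: str = "documents") -> str:
    """Build ARFF text from root/<label>/*.txt (assumed layout, one folder per class)."""
    root_path = Path(root)
    labels = sorted(p.name for p in root_path.iterdir() if p.is_dir())
    lines = [
        f"@relation {relation}",
        "",
        "@attribute text string",
        "@attribute class {" + ",".join(arff_quote(label) for label in labels) + "}",
        "",
        "@data",
    ]
    for label in labels:
        for doc in sorted((root_path / label).glob("*.txt")):
            # Replace undecodable bytes rather than failing on a single bad file.
            text = doc.read_text(encoding="utf-8", errors="replace")
            lines.append(f"{arff_quote(text)},{arff_quote(label)}")
    return "\n".join(lines) + "\n"
```

You would write the returned string to a `.arff` file and load that in Weka. The raw text still sits in a single string attribute, so you would probably apply something like Weka's `StringToWordVector` filter before training.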
The problem with MALLET is that training uses gigabytes of memory and can take hours if you have large training sets.
Weka has more documentation, but most of it makes no sense. MALLET has very little documentation but is very simple to use.
To be honest, after testing both of them, I opted to write my own classifier.
I'm really enjoying Weka compared to Mallet. Maybe I don't know enough yet, but doing machine learning with a GUI is awesome. You can tweak parameters and run different experiments (keeping the results of past experiments in front of you, too) very easily. I'm new to Weka, so this is FWIW.
As far as which one is simpler to train, I find Weka simpler. I don't know what kind of control you can have over your feature space by just pointing Mallet at some text (maybe it's good enough), but my experience with Mallet was comparable to Weka... writing scripts to get the input in the proper format, with the caveat that I had to do multiple steps to utilize some kind of serialized version of the data in Mallet.
Regarding your other questions, I can't really answer them right now, but am hoping this answer doesn't get downvoted 'cause it's good information to be out there, anyway.