I've started to write a simple sentiment analysis tool.
Currently I am looking at GATE and RapidMiner, but as a beginner I am not able to concentrate on both.
Could someone please tell me which one would be better in terms of ease of use, learning curve, licensing, etc.?
I vote for RapidMiner for four reasons (I have used both):
- RapidMiner's GUI makes things much smoother; it is well designed.
- You can use plug-ins in RapidMiner that have a ton of back-end power, like R and Weka - these make the system far more versatile than GATE for statistics and data mining work.
- RapidMiner has a pretty good support network. I definitely recommend looking at the Vancouver Data tutorials linked in another answer, because the things Neil does with text completely blew my mind, so I went and used his methods. They worked like a charm!
- RapidMiner can be deployed as a server, so you can really crunch through your data when you need to; you aren't limited to the desktop.
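To give a beginner a feel for the statistics/data-mining route to sentiment analysis mentioned above, here is a minimal, tool-independent sketch in plain Python of the common bag-of-words plus Naive Bayes approach. It is not how RapidMiner or GATE implement anything internally; the class name `NaiveBayesSentiment`, the crude regex tokenizer, and the four toy training sentences are all made up for illustration. A real setup would typically add stopword filtering, stemming, TF-IDF weighting, and far more labeled data.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Crude illustrative tokenizer: lowercase, keep runs of letters/apostrophes.
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesSentiment:
    """Hypothetical example class: multinomial Naive Bayes over word counts,
    with add-one (Laplace) smoothing."""

    def __init__(self):
        self.class_doc_counts = Counter()        # label -> number of training docs
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.vocab = set()

    def fit(self, docs, labels):
        for text, label in zip(docs, labels):
            self.class_doc_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocab.add(word)
        return self

    def predict(self, text):
        total_docs = sum(self.class_doc_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, float("-inf")
        for label, n_docs in self.class_doc_counts.items():
            # Work in log space (log prior + sum of log likelihoods) to avoid underflow.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                if word not in self.vocab:
                    continue  # skip words never seen during training
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, made-up training data purely for illustration.
train_texts = [
    "I love this phone, it is great",
    "What a wonderful, happy experience",
    "This is terrible and I hate it",
    "Awful service, very bad",
]
train_labels = ["pos", "pos", "neg", "neg"]

model = NaiveBayesSentiment().fit(train_texts, train_labels)
print(model.predict("great phone, love it"))   # pos
print(model.predict("bad, awful experience"))  # neg
```

Tools like RapidMiner let you assemble this kind of pipeline (text to word vectors, then a learner) visually instead of by hand, which is part of why the GUI matters for a beginner.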
That said, here are a few things about GATE:
- GATE probably has a better semantic understanding of text, and its built-in vocabularies are pretty extensive.
- GATE is mature and well developed, and it is still under active development.
- GATE can handle Arabic and a few other languages that are likely to give RapidMiner trouble. In fact, for straight corpus work, GATE is darn impressive. It has a lot of plug-ins as well, but installing them isn't as plug-and-play as it is with RapidMiner.
RapidMiner is expected to release version 5.2 around late January 2012 (i.e., right about now), so if you go that route you will have the choice between the well-supported 5.1 and the beta of 5.2.
Not to toot my own horn, but I did a five-part video series on text analytics with RapidMiner here:
http://vancouverdata.blogspot.com/2010/11/text-analytics-with-rapidminer-loading.html
GATE is an incomprehensible mess.