What open-source/free data mining engines and frameworks do you know of and use for textual data?
Thank you for any advice!
I'm not really sure what you're looking for. Perhaps something like Lucene?
Apache Mahout is an open-source machine learning library that can be used with or without MapReduce (Apache Hadoop).
It provides Java implementations of the following algorithms:
You can read more at: http://mahout.apache.org/
http://girlincomputerscience.blogspot.com.br/2010/11/apache-mahout.html
http://www.ibm.com/developerworks/java/library/j-mahout/
RapidMiner is free and open source, runs on Windows, Mac, and Linux, and is a nice graphical, workflow-based program. It runs all Weka code and integrates with R.
Weka and RapidMiner aren't that strong on clustering. They mostly do classification and similar predictions, but very little clustering. Have a look at ELKI, which, like Weka, is a university project, but has tons of clustering and outlier detection methods.