Which data mining tool to use? [closed]

Can somebody explain the main pros and cons of the best-known open-source data mining tools?

Everywhere I read that RapidMiner, Weka, Orange and KNIME are the best ones (see, for example, this blog post).

Could somebody give a quick technical comparison as a short bullet list?

My needs are the following:

  • It should support classification algorithms (Naive Bayes, SVM, C4.5, kNN).
  • It should be easy to use from Java.
  • It should have understandable documentation.
  • It should have reference production projects or use cases running on it.
  • Ideally, some additional benchmark comparisons.

Thanks!

asked Jul 25 '16 09:07 by user2670818




3 Answers

First, each of the tools on your list has its pros and cons. That said, from personal experience I would suggest Weka: it is very simple to embed in your own Java application using the Weka jar file, and it comes with its own self-contained data mining tools.

RapidMiner seems to be a commercial product offering an end-to-end solution; however, most of the external implementation examples I have seen for RapidMiner are written in Python and R scripts, not Java.

Orange offers tools that seem to be aimed primarily at people who need less custom integration into their own software but want a far easier time with user interaction. It is written in Python, its source is available, and user add-ons are supported.

KNIME is another commercial platform offering end-to-end solutions for data mining and analysis, providing all the tools required. It has various good reviews around the internet, but I haven't used it enough to advise you or anyone on its pros or cons.

See here for KNIME vs. Weka.

Best data mining tools

As I said, Weka is my personal favorite as a software developer, but I'm sure other people have different reasons and opinions for choosing one over another. I hope you find the right solution for you.

Also, per your requirements, Weka supports the following (Weka class names in parentheses):

  • Naive Bayes (NaiveBayes)
  • SVM (SMO)
  • C4.5 (J48)
  • kNN (IBk)

answered Sep 30 '22 14:09 by D3181



I have tried Orange and Weka with a 15K-record database and found problems with memory management in Weka: it needed more than 16 GB of RAM, while Orange could handle the same database without using nearly that much. Once Weka reaches its maximum amount of memory it crashes, even if you allocate more memory in the .ini file that tells the Java virtual machine how much to use.
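One way to check whether a larger heap setting actually reached the JVM is to ask the running JVM for its limits. Below is a minimal sketch using only the standard `Runtime` API; `HeapCheck` and its methods are hypothetical helper names, not part of Weka or Orange. Run it with the same memory options you give Weka (for example `java -Xmx16g HeapCheck`) and compare the reported maximum with what you configured.

```java
public class HeapCheck {
    private static final long MB = 1024L * 1024L;

    // Maximum size the heap may grow to, in MB (roughly the -Xmx value).
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / MB;
    }

    // Heap currently in use (reserved heap minus its free part), in MB.
    static long usedHeapMb() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / MB;
    }

    public static void main(String[] args) {
        System.out.println("Max heap:  " + maxHeapMb() + " MB");
        System.out.println("Used heap: " + usedHeapMb() + " MB");
    }
}
```

If the printed maximum is much lower than the value you set, the option probably never reached the JVM that runs your workload.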

answered Sep 30 '22 14:09 by Antonio Velazquez Bustamante



I recently evaluated many open-source projects, comparing and contrasting them with regard to the decision tree machine learning algorithm. Weka and KNIME were included in that evaluation. I covered the differences in algorithm, UX, accuracy, and model inspection. You might choose one or the other depending on which features you value most.

answered Sep 30 '22 13:09 by Glenn
