
Comparing R to Matlab for Data Mining

Instead of starting to code in Matlab, I recently started learning R, mainly because it is open source. I currently work in the data mining and machine learning field. I have found many machine learning algorithms implemented in R, and I am still exploring the different packages available for it.

I have a quick question: how would you compare R to Matlab for data mining applications in terms of popularity, pros and cons, industry and academic acceptance, etc.? Which one would you choose and why?

I went through various comparisons of Matlab vs. R against various metrics, but I am specifically interested in their applicability to data mining and ML. Since both languages are pretty new to me, I was wondering whether R would be a good choice.

I appreciate any kind of suggestions.

iinception asked Jan 27 '11 01:01




2 Answers

For the past three years or so, I have used R daily, and the largest portion of that daily use is spent on machine learning/data mining problems.

I was an exclusive Matlab user while at university; at the time I thought it was an excellent set of tools and an excellent platform. I am sure it still is today.

The Neural Network Toolbox, the Optimization Toolbox, the Statistics Toolbox, and the Curve Fitting Toolbox are each highly desirable (if not essential) for someone using MATLAB for ML/data mining work, yet they are all separate from the base MATLAB environment; in other words, they have to be purchased separately.

My Top 5 list for Learning ML/Data Mining in R:

  • Mining Association Rules in R

This refers to a couple of things. First, a group of R packages whose names all begin with arules (available from CRAN); you can find the complete list (arules, arulesViz, etc.) on the project homepage. Second, all of these packages are based on a data-mining technique known as Market-Basket Analysis, or alternatively as Association Rules. In many respects, this family of algorithms is the essence of data mining: exhaustively traverse large transaction databases and find above-average associations or correlations among the fields (variables or features) in those databases. In practice, you connect them to a data source and let them run overnight. The central R package in the set mentioned above is called arules; on the CRAN package page for arules, you will find links to a couple of excellent secondary sources (vignettes, in R's lexicon) on the arules package and on the Association Rules technique in general.

  • The Elements of Statistical Learning (ESL), by Hastie, Tibshirani, and Friedman

The most current edition of this book is available in digital form for free. Likewise, all data sets used in ESL are available for free download at the book's website. (As an aside, I have the free digital version; I also purchased the hardback version from BN.com; all of the color plots in the digital version are reproduced in the hardbound version.) ESL contains thorough introductions to at least one exemplar from most of the major ML rubrics: e.g., neural networks, SVM, KNN; unsupervised techniques (LDA, PCA, MDS, SOM, clustering); numerous flavors of regression; CART; Bayesian techniques; as well as model aggregation techniques (boosting, bagging) and model tuning (regularization). Finally, get the R package that accompanies the book from CRAN (which will save you the trouble of having to download and enter the datasets).

  • CRAN Task View: Machine Learning

The 3,500+ packages available for R are divided up by domain into about 30 package families, or 'Task Views'. Machine Learning is one of these families. The Machine Learning Task View contains about 50 or so packages. Some of these are flagged as core packages of the Task View, including e1071 (a sprawling ML package that includes working code for quite a few of the usual ML categories).

  • Revolution Analytics Blog

With particular focus on the posts tagged with Predictive Analytics

  • ML in R tutorial, comprising a slide deck and R code, by Josh Reich

A thorough study of the code would, by itself, be an excellent introduction to ML in R.

And one final resource that I think is excellent, but that didn't make it into the top 5:

  • A Guide to Getting Started in Machine Learning [in R]

posted at the blog A Beautiful WWW
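
To make the Association Rules idea from the first item in the list above a bit more concrete, here is a minimal sketch of what "find above-average associations" means in practice. It is written in plain Python rather than R, and it is not how the arules package is implemented or called; the transaction data, the function names (`support`, `pair_rules`), and the thresholds are all made up for illustration. It brute-forces every single-item rule A -> B and keeps the ones whose lift is above 1, i.e. items that co-occur more often than you would expect if they were independent.

```python
from itertools import combinations

# Toy transaction database (made-up data, for illustration only).
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def pair_rules(transactions, min_support=0.4, min_lift=1.0):
    """Exhaustively score every single-item rule A -> B.

    Returns (antecedent, consequent, support, confidence, lift) tuples
    for rules whose pair support is at least `min_support` and whose
    lift exceeds `min_lift`, sorted by descending lift.
    """
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        s_ab = support({a, b}, transactions)
        if s_ab < min_support:
            continue  # the pair is too rare to be interesting
        for lhs, rhs in ((a, b), (b, a)):
            # confidence = P(rhs | lhs); lift = confidence / P(rhs)
            confidence = s_ab / support({lhs}, transactions)
            lift = confidence / support({rhs}, transactions)
            if lift > min_lift:
                rules.append((lhs, rhs, s_ab, confidence, lift))
    return sorted(rules, key=lambda r: r[4], reverse=True)


for lhs, rhs, s, c, l in pair_rules(transactions):
    print(f"{lhs} -> {rhs}: support={s:.2f} confidence={c:.2f} lift={l:.2f}")
```

On this toy data the pairs that survive are beer/diapers, cola/milk, and cola/diapers, while frequent but unremarkable pairs such as bread/milk are dropped because their lift is below 1. Real implementations avoid the brute-force enumeration (the usual approach prunes candidate itemsets by support first), which is what makes the technique usable on large transaction databases.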

doug answered Oct 03 '22 17:10


Please look at the CRAN Task Views and in particular at the CRAN Task View on Machine Learning and Statistical Learning which summarises this nicely.

Dirk Eddelbuettel answered Oct 03 '22 16:10