
Data Mining/Statistical Analysis options for a Heroku Rails app?

I have a rails app that is hosted on Heroku for which I want to incorporate some live data analysis. Ideally, I'd love to figure out a way to run a generalized boosted regression model, which I know is available in both R (http://cran.r-project.org/web/packages/gbm/index.html) and Stata (http://www.stata-journal.com/article.html?article=st0087). I want to save the resulting gbm tree and then, within my app, use it to predict new results based on user input.

If that's not possible, I'd be open to using other data mining algorithms. Most important to me is the ability to integrate it into my Heroku app so that it can run without my local machine.

Options I've looked into:

1) Heroku Support suggested vendoring the R library into a ruby gem. I'm relatively new to ruby and rails; is this something that would be feasible for me to do? I've looked around for instructions on vendoring libraries in gems, but haven't been able to find much.

2) Another thread here (http://stackoverflow.com/questions/6495232/statistic-engine-that-work-with-heroku) mentioned CloudNumbers, but it doesn't seem possible to call the service from a Rails app.

3) In one of their case studies, Heroku mentions FlightCaster, which uses Clojure, Hadoop, and EC2 for their machine learning (http://www.infoq.com/articles/flightcaster-clojure-rails). I saw that Heroku supports Clojure, but is there a way to integrate it (or more specifically Incanter) into my Rails app?

Please let me know if you have any ideas.

middkidd asked Sep 25 '11 16:09



1 Answer

I'll answer this from an R perspective. Generally, you're going to face two problems:

1) Interfacing with R, regardless of where it's running

2) Doing this from Heroku, where there is a special set of challenges.

There are a few general approaches to the first of these -- you can use a binding to R (rsruby, rinruby, etc.), you can shell out to R (e.g., running R -e "RCODEHERE" from Ruby), you can access R as a webservice (see the Rook package, and specifically something like https://github.com/jeffreyhorner/rRack/blob/master/Rook/inst/exampleApps/RJSONIO.R), or you can manually access R using something like Rserve.

Of these, shelling out to R is the easiest thing to do if you're just doing a single operation and aren't hugely concerned about performance. You'll need to parse the output that comes back, but in my experience it's the quickest approach to get working for a single operation.
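As a rough sketch of the shell-out approach in Ruby: the helper names run_r and parse_r_numeric below are made up for illustration, and the code assumes an Rscript executable is on the PATH (which, on Heroku, only holds once R has been built or vendored). The parser only handles plain numeric vectors as R prints them, e.g. "[1] 3 5".

```ruby
require "open3"

# Run a snippet of R code in a separate process and return its stdout.
# Assumption: `Rscript` is installed and on the PATH.
def run_r(code, executable: "Rscript")
  stdout, stderr, status = Open3.capture3(executable, "-e", code)
  raise "R failed: #{stderr}" unless status.success?
  stdout
end

# Turn R's printed numeric vector (e.g. "[1] 1.5 2.5\n[3] 3.5") into floats.
# The "[n]" index markers are dropped; non-numeric tokens such as NA raise.
def parse_r_numeric(output)
  output.gsub(/\[\d+\]/, " ").split.map { |token| Float(token) }
end

# Example usage (requires R to be installed):
#   values = parse_r_numeric(run_r("print(c(1.5, 2.5) * 2)"))
```

Passing the R code as a separate argument to Open3.capture3 avoids going through a shell, so user input embedded in the R snippet is not subject to shell quoting issues (it still needs escaping as R code).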

For more significant usage, I'd suggest using either one of the bindings, or setting up R as a webservice on another Heroku app and calling to it via HTTP.
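For the webservice route, the Rails side could look something like the sketch below. The URL, the JSON request shape, and the "prediction" response key are all assumptions for illustration; they depend entirely on how you write the R app on the other end.

```ruby
require "json"
require "net/http"
require "uri"

# Build a JSON POST request carrying the user's input.
# The payload shape is hypothetical; match it to whatever your R app expects.
def build_predict_request(uri, features)
  request = Net::HTTP::Post.new(uri, "Content-Type" => "application/json")
  request.body = JSON.generate(features)
  request
end

# Send the request to the R service and pull out the prediction.
# Assumes the service answers with JSON like {"prediction": ...}.
def predict(uri, features)
  use_ssl = uri.scheme == "https"
  response = Net::HTTP.start(uri.hostname, uri.port, use_ssl: use_ssl) do |http|
    http.request(build_predict_request(uri, features))
  end
  raise "R service returned #{response.code}" unless response.is_a?(Net::HTTPSuccess)
  JSON.parse(response.body).fetch("prediction")
end

# Example usage (hypothetical URL and feature names):
#   predict(URI("https://my-r-service.example.com/predict"), { "age" => 30 })
```

Keeping the request construction separate from the network call makes it easy to check the payload in a test without a running R service.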

The next challenge is getting R running on Heroku -- it's not available as part of the standard environment, and it's a read-only file system with no root access, so you can't just do sudo apt-get install.

It is possible to vendor R into a gem -- someone has started doing this at https://github.com/deet-uc/rsruby-heroku, but I was personally unable to get it working. It's also possible to build R directly on Heroku by installing all of the dependencies, etc. -- this is the approach that I've taken at https://github.com/noahhl/rookonheroku (step 1 is all you need if you aren't using Rook).

Note that Heroku might not allow you to spin up a second process in the same thread as your Rails app, which is what most of the bindings do. This can make it rather difficult to get those bindings working, which is why I tend to favor either shelling out to R, or exposing it as a webservice and accessing it via HTTP.

Noah answered Oct 21 '22 16:10
