Suggestions for speeding up Random Forests

I'm doing some work with the randomForest package and while it works well, it can be time-consuming. Does anyone have any suggestions for speeding things up? I'm using a Windows 7 box with a dual-core AMD chip. I know R isn't multi-threaded or multi-processor out of the box, but I was curious whether any of the parallel packages (Rmpi, snow, snowfall, etc.) work with randomForest. Thanks.

EDIT:

I'm using rF for some classification work (0's and 1's). The data has about 8-12 variable columns and the training set is a sample of 10k lines, so it's decent size but not crazy. I'm running 500 trees and an mtry of 2, 3, or 4.

EDIT 2: Here's some output:

    > head(t22)
      Id Fail     CCUse Age S-TFail         DR MonInc #OpenLines L-TFail RE M-TFail Dep
    1  1    1 0.7661266  45       2 0.80298213   9120         13       0  6       0   2
    2  2    0 0.9571510  40       0 0.12187620   2600          4       0  0       0   1
    3  3    0 0.6581801  38       1 0.08511338   3042          2       1  0       0   0
    4  4    0 0.2338098  30       0 0.03604968   3300          5       0  0       0   0
    5  5    0 0.9072394  49       1 0.02492570  63588          7       0  1       0   0
    6  6    0 0.2131787  74       0 0.37560697   3500          3       0  1       0   1
    > ptm <- proc.time()
    >
    > RF<- randomForest(t22[,-c(1,2,7,12)],t22$Fail
    +                    ,sampsize=c(10000),do.trace=F,importance=TRUE,ntree=500,,forest=TRUE)
    Warning message:
    In randomForest.default(t22[, -c(1, 2, 7, 12)], t22$Fail, sampsize = c(10000),  :
      The response has five or fewer unique values.  Are you sure you want to do regression?
    > proc.time() - ptm
       user  system elapsed
     437.30    0.86  450.97
    >

asked Oct 20 '11 01:10

screechOwl

2 Answers

The manual of the foreach package has a section on Parallel Random Forests (Using The foreach Package, Section 5.1):

    > library("foreach")
    > library("doSNOW")
    > registerDoSNOW(makeCluster(4, type="SOCK"))

    > x <- matrix(runif(500), 100)
    > y <- gl(2, 50)

    > rf <- foreach(ntree = rep(250, 4), .combine = combine, .packages = "randomForest") %dopar%
    +    randomForest(x, y, ntree = ntree)
    > rf
    Call:
     randomForest(x = x, y = y, ntree = ntree)
                   Type of random forest: classification
                         Number of trees: 1000

If we want to create a random forest model with 1000 trees, and our computer has four cores, we can split the problem into four pieces by calling the randomForest function four times, with the ntree argument set to 250. We then have to combine the resulting randomForest objects, and the randomForest package provides a function called combine for exactly that.
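Applied to a setup closer to the one in the question, a minimal sketch could look like the following. It assumes the doParallel backend instead of doSNOW, two workers for a dual-core machine, and synthetic data standing in for t22 (the column layout and sizes are made up for illustration). The response is turned into a factor, because the warning in the question indicates that a numeric 0/1 response makes randomForest fit a regression rather than a classification forest.

```r
library(randomForest)
library(foreach)
library(doParallel)  # also attaches 'parallel', which provides makeCluster()

# Two workers, assuming a dual-core machine as in the question.
cl <- makeCluster(2)
registerDoParallel(cl)

# Synthetic stand-in for the question's data: 10 predictors, 0/1 response.
set.seed(42)
x <- as.data.frame(matrix(runif(2000 * 10), ncol = 10))
y <- factor(rbinom(2000, 1, 0.5))  # factor response => classification forest

# Grow 250 trees on each worker, then merge them into one 500-tree forest.
rf <- foreach(ntree = rep(250, 2),
              .combine = randomForest::combine,
              .packages = "randomForest") %dopar% {
  randomForest(x, y, ntree = ntree, mtry = 3)
}

stopCluster(cl)

rf$ntree  # total number of trees in the combined forest
```

One caveat, as far as I know from the combine documentation: the combined object no longer carries the out-of-bag error rate or confusion matrix, so accuracy has to be checked on held-out data with predict().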

answered Sep 30 '22 18:09

rcs

There are two 'out of the box' options that address this problem. First, the caret package contains a method, 'parRF', that handles this elegantly. I commonly use it with 16 cores to great effect. Second, the randomShrubbery package also takes advantage of multiple cores for RF on Revolution R.
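For the caret route, a rough sketch might look like this. It assumes a doParallel backend with two workers, synthetic data, and 5-fold cross-validation over the mtry values mentioned in the question; none of these choices come from the answer itself. caret may also ask to install extra packages that 'parRF' depends on.

```r
library(caret)
library(doParallel)

# Register a foreach backend; 'parRF' spreads the trees over these workers.
cl <- makeCluster(2)
registerDoParallel(cl)

# Synthetic stand-in data: 10 predictors and a two-level factor response.
set.seed(42)
x <- as.data.frame(matrix(runif(1000 * 10), ncol = 10))
y <- factor(rbinom(1000, 1, 0.5), labels = c("ok", "fail"))

# Resampling runs sequentially (allowParallel = FALSE), so the workers are
# used for growing the forest rather than for the cross-validation folds.
ctrl <- trainControl(method = "cv", number = 5, allowParallel = FALSE)

fit <- train(x, y,
             method    = "parRF",
             ntree     = 500,                     # passed through to randomForest
             tuneGrid  = data.frame(mtry = 2:4),  # mtry values from the question
             trControl = ctrl)

stopCluster(cl)

fit$bestTune  # the mtry value chosen by cross-validation
```

Whether to parallelize the resampling or the tree growing is a judgment call; with only a handful of folds and many trees, giving the workers to the forest seemed the simpler choice here.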

answered Sep 30 '22 17:09

Brent
