How to use parRF method so random forest will run faster

Question

I would like to run random forest on a large data set: 100k * 400. When I use random forest it takes a lot of time. Can I use the parRF method from the caret package to reduce the running time? What is the right syntax for that? Here is an example data frame:

dat <- read.table(text = " TargetVar  Var1    Var2       Var3
 0        0        0         7
 0        0        1         1
 0        1        0         3
 0        1        1         7
 1        0        0         5
 1        0        1         1
 1        1        0         0
 1        1        1         6
 0        0        0         8
 0        0        1         5
 1        1        1         4
 0        0        1         2
 1        0        0         9
 1        1        1         2  ", header = TRUE)

I tried:

library('caret')
m<-randomForest(TargetVar ~ Var1 + Var2 + Var3, data = dat, ntree=100, importance=TRUE, method='parRF')

But I don't see much of a difference. Any ideas?

Lenwood · Accepted Answer

The reason you don't see a difference is that you aren't using the caret package. You load it into your environment with the library() command, but then you call randomForest(), which doesn't use caret, so the method='parRF' argument has no effect.

I suggest starting by creating a data frame (or data.table) that contains only your input variables, and a vector that contains your outcomes. I'm referring to the recently updated caret docs.

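# Predictors only, keeping the original column names
x <- dat[, c("Var1", "Var2", "Var3")]
# Outcome vector; use factor(dat$TargetVar) for classification instead of regression
y <- dat$TargetVar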

Next, verify that you have the parRF method available. I didn't until I updated my caret package to the most recent version (6.0-29).

library("randomForest")
library("caret")
names(getModelInfo())
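
That list is long, so it can be easier to test for the entry directly. A small sketch, assuming only that caret is installed; modelLookup() is also a caret function and reports which tuning parameters a method accepts through tuneGrid:

```r
library(caret)

# TRUE if this caret version ships the parRF method
has_parrf <- "parRF" %in% names(getModelInfo())
print(has_parrf)

# Tuning parameters that train() accepts for parRF via tuneGrid
params <- modelLookup("parRF")
print(as.character(params$parameter))
```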

You should see parRF in the output. Now you're ready to create your training model.

library(foreach)

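# mtry is the only parameter parRF tunes, so tuneGrid needs an mtry column
# (2 is an arbitrary choice for three predictors)
rfParam <- expand.grid(mtry = 2)

# ntree and importance are passed through train() to randomForest()
m <- train(x, y, method = "parRF", tuneGrid = rfParam, ntree = 100, importance = TRUE)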
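
One more point: parRF grows its trees through foreach, so it only runs in parallel once a foreach backend has been registered. Here is a minimal end-to-end sketch on the example data from the question. It assumes the doParallel package is installed (it is not mentioned above, and any foreach backend should do); the two workers and mtry = 2 are arbitrary choices. Because mtry is the only parameter parRF tunes, ntree and importance are passed as ordinary arguments.

```r
library(caret)
library(doParallel)  # assumption: used here only as the foreach backend

# Example data from the question
dat <- data.frame(
  TargetVar = c(0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1),
  Var1      = c(0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1),
  Var2      = c(0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1),
  Var3      = c(7, 1, 3, 7, 5, 1, 0, 6, 8, 5, 4, 2, 9, 2)
)

x <- dat[, c("Var1", "Var2", "Var3")]
y <- factor(dat$TargetVar)  # factor outcome, so this is classification

# Register a parallel backend; two workers is an arbitrary choice
cl <- makeCluster(2)
registerDoParallel(cl)

set.seed(1)
m <- train(x, y,
           method = "parRF",
           tuneGrid = data.frame(mtry = 2),  # the only tunable parameter
           ntree = 100, importance = TRUE)   # passed on to randomForest()

# Shut the workers down and return foreach to sequential mode
stopCluster(cl)
registerDoSEQ()

print(m)
```

On a data set this small the cluster start-up cost will outweigh any gain; the benefit should only show up on something closer to the 100k * 400 case, and it is worth timing both variants with system.time() before committing to one.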

Tags: r, parallel-processing, random-forest, r-caret

mql4beginner
