Variable importance using the caret package (error); RandomForest algorithm

I am trying to obtain the variable importance of an rf model by any means available. This is the approach I have tried so far, but alternative suggestions are very welcome.

I have trained a model in R:

require(caret)
require(randomForest)
myControl = trainControl(method='cv',number=5,repeats=2,returnResamp='none')
model2 = train(increaseInAssessedLevel~., data=trainData, method = 'rf', trControl=myControl)

The dataset is fairly large, but the model runs fine. I can access its parts and run commands such as:

> model2[3]
$results
  mtry      RMSE  Rsquared      RMSESD RsquaredSD
1    2 0.1901304 0.3342449 0.004586902 0.05089500
2   61 0.1080164 0.6984240 0.006195397 0.04428158
3  120 0.1084201 0.6954841 0.007119253 0.04362755

But I get the following error:

> varImp(model2)
Error in varImp[, "%IncMSE"] : subscript out of bounds

According to the documentation there is supposed to be a wrapper for randomForest objects (cf. http://www.inside-r.org/packages/cran/caret/docs/varImp), but calling it directly fails:

varImp.randomForest(model2)
Error: could not find function "varImp.randomForest"

But this is particularly odd:

> traceback()
No traceback available 

> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-redhat-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=C                 LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] elasticnet_1.1     lars_1.2           klaR_0.6-9         MASS_7.3-26       
 [5] kernlab_0.9-18     nnet_7.3-6         randomForest_4.6-7 doMC_1.3.0        
 [9] iterators_1.0.6    caret_5.17-7       reshape2_1.2.2     plyr_1.8          
[13] lattice_0.20-15    foreach_1.4.1      cluster_1.14.4    

loaded via a namespace (and not attached):
[1] codetools_0.2-8 compiler_3.0.1  grid_3.0.1      stringr_0.6.2  
[5] tools_3.0.1

asked Sep 02 '13 17:09

2 Answers

The importance scores can take a while to compute, and train won't automatically ask randomForest to create them. Add importance = TRUE to the train call and varImp should work.
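
A minimal sketch of that suggestion follows. Since trainData from the question is not available, it substitutes the built-in mtcars data set, and the names ctrl, fit and imp as well as ntree = 100 are illustrative assumptions rather than part of the original post. Extra arguments to train are passed on to randomForest, which is how importance = TRUE reaches it.

```r
library(caret)
library(randomForest)

set.seed(1)

# 5-fold cross-validation, as in the question (mtcars stands in for trainData)
ctrl <- trainControl(method = "cv", number = 5)

# importance = TRUE is forwarded through train()'s ... to randomForest(),
# which then records permutation importance (%IncMSE for regression)
fit <- train(mpg ~ ., data = mtcars, method = "rf",
             trControl = ctrl, importance = TRUE, ntree = 100)

# varImp() on the train object returns scaled importance scores
imp <- varImp(fit)
print(imp)
```

The scores live in imp$importance, a data frame with one row per predictor.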

Max

answered Sep 21 '22 12:09

That is because the object returned by train() is not a pure Random Forest model, but a list of different objects (containing the final model itself as well as the cross-validation results etc.). You can list them with ls(model2) or names(model2). So to use the final model, just call varImp(model2$finalModel).
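
As a rough illustration of working with the final model directly, the sketch below again uses mtcars in place of the question's trainData (an assumption, as are the names fit and raw_imp and the choice of ntree = 100). It calls randomForest's own importance() function on the finalModel component; type = 1 requests the permutation measure, which is only available when the forest was grown with importance = TRUE.

```r
library(caret)
library(randomForest)

set.seed(1)

fit <- train(mpg ~ ., data = mtcars, method = "rf",
             trControl = trainControl(method = "cv", number = 5),
             importance = TRUE, ntree = 100)

# The train object is a list; finalModel holds the fitted randomForest
print(names(fit))
print(class(fit$finalModel))

# Ask randomForest directly; type = 1 is permutation importance (%IncMSE)
raw_imp <- randomForest::importance(fit$finalModel, type = 1)

# Sort predictors from most to least important
print(raw_imp[order(raw_imp[, 1], decreasing = TRUE), , drop = FALSE])
```

Unlike varImp() on the train object, these values are not rescaled to a 0-100 range.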

answered Sep 20 '22 12:09

O_Devinyak

Tags: r, random-forest, r-caret

Jakub Langr


topepo
