
Partial Plot can't be produced from randomForest in R

The model builds without problems and all of the variable importance plots can be produced. However, when I call partialPlot on a continuous variable, for example purchase_value, with partialPlot(rf, fraud_data_train, purchase_value, which.class = 1), I get this error:

Error in is.finite(x): default method not implemented for type 'list'

For a categorical variable (browser), the call partialPlot(rf, fraud_data_train, browser, which.class = 1) gives this error:

Error in FUN(X[[i]], ...) : 
  only defined on a data frame with all numeric variables

The data is available here and the code is as below:

rf = randomForest(y = fraud_data_train$class_factor, 
                  x = fraud_data_train[,-predictors_notinclude],
                  ntree = 30, mtry = 4, keep.forest = TRUE,
                  importance = TRUE, proximity = TRUE)
partialPlot(rf, fraud_data_train, purchase_value, which.class = 1)

Update:

Here is the screenshot from my RStudio console: [image]

Update 2

Somehow the plot showed up in the notebook markdown, but I am still confused about why it isn't output in the console. [image]

asked Aug 05 '16 23:08 by MYjx


1 Answer

partialPlot() throws these errors when the data you pass it is not a plain data frame. The documentation for partialPlot() states that pred.data must be a data frame:

pred.data
a data frame used for constructing the plot, usually the training data used to construct the random forest.

You can fix this by coercing the data to a data frame, for example with as.data.frame().

Here is an example using your data: partialPlot(rf, as.data.frame(fraud_data_train), purchase_value, which.class = 1)
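To make the fix easier to try without the original data, here is a minimal sketch on the built-in iris data. It assumes the training data is a tibble (for example, read in with readr or passed through dplyr), which is one common way to end up with something that is not a plain data frame. The question does not confirm that this is the case. The names train_tbl, train_df and rf_demo are hypothetical stand-ins for fraud_data_train and rf.

```r
library(randomForest)

set.seed(1)

# Hypothetical stand-in for fraud_data_train: iris wrapped as a tibble.
# (Assumption: the real data is a tibble; the question does not say so.)
train_tbl <- tibble::as_tibble(iris)

# Coerce once to a plain data frame and use that copy from here on.
train_df <- as.data.frame(train_tbl)

# Single-column subsetting is where the two classes differ:
# a tibble stays a tibble (a list), a data frame drops to a vector.
is.numeric(train_tbl[, "Petal.Width"])  # FALSE
is.numeric(train_df[, "Petal.Width"])   # TRUE

rf_demo <- randomForest(x = train_df[, 1:4], y = train_df$Species,
                        ntree = 30, keep.forest = TRUE)

# With a plain data frame as pred.data the partial plot is drawn;
# the x and y values of the curve are returned invisibly.
pp <- partialPlot(rf_demo, train_df, Petal.Width, which.class = "setosa")
```

Running class(fraud_data_train) on your own data shows whether it carries extra classes such as "tbl_df" in addition to "data.frame".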

answered Nov 25 '22 08:11 by Huashuai