
extracting more than 20 variables by importance via varImp

Tags:

r

r-caret

I'm dealing with a large dataset with more than 100 features (all of them relevant, since the original set of over 500 features has already been filtered). I created a random forest model with the train() function from the caret package, using the "ranger" method.

Here's the question: how does one extract all of the variables by importance, as opposed to only the top 20 most important variables? The varImp() function yields only the top 20 variables by default.

Here's some sample code (minus the training set, which is very large):

library(caret)

# "impurity" requests ranger's Gini-impurity variable importance
rforest_model <- train(target_variable ~ .,
                       data = train_data_set,
                       method = "ranger",
                       importance = "impurity")

And here's the code for extracting variable importance:

varImp(rforest_model)
Asked Jan 02 '18 by Flavio Abdenur

People also ask

What does varImp do in R?

The varImp function tracks the changes in model statistics, such as the GCV, for each predictor and accumulates the reduction in the statistic when each predictor's feature is added to the model. This total reduction is used as the variable importance measure.
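For instance, that GCV-based bookkeeping is what you see when the underlying model is a MARS fit. A minimal sketch, assuming the earth and mlbench packages are available (BostonHousing is just a stand-in data set, not part of the question):

library(caret)
library(earth)     # MARS implementation behind method = "earth"
library(mlbench)   # for the BostonHousing data

data(BostonHousing)

# varImp() reports the reduction in GCV attributed to each predictor
# as the MARS terms are added to the model
mars_fit <- train(medv ~ ., data = BostonHousing, method = "earth")
varImp(mars_fit)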

How do you calculate variable importance in random forest?

The default method to compute variable importance is the mean decrease in impurity (or Gini importance) mechanism: at each split in each tree, the improvement in the split criterion is the importance measure attributed to the splitting variable, and is accumulated over all the trees in the forest separately for each variable.
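This mean-decrease-in-impurity measure is what importance = "impurity" requests in the question's train() call, and ranger exposes it directly as well. A minimal sketch, assuming ranger and mlbench are installed (Ionosphere is the same data set used in the answer below):

library(ranger)
library(mlbench)   # for the Ionosphere data

data(Ionosphere)

# each split's impurity (Gini) improvement is credited to the splitting
# variable and summed over every tree in the forest
fit <- ranger(Class ~ ., data = Ionosphere, importance = "impurity")

sort(fit$variable.importance, decreasing = TRUE)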

How is variable importance calculated?

Variable importance is determined by calculating the relative influence of each variable: whether that variable was selected to split on during the tree building process, and how much the squared error (over all trees) improved (decreased) as a result.
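That "relative influence" wording comes from gradient boosted trees. A minimal sketch with the gbm package, assuming gbm and mlbench are installed (BostonHousing is just an example regression problem):

library(gbm)
library(mlbench)   # for the BostonHousing data

data(BostonHousing)

set.seed(1)
gbm_fit <- gbm(medv ~ ., data = BostonHousing,
               distribution = "gaussian", n.trees = 100)

# relative influence: squared-error improvement credited to each splitting
# variable, summed over all trees and scaled to add up to 100
summary(gbm_fit)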

What is variable importance in machine learning?

(My) definition: Variable importance refers to how much a given model "uses" that variable to make accurate predictions. The more a model relies on a variable to make predictions, the more important it is for the model. It can apply to many different models, each using different metrics.


1 Answer

The varImp function extracts importance for all variables (even if they are not used by the model); it just prints the top 20 variables. Consider this example:

library(mlbench)    # for the Ionosphere data set
library(caret)
library(tidyverse)  # for the plotting pipeline below

set.seed(998)
data(Ionosphere)

# Class is the response; columns 1:34 are the 34 predictors
rforest_model <- train(y = Ionosphere$Class,
                       x = Ionosphere[, 1:34],
                       method = "ranger",
                       importance = "impurity")

nrow(varImp(rforest_model)$importance) #34 variables extracted

Let's check them:

varImp(rforest_model)$importance %>%
  as.data.frame() %>%
  rownames_to_column() %>%
  arrange(Overall) %>%
  mutate(rowname = forcats::fct_inorder(rowname)) %>%  # keep the sorted order on the axis
  ggplot() +
    geom_col(aes(x = rowname, y = Overall)) +
    coord_flip() +
    theme_bw()

(Plot: horizontal bar chart of the importance of all 34 predictors.)

Note that V2 is a zero-variance feature in this data set, hence it has 0 importance and is not used by the model at all.
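If you prefer caret's built-in output to ggplot2, the same full table is available there as well; a small sketch (the plot method's top argument simply raises the number of variables shown above the default):

imp <- varImp(rforest_model)

# the full importance table, one row per predictor
imp$importance

# caret's own dot plot, showing every variable instead of the top 20
plot(imp, top = nrow(imp$importance))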

Answered Sep 19 '22 by missuse