Cannot run ANOVA to Compare Random Forest Models

Question

I am using tidymodels to fit multiple random forest models. I then followed along with this tutorial to compare the model results. The problem is that I get the error:

Error in UseMethod("anova") : 
  no applicable method for 'anova' applied to an object of class "ranger"

As an example:

library(tidymodels)

set.seed(123)
iris <- iris %>% mutate(
  is_versicolor = ifelse(Species == "versicolor", "versicolor", "not_versicolor")) %>%
  mutate(is_versicolor = factor(is_versicolor, levels = c("versicolor", "not_versicolor")))

iris_split <- initial_split(iris, strata = is_versicolor, prop = 0.8)
iris_train <- training(iris_split)
iris_test  <- testing(iris_split)

rec_normal <- recipe(is_versicolor ~ Petal.Width + Species, data = iris_train)
rec_interaction <- rec_normal %>% 
  step_interact(~ Petal.Width:starts_with("Species"))

iris_model <- rand_forest() %>% set_engine("ranger") %>% set_mode("classification")

# normal workflow
iris_wf <- workflow() %>% 
  add_model(iris_model) %>% 
  add_recipe(rec_normal)

# interaction workflow
iris_wf_interaction <- iris_wf %>% 
  update_recipe(rec_interaction)

# fit models
iris_normal_lf <- last_fit(iris_wf, split = iris_split)
iris_inter_lf <- last_fit(iris_wf_interaction, split = iris_split)

normalmodel <- iris_normal_lf %>% extract_fit_engine()
intermodel  <- iris_inter_lf %>% extract_fit_engine()

anova(normalmodel, intermodel) %>% tidy()

How can I run an ANOVA or ANOVA-type comparison of these models, to see if one is significantly better?

Isaiah · Accepted Answer

Using your code, and adapting Julia Silge's blog post on workflowsets, "Predict #TidyTuesday giant pumpkin weights with workflowsets":

There is no anova() method for ranger objects, so compare the models by resampling instead. First, generate cross-validation folds:

set.seed(234)
iris_folds <- vfold_cv(iris_train)
iris_folds

Combine your recipes into a workflowset:

iris_set <-
  workflow_set(
    list(rec_normal, rec_interaction),
    list(iris_model),
    cross = TRUE
  )

iris_set

Set up parallel processing:

doParallel::registerDoParallel()
set.seed(2021)

Fit using the folds:

iris_rs <-
  workflow_map(
    iris_set,
    "fit_resamples",
    resamples = iris_folds
  )

autoplot(iris_rs)

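This plot of the resampled metrics would usually answer your question of how to compare the models.

Here it does not, because Species is on the right-hand side of both recipe formulas and the response is_versicolor is calculated from Species. The predictor fully determines the outcome, so both models are completely accurate and there is no difference to see.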
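To see why both models are perfect, here is a minimal base R sketch that cross-tabulates Species against the derived outcome. The names dat, leak_table and perfectly_determined are just illustrative, and it works on the full iris data rather than your training split:

```r
# Rebuild the derived outcome from the built-in iris data (base R only)
dat <- iris
dat$is_versicolor <- factor(
  ifelse(dat$Species == "versicolor", "versicolor", "not_versicolor"),
  levels = c("versicolor", "not_versicolor")
)

# Cross-tabulate the predictor against the outcome
leak_table <- table(dat$Species, dat$is_versicolor)
leak_table

# TRUE when every Species level falls into exactly one outcome class,
# i.e. the predictor alone determines the response
perfectly_determined <- all(rowSums(leak_table > 0) == 1)
perfectly_determined
```

If every row of the table has a single non-zero cell, any model that is given Species can reproduce the outcome exactly, whatever else is in the recipe.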

To finish, collect the metrics and fit the chosen workflow on the training data:

collect_metrics(iris_rs)

final_fit <-
  extract_workflow(iris_rs, "recipe_2_rand_forest") %>%
  fit(iris_train)

There is no tidier for ranger models, so the tidy() call at the end of your code will not work on them either.

If you change the two recipes in your code so that Species is no longer a predictor, for example:

rec_normal <- recipe(is_versicolor ~ Sepal.Length + Sepal.Width, data = iris_train)
rec_interaction <- recipe(is_versicolor ~ Petal.Width + Petal.Length, data = iris_train)

the models are no longer trivially perfect, and you can have some fun comparing them!
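Since you asked whether one model is significantly better, one rough option is a paired t-test on the per-fold metrics. This is my own sketch, not something from the blog post, and it rests on a few assumptions: the recipe names sepal and petal, the model name rf and the dat_* objects are made up for illustration, roc_auc is picked as the metric, and the p-value is only approximate because the folds share training data and are not independent.

```r
library(tidymodels)  # attaches dplyr, tidyr, rsample, tune, workflowsets, ...

set.seed(123)
dat <- iris %>%
  mutate(is_versicolor = factor(
    ifelse(Species == "versicolor", "versicolor", "not_versicolor"),
    levels = c("versicolor", "not_versicolor")
  ))

dat_split <- initial_split(dat, strata = is_versicolor, prop = 0.8)
dat_train <- training(dat_split)

set.seed(234)
dat_folds <- vfold_cv(dat_train)

rf_spec <- rand_forest() %>% set_engine("ranger") %>% set_mode("classification")

# Named lists give readable workflow ids: "sepal_rf" and "petal_rf"
wf_set <- workflow_set(
  preproc = list(
    sepal = recipe(is_versicolor ~ Sepal.Length + Sepal.Width, data = dat_train),
    petal = recipe(is_versicolor ~ Petal.Width + Petal.Length, data = dat_train)
  ),
  models = list(rf = rf_spec)
)

set.seed(2021)
wf_rs <- workflow_map(wf_set, "fit_resamples", resamples = dat_folds)

# Numeric counterpart of autoplot(): workflows ranked by mean ROC AUC
rank_results(wf_rs, rank_metric = "roc_auc")

# Per-fold ROC AUC, one row per fold and one column per workflow
fold_metrics <- collect_metrics(wf_rs, summarize = FALSE) %>%
  filter(.metric == "roc_auc") %>%
  select(wflow_id, id, .estimate) %>%
  pivot_wider(names_from = wflow_id, values_from = .estimate)

# Paired test: both workflows were evaluated on the same folds
t.test(fold_metrics$petal_rf, fold_metrics$sepal_rf, paired = TRUE)
```

The pairing matters because each fold is scored by both workflows, so the test looks at the within-fold differences. If you want something more principled, the tidyposterior package has perf_mod(), which fits a Bayesian model to the resampling results, but I have not tried it on this example.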

Hope this helps, Adam. I am just learning the wonderful tidymodels like you, and look forward to comments. :-)

Tags: r, machine-learning, anova, tidymodels

Asked by Adam_G · 1 answer, by Isaiah