How does the R implementation of boosted regression trees (package gbm) by default deal with missing values of the predictor variables? Are they imputed and if they are, according to which algorithm?
Background of my question: I did the analysis almost a year ago, using the scripts provided by Elith et al. 2008 (A working guide to boosted regression trees, Journal of Animal Ecology 77, 802–813) to invoke gbm. I have now become aware that I had NAs for some of the predictor variables, and I wonder how the boosted regression trees dealt with them. Browsing through various manuals and papers I found statements like "boosted regression trees can accommodate missing values" and the like, but I couldn't find a precise description of what gbm does with missing values. The analysis itself ran without problems, so gbm must have dealt with them in one way or another. The gbm manual even contains an example where NAs are deliberately introduced to demonstrate that gbm keeps working without problems. Now I'd like to know precisely what gbm does with NAs (skip them, impute them, ...?).
Decision trees can handle missing values automatically, and they are usually robust to outliers as well.
Boosted regression trees combine the strengths of two algorithms: regression trees (models that relate a response to their predictors by recursive binary splits) and boosting (an adaptive method for combining many simple models to give improved predictive performance).
Boosted regression trees are a powerful algorithm that works well with large datasets, or when the number of environmental variables is large relative to the number of observations, and they are very robust to missing values and outliers.
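To see this robustness in practice, here is a minimal sketch along the lines of the example mentioned from the gbm manual (not that example itself; all variable names and tuning parameters here are illustrative): NAs are deliberately introduced into a predictor, yet gbm fits and predicts without any user-side imputation.

```r
library(gbm)

set.seed(1)
n  <- 500
x1 <- runif(n)
x2 <- runif(n)
y  <- 2 * x1 + rnorm(n)
x1[sample(n, 50)] <- NA          # deliberately introduce missing predictor values

d   <- data.frame(y, x1, x2)
fit <- gbm(y ~ x1 + x2, data = d,
           distribution = "gaussian",
           n.trees = 100, interaction.depth = 2,
           shrinkage = 0.05)

# gbm returns predictions even for rows where x1 is NA:
p <- predict(fit, newdata = d, n.trees = 100)
sum(is.na(p))   # expect 0: no missing predictions
```

The point of the sketch is only that the model runs end to end with NAs present; it does not reveal the internal mechanism, which is exactly the question being asked.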
The gbm function can itself be used for imputation, as described in Jeffrey Wong's blog: missing values get surrogate splits, and the user can then obtain predictions for items with incomplete predictor sets.
He has developed a package based on this approach. The GitHub repo has this in the header of one of the files for gbm:
#' GBM Imputation
#'
#' Imputation using Boosted Trees
#' Fill each column by treating it as a regression problem. For each
#' column i, use boosted regression trees to predict i using all other
#' columns except i. If the predictor variables also contain missing data,
#' the gbm function will itself use surrogate variables as substitutes for the predictors.
#' This imputation function can handle both categorical and numeric data.
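The column-by-column scheme that the header describes can be sketched as follows. This is an assumption-laden illustration, not Jeff Wong's actual gbmImpute code, and for simplicity it handles only numeric columns (his package also handles categorical data):

```r
library(gbm)

# Hedged sketch of the scheme in the header above: for each numeric
# column i containing NAs, fit a gbm that predicts column i from all
# other columns, then fill the NAs with that model's predictions.
# gbm itself copes with NAs remaining in the predictor columns.
impute_gbm_sketch <- function(df, n.trees = 100) {
  for (target in names(df)) {
    miss <- is.na(df[[target]])
    if (!any(miss) || !is.numeric(df[[target]])) next
    fit <- gbm(reformulate(setdiff(names(df), target), target),
               data = df[!miss, ],
               distribution = "gaussian",
               n.trees = n.trees, interaction.depth = 2)
    df[[target]][miss] <- predict(fit, newdata = df[miss, , drop = FALSE],
                                  n.trees = n.trees)
  }
  df
}
```

Note that the real package makes further design choices this sketch ignores, such as choosing the distribution per column type and tuning the number of trees.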
To find this I merely typed this into a Google search: how does gbm deal with missing values. It was the 2nd hit for me.
https://github.com/jeffwong/imputation/blob/master/R/gbmImpute.R