I was hoping to use the `gbm` package to do logistic regression, but it is giving answers slightly outside of the 0-1 range. I've tried the suggested distribution parameters for 0-1 predictions (`bernoulli` and `adaboost`), but that actually makes things worse than using `gaussian`.
GBM_NTREES = 150
GBM_SHRINKAGE = 0.1
GBM_DEPTH = 4
GBM_MINOBS = 50
> GBM_model <- gbm.fit(
+ x = trainDescr
+ ,y = trainClass
+ ,distribution = "gaussian"
+ ,n.trees = GBM_NTREES
+ ,shrinkage = GBM_SHRINKAGE
+ ,interaction.depth = GBM_DEPTH
+ ,n.minobsinnode = GBM_MINOBS
+ ,verbose = TRUE)
Iter TrainDeviance ValidDeviance StepSize Improve
1 0.0603 nan 0.1000 0.0019
2 0.0588 nan 0.1000 0.0016
3 0.0575 nan 0.1000 0.0013
4 0.0563 nan 0.1000 0.0011
5 0.0553 nan 0.1000 0.0010
6 0.0546 nan 0.1000 0.0008
7 0.0539 nan 0.1000 0.0007
8 0.0533 nan 0.1000 0.0006
9 0.0528 nan 0.1000 0.0005
10 0.0524 nan 0.1000 0.0004
100 0.0484 nan 0.1000 0.0000
150 0.0481 nan 0.1000 -0.0000
> prediction <- predict.gbm(object = GBM_model
+ ,newdata = testDescr
+ ,n.trees = GBM_NTREES)
> hist(prediction)
> range(prediction)
[1] -0.02945224 1.00706700
Bernoulli:
> GBM_model <- gbm.fit(
+ x = trainDescr
+ ,y = trainClass
+ ,distribution = "bernoulli"
+ ,n.trees = GBM_NTREES
+ ,shrinkage = GBM_SHRINKAGE
+ ,interaction.depth = GBM_DEPTH
+ ,n.minobsinnode = GBM_MINOBS
+ ,verbose = TRUE)
> prediction <- predict.gbm(object = GBM_model
+ ,newdata = testDescr
+ ,n.trees = GBM_NTREES)
> hist(prediction)
> range(prediction)
[1] -4.699324 3.043440
And adaboost:
> GBM_model <- gbm.fit(
+ x = trainDescr
+ ,y = trainClass
+ ,distribution = "adaboost"
+ ,n.trees = GBM_NTREES
+ ,shrinkage = GBM_SHRINKAGE
+ ,interaction.depth = GBM_DEPTH
+ ,n.minobsinnode = GBM_MINOBS
+ ,verbose = TRUE)
> prediction <- predict.gbm(object = GBM_model
+ ,newdata = testDescr
+ ,n.trees = GBM_NTREES)
> hist(prediction)
> range(prediction)
[1] -3.0374228 0.9323279
Am I doing something wrong? Do I need to preProcess (scale, center) the data, or do I need to go in and manually floor/cap the values with something like:
prediction <- ifelse(prediction < 0, 0, prediction)
prediction <- ifelse(prediction > 1, 1, prediction)
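(As an aside, the same floor/cap can be written more compactly with base R's `pmin` and `pmax`; this is behaviorally identical to the two `ifelse()` calls above:)

# Clamp predictions to the [0, 1] interval in one line
prediction <- pmin(pmax(prediction, 0), 1)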
From `?predict.gbm`:
Returns a vector of predictions. By default the predictions are on the scale of f(x). For example, for the Bernoulli loss the returned value is on the log odds scale, poisson loss on the log scale, and coxph is on the log hazard scale.
If type="response" then gbm converts back to the same scale as the outcome. Currently the only effect this will have is returning probabilities for bernoulli and expected counts for poisson. For the other distributions "response" and "link" return the same.
So if you use `distribution = "bernoulli"`, you need to transform the predicted values to rescale them to [0, 1], e.g. `p <- plogis(predict.gbm(model))`. Using `distribution = "gaussian"` is really for regression as opposed to classification, although I'm surprised that the predictions aren't in [0, 1]: my understanding is that gbm is still based on trees, so the predicted values shouldn't be able to go outside the values present in the training data.
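For example, a minimal sketch reusing the objects from the question (assuming the `bernoulli` fit above): `plogis()` applies the inverse logit to the log-odds by hand, while `type = "response"` asks gbm to do the same conversion internally.

# Log-odds from predict(), mapped back to probabilities manually:
p_manual <- plogis(predict(GBM_model, newdata = testDescr, n.trees = GBM_NTREES))

# Or let gbm apply the inverse link itself:
p_direct <- predict(GBM_model, newdata = testDescr, n.trees = GBM_NTREES,
                    type = "response")

range(p_manual)  # both vectors now lie within [0, 1]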