Implement null distribution for gbm interaction strength

I am trying to determine which interactions in a gbm model are significant using the method described in Friedman and Popescu 2008 https://projecteuclid.org/euclid.aoas/1223908046. My gbm is a classification model with 9 different classes. I'm struggling with how to translate Section 8.3 into code to run in R.

I think the overall process is to:

  1. Train a version of the model with max.depth = 1
  2. Simulate response data from this model
  3. Train a new model on this data with max.depth the same as the real model
  4. Get interaction strength for this model
  5. Repeat steps 1-4 to create a null distribution of interaction strengths

The part that I am finding most confusing is implementing equations 48 and 49. (You will have to look at the linked article since I can't reproduce them here)

This is what I think I understand but please correct me if I'm wrong:

y_i is a new vector of the response that we will use to train a new model which will provide the null distribution of interaction statistics.

F_A(x_i) is the prediction from a version of the gbm model trained with max.depth = 1

b_i is a probability between 0 and 1 based on the prediction from the additive model F_A(x_i)

Questions

  1. What is subscript i? Is it the number of iterations in the bootstrap?
  2. How is each artificial data set different from the others?
  3. Are we subbing the Pr(b_i = 1) into equation 48?
  4. How can this be done with multinomial classification?
  5. How would one implement this in R? Preferably using the gbm package.

Any ideas or references are welcome!

asked Jan 31 '19 by see24

1 Answer

Overall, the process is an elegant way to neutralise the interaction effects in Y: it randomly permutes (re-distributes) the extra contribution that the interactions make to the model fit. That extra contribution can be captured as the margins between a full model and an additive model, i.e. the difference between their predictions on the link scale.

  1. What is subscript i? Is it the number of iterations in the bootstrap?

It is the index of the training samples; there are N samples in each artificial data set.

  2. How is each artificial data set different from the others?

The predictors X are the same across data sets. The response values Y~ differ because of the random permutation of the margins in equation 47 and, for categorical outcomes only, the random realisation in equation 48.
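
A quick illustration (toy matrix, hypothetical values) of why each replicate differs: every call to sample() draws a fresh column-wise permutation of the margin matrix, so each artificial Y~ is realised from differently shuffled margins.

set.seed(2)
m <- matrix(1:6, ncol=2) # stand-in for the N x K matrix of margins
apply(m, 2, sample)      # one random permutation of each column
apply(m, 2, sample)      # a different permutation, hence a different artificial Y~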

  3. Are we subbing the Pr(b_i = 1) into equation 48?

Yes, if the outcome Y is binary.
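
For the binary case, a minimal sketch of that substitution and realisation; the vectors F_A_link and margin_perm are hypothetical stand-ins for the additive model's link-scale predictions and the permuted margins:

# hypothetical values for 5 samples
F_A_link    <- c(-0.8, 0.3, 1.2, -0.1, 0.5) # F_A(x_i) on the log-odds scale
margin_perm <- c(0.2, -0.4, 0.1, 0.3, -0.2) # permuted margins
p_tilde <- plogis(F_A_link + margin_perm)   # Pr(b_i = 1), subbed into equation 48
y_tilde <- rbinom(length(p_tilde), size=1, prob=p_tilde) # random realisation (equation 49)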

  4. How can this be done with multinomial classification?

One way is to randomly permute the margins in the log-odds of each category, followed by a random realisation according to the probabilities from the additive model; this is exactly what step 2 of the code below does.

  5. How would one implement this in R? Preferably using the gbm package.

I tried to implement it in line with your overall process.

First, simulate a training data set {X1, X2, Y} of size N = 200, where Y has three categories (Y1, Y2, Y3) realised from probabilities determined by X1 and X2. The interaction term X1*X2 enters the log-odds of Y1, while the additive terms enter Y2 and Y3.

set.seed(1)
N <- 200
X1 <- rnorm(N) # 2 predictors
X2 <- rnorm(N)

#log-odds for 3 categories
Y1 <- 2*X1*X2 + rnorm(N, sd=1/10) # interaction
Y2 <- X1^2 + rnorm(N, sd=1/10) #additive
Y3 <- X2^2 + rnorm(N, sd=1/10) #additive
Y <- rep(NA, N) # Multinomial outcome with 3 categories
for (i in 1:N)
  {
    prob <- 1 / (1 + exp(-c(Y1[i],Y2[i],Y3[i]))) # inverse logit of each category's log-odds
    Y[i] <- which.max(rmultinom(1, 10000, prob=prob)) # random realisation from prob
  }
Y <- factor(Y)
levels(Y) <- c('Y1','Y2','Y3')
table(Y)
#Y1 Y2 Y3 
#38 75 87
dat = data.frame(Y, X1, X2)
head(dat)
#    Y         X1         X2
# 1 Y2 -0.6264538  0.4094018
# 2 Y3  0.1836433  1.6888733
# 3 Y3 -0.8356286  1.5865884
# 4 Y2  1.5952808 -0.3309078
# 5 Y3  0.3295078 -2.2852355
# 6 Y3 -0.8204684  2.4976616

  1. Train a full model and an additive model with interaction.depth = 2 and 1, respectively.

library(gbm)
n.trees <- 100
F_full <- gbm(Y ~ ., data=dat, distribution='multinomial', n.trees=n.trees, cv.folds=3,
            interaction.depth=2) # consider interactions
F_additive <- gbm(Y ~ ., data=dat, distribution='multinomial', n.trees=n.trees, cv.folds=3,
            interaction.depth=1) # ignore interactions

#use improved prediction as interaction strength
interaction_strength_original <- min(F_additive$cv.error) - min(F_full$cv.error)
# 0.1937891
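
As an aside, the CV-deviance gap above is this answer's proxy for interaction strength; if you want the H-statistic from Friedman and Popescu instead, the gbm package provides interact.gbm. A minimal sketch on the models just fitted:

# Friedman's H-statistic for the X1:X2 pair on the full model
H_X1X2 <- interact.gbm(F_full, data=dat, i.var=c('X1','X2'),
                       n.trees=gbm.perf(F_full, plot.it=FALSE))
H_X1X2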

  2. Simulate response data from this model.

#randomly permute margins (residuals) of log-odds to remove any interaction effects
margin <- predict(F_full, n.trees=gbm.perf(F_full, plot.it=FALSE), type='link')[,,1] -
         predict(F_additive, n.trees=gbm.perf(F_additive, plot.it=FALSE), type='link')[,,1]
margin <- apply(margin, 2, sample) #independent permutation for each category (Y1, Y2, Y3)

Y_art <- rep(NA, N) # response values of an artificial data set
for (i in 1:N)
  {
    # additive-model prediction on the link (log-odds) scale
    pred <- predict(F_additive, n.trees=gbm.perf(F_additive, plot.it=FALSE), type='link',
                    newdata=dat[i,])
    pred <- pred + margin[i,]    # add the permuted margins, equation (47)
    prob <- 1 / (1 + exp(-pred)) # back to probabilities
    Y_art[i] <- which.max(rmultinom(1, 1000, prob=prob)) # random realisation, cf. equation (49)
  }
Y_art <- factor(Y_art)
levels(Y_art) = c('Y1','Y2','Y3')

table(Y_art)
#Y1 Y2 Y3 
#21 88 91

  3. Train a new model on this artificial data with the same interaction.depth (2) as the real model.

F_full_art = gbm(Y_art ~ ., distribution='multinomial', n.trees=n.trees, cv.folds=3,
            data=data.frame(Y_art, X1, X2),
            interaction.depth=2)
F_additive_art = gbm(Y_art ~ ., distribution='multinomial', n.trees=n.trees, cv.folds=3,
            data=data.frame(Y_art, X1, X2),
            interaction.depth=1)

  4. Get interaction strength for this model.

interaction_strength_art = min(F_additive_art$cv.error) - min(F_full_art$cv.error)
# 0.01323959 (much smaller than interaction_strength_original in step 1)

  5. Repeat steps 2-4 to create a null distribution of interaction strengths. As expected, the interaction effects in the neutralised data sets (-0.0527 to 0.0421) are much lower than in the original training data set (0.1938). The loop below uses a helper step_2_to_4() that wraps steps 2-4; a sketch of it comes first.

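A minimal sketch of step_2_to_4(), assembled from the code in steps 2-4 above; it relies on F_full, F_additive, dat, N and n.trees from the current session:

step_2_to_4 <- function()
  {
    # step 2: compute margins, permute them, realise an artificial response
    margin <- predict(F_full, n.trees=gbm.perf(F_full, plot.it=FALSE), type='link')[,,1] -
              predict(F_additive, n.trees=gbm.perf(F_additive, plot.it=FALSE), type='link')[,,1]
    margin <- apply(margin, 2, sample)
    Y_art <- rep(NA, N)
    for (i in 1:N)
      {
        pred <- predict(F_additive, n.trees=gbm.perf(F_additive, plot.it=FALSE),
                        type='link', newdata=dat[i,])
        pred <- pred + margin[i,]
        prob <- 1 / (1 + exp(-pred))
        Y_art[i] <- which.max(rmultinom(1, 1000, prob=prob))
      }
    Y_art <- factor(Y_art)
    levels(Y_art) <- c('Y1','Y2','Y3')
    # steps 3-4: refit full and additive models on the artificial data
    F_full_art <- gbm(Y_art ~ ., distribution='multinomial', n.trees=n.trees, cv.folds=3,
                      data=data.frame(Y_art, X1, X2), interaction.depth=2)
    F_additive_art <- gbm(Y_art ~ ., distribution='multinomial', n.trees=n.trees, cv.folds=3,
                          data=data.frame(Y_art, X1, X2), interaction.depth=1)
    min(F_additive_art$cv.error) - min(F_full_art$cv.error)
  }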
interaction_strength_art <- NULL
for (j in 1:10)
  { 
    print(j)
    interaction_strength_art <- c(interaction_strength_art, step_2_to_4())
  }

summary(interaction_strength_art)
#     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
#-0.052648 -0.019415  0.001124 -0.004310  0.012759  0.042058

interaction_strength_original
# 0.1937891
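
To turn the null distribution into a significance statement, one can compare the observed statistic against the null draws. An empirical p-value sketch (with only 10 replicates the resolution is coarse, so more replicates would be advisable):

# fraction of null draws at least as extreme as the observed statistic,
# with the usual +1 correction for permutation-style tests
p_value <- (1 + sum(interaction_strength_art >= interaction_strength_original)) /
           (1 + length(interaction_strength_art))
p_value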

answered Oct 05 '22 by Ryan SY Kwan