
What does the value of 'leaf' in the following xgboost model tree diagram mean?

[Image: xgboost model tree diagram]

I am guessing that it is the conditional probability given that the conditions along the tree branch above it hold. However, I am not sure.

If you want to read more about the data used or how this diagram was produced, see: http://machinelearningmastery.com/visualize-gradient-boosting-decision-trees-xgboost-python/

dsl1990 asked Dec 02 '16 06:12


3 Answers

For a classification tree with 2 classes {0,1}, the value of the leaf node represents the raw score for class 1. It can be converted to a probability by applying the logistic function. The calculation below uses the leftmost leaf as an example.

1/(1+np.exp(-1*0.167528))=0.5417843204057448

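What this means is that if a data point ends up in this leaf, and this is the only tree in the model, the probability of this data point being class 1 is 0.5417843204057448. With several trees, the leaf values from all trees are summed before the logistic function is applied.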

Allen answered Oct 20 '22 07:10


The leaf attribute is the predicted value. In other words, if the evaluation of a tree ends at that terminal node (aka leaf node), then this is the value that the tree returns.

In pseudocode (the left-most branch of your tree model):

if(f1 < 127.5){
  if(f7 < 28.5){
    if(f5 < 45.4){
      return 0.167528f;
    } else {
      return 0.05f;
    }
  }
}
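
For readers who prefer something runnable, here is a minimal Python sketch of the same left-most branch. The function name leftmost_branch is hypothetical, the feature names and thresholds are copied from the pseudocode above, and branches the pseudocode does not show simply return None.

```python
def leftmost_branch(f1, f7, f5):
    """Return the leaf value for the left-most branch of the tree.

    Branches that the pseudocode above does not cover return None
    (an assumption of this sketch, not part of the model).
    """
    if f1 < 127.5:
        if f7 < 28.5:
            if f5 < 45.4:
                return 0.167528
            return 0.05
    return None


# A data point that satisfies all three conditions lands in the first leaf.
print(leftmost_branch(f1=100.0, f7=20.0, f5=40.0))
```

The value returned here is still a raw score for that single tree, not a final prediction.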

user1808924 answered Oct 20 '22 05:10


If it is a regression model (for example, objective reg:squarederror), then the leaf value is that tree's prediction for the given data point. The leaf value can be negative, depending on your target variable. The final prediction for that data point is the sum of its leaf values across all the trees.

If it is a classification model (for example, objective binary:logistic), then the leaf value is a raw score for the data point belonging to the positive class, not a probability itself. The final probability is obtained by summing the leaf values (raw scores) across all the trees and then mapping that sum into the range 0 to 1 with a sigmoid function. The leaf value (raw score) can be negative; a raw score of 0 corresponds to a probability of 1/2.

More details about the parameters and outputs are available at https://xgboost.readthedocs.io/en/latest/parameter.html
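
The summation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, any global bias of the model is represented by a hypothetical base_margin argument that defaults to 0, and of the leaf values used, only 0.167528 comes from the diagram in the question; the others are made up.

```python
import math


def sigmoid(x):
    """Logistic function mapping a raw score to the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_proba(leaf_values, base_margin=0.0):
    """Sum one leaf value per tree, then squash the sum with a sigmoid.

    base_margin is a hypothetical stand-in for any global offset of the
    model; it is assumed to be 0 in this sketch.
    """
    raw_score = base_margin + sum(leaf_values)
    return sigmoid(raw_score)


# One leaf value per tree for a single data point.
# Only the first value is taken from the diagram; the rest are invented.
leaf_values = [0.167528, -0.05, 0.1]
print(predict_proba(leaf_values))

# A raw score of 0 corresponds to a probability of 1/2.
print(predict_proba([]))
```

For a regression objective the same sum would be used directly as the prediction, without the sigmoid step.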

sameershah141 answered Oct 20 '22 06:10
