 

Custom Objective Function to optimize Fscore - XGBOOST

I am trying to apply xgboost to classification data with highly imbalanced classes (1% ones and 99% zeroes).

I am using binary:logistic as the objective function for classification.

As I understand xgboost, the objective function is optimized iteratively as boosting builds the trees, and the best performance is reached at the end, when all the trees are combined.

Because of the class imbalance in my data, I am running into the accuracy paradox: the final model achieves great accuracy but poor precision and recall.

I want a custom objective function that optimizes the model so that the final xgboost model has the best F-score. Or is there any other objective function that yields the best F-score?

Where F-Score = (2 * Precision * Recall)/(Precision + Recall)
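To make the accuracy paradox concrete, here is a small worked example with hypothetical confusion-matrix counts (all numbers invented for illustration), showing how accuracy can be near-perfect while the F-score stays low:

```r
# Hypothetical counts on 10,000 samples with ~1% positives
TP <- 30    # true positives
FP <- 10    # false positives
FN <- 70    # false negatives
TN <- 9890  # true negatives

accuracy  <- (TP + TN) / (TP + TN + FP + FN)          # 0.992
precision <- TP / (TP + FP)                            # 0.75
recall    <- TP / (TP + FN)                            # 0.30
f1 <- 2 * precision * recall / (precision + recall)    # ~0.43
```

So a model can report 99.2% accuracy while catching only 30% of the positive class.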

Kartheek Palepu asked Apr 27 '17

1 Answer

I'm no expert in the matter, but I think this evaluation metric should do the job:

f1score_eval <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")

  # Threshold the predicted probabilities at 0.5 to get hard class labels
  e_TP <- sum((labels == 1) & (preds >= 0.5))
  e_FP <- sum((labels == 0) & (preds >= 0.5))
  e_FN <- sum((labels == 1) & (preds < 0.5))

  e_precision <- e_TP / (e_TP + e_FP)
  e_recall    <- e_TP / (e_TP + e_FN)

  # Guard against division by zero when nothing is predicted (or labeled) positive
  e_f1 <- if (e_TP == 0) 0 else 2 * e_precision * e_recall / (e_precision + e_recall)

  return(list(metric = "f1-score", value = e_f1))
}
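For completeness, here is a sketch of how such a metric can be wired into training via the `feval` argument of `xgb.train` in the R xgboost package. The data below is randomly generated for illustration, and the hyperparameter values (including `scale_pos_weight`, which reweights the rare positive class) are placeholders, not tuned recommendations:

```r
library(xgboost)

# Toy imbalanced data, ~5% positives (illustrative only)
set.seed(1)
x <- matrix(rnorm(2000), ncol = 10)
y <- rbinom(200, 1, 0.05)
dtrain <- xgb.DMatrix(x, label = y)

params <- list(
  objective = "binary:logistic",
  eta = 0.1,
  max_depth = 6,
  # Upweight positives so the loss itself pays attention to the minority class
  scale_pos_weight = sum(y == 0) / max(sum(y == 1), 1)
)

bst <- xgb.train(
  params = params,
  data = dtrain,
  nrounds = 50,
  watchlist = list(train = dtrain),
  feval = f1score_eval,   # the custom metric defined above
  maximize = TRUE         # a higher f1-score is better
)
```

Note that this keeps `binary:logistic` as the objective and only changes what is *monitored*; combined with `maximize = TRUE` and early stopping on a validation set, it selects the boosting round with the best F-score rather than the best log-loss.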

References:

https://github.com/dmlc/xgboost/issues/1152

http://xgboost.readthedocs.io/en/latest/parameter.html

David Hernández Merino answered Sep 23 '22