I want to train a LightGBM model with a custom metric: f1_score with weighted average.
I went through the advanced examples of LightGBM over here and found the implementation of a custom binary error function. I implemented a similar function to return the f1_score, as shown below.
from sklearn.metrics import f1_score

def f1_metric(preds, train_data):
    labels = train_data.get_label()
    return 'f1', f1_score(labels, preds, average='weighted'), True
I tried to train the model by passing the feval parameter as f1_metric, as shown below.
evals_results = {}
bst = lgb.train(params,
                dtrain,
                valid_sets=[dvalid],
                valid_names=['valid'],
                evals_result=evals_results,
                num_boost_round=num_boost_round,
                early_stopping_rounds=early_stopping_rounds,
                verbose_eval=25,
                feval=f1_metric)
Then I am getting ValueError: Found input variables with inconsistent numbers of samples. The training set is being passed to the function rather than the validation set. How can I configure it so that the validation set is passed and the f1_score is returned?
The docs are a bit confusing. When describing the signature of the function that you pass to feval, they call its parameters preds and train_data, which is a bit misleading.
But the following seems to work:
import numpy as np
from sklearn.metrics import f1_score

def lgb_f1_score(y_hat, data):
    y_true = data.get_label()
    y_hat = np.round(y_hat)  # scikit-learn's f1_score doesn't accept probabilities
    return 'f1', f1_score(y_true, y_hat), True
evals_result = {}
clf = lgb.train(param, train_data,
                valid_sets=[val_data, train_data],
                valid_names=['val', 'train'],
                feval=lgb_f1_score,
                evals_result=evals_result)
lgb.plot_metric(evals_result, metric='f1')
To use more than one custom metric, define one overall custom metric function just like the one above, in which you calculate all the metrics and return a list of tuples.
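For example, a combined feval returning several metrics could look like this (a minimal sketch; the function name, the choice of metrics, and the 0.5 threshold are illustrative):

```python
import numpy as np
from sklearn.metrics import f1_score, precision_score

def lgb_multi_metric(y_hat, data):
    # Illustrative combined feval: compute every metric once and return
    # one (name, value, is_higher_better) tuple per metric.
    y_true = data.get_label()
    y_pred = np.where(y_hat < 0.5, 0, 1)  # threshold probabilities at 0.5
    return [
        ('f1', f1_score(y_true, y_pred), True),
        ('precision', precision_score(y_true, y_pred), True),
    ]
```

Passing feval=lgb_multi_metric to lgb.train should then record both metrics in evals_result.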
Edit: Fixed the code; of course, since for F1 bigger is better, the is_higher_better flag should be set to True.
Regarding Toby's answer:
def lgb_f1_score(y_hat, data):
    y_true = data.get_label()
    y_hat = np.round(y_hat)  # scikits f1 doesn't like probabilities
    return 'f1', f1_score(y_true, y_hat), True
I suggest changing the y_hat part to this:
y_hat = np.where(y_hat < 0.5, 0, 1)
Reason: I used y_hat = np.round(y_hat) and found out that during training the LightGBM model will sometimes (very unlikely, but still a chance) regard our y prediction as multiclass instead of binary.
My speculation: sometimes the y prediction will be low or high enough to be rounded to a negative value or to 2? I'm not sure, but when I changed the code to use np.where, the bug was gone.
It cost me a morning to figure out this bug, although I'm not really sure whether the np.where solution is good.
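For what it's worth, there is a concrete difference between the two: np.round rounds halves to the nearest even integer and passes out-of-range values through, so it can produce labels other than 0 and 1, whereas np.where always yields a clean 0/1 vector. A quick check (the sample values are made up for illustration):

```python
import numpy as np

preds = np.array([0.5, 1.5, -0.6, 2.3])

# np.round rounds half to even and keeps out-of-range magnitudes,
# so the result can contain values other than 0 and 1:
print(np.round(preds))              # 0.5 -> 0.0, 1.5 -> 2.0, -0.6 -> -1.0, 2.3 -> 2.0

# np.where thresholds explicitly and always returns 0 or 1:
print(np.where(preds < 0.5, 0, 1))  # -> [1 1 0 1]
```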