The precision is the ratio tp / (tp + fp) where tp is the number of true positives and fp the number of false positives. The precision is intuitively the ability of the classifier not to label as positive a sample that is negative. The best value is 1 and the worst value is 0.
The F1 score can be interpreted as a harmonic mean of the precision and recall, where an F1 score reaches its best value at 1 and worst score at 0. The relative contribution of precision and recall to the F1 score is equal.
The F1 score is used in scenarios where optimizing for precision alone would tolerate many false negatives, and optimizing for recall alone would tolerate many false positives, so a single metric that balances both is needed.
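To make this concrete, here is a minimal sketch using scikit-learn's metric functions (y_true and y_pred are made-up placeholder labels):

from sklearn.metrics import precision_score, f1_score

y_true = [0, 1, 1, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 1]

# precision = tp / (tp + fp): of the samples predicted positive, how many really are
print(precision_score(y_true, y_pred))  # 1.0 (no false positives in this toy example)

# F1 = 2 * precision * recall / (precision + recall); recall here is 3/4
print(f1_score(y_true, y_pred))  # approximately 0.857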
These functions accept a zero_division parameter, which sets the value to return when a zero division occurs: "warn", 0, or 1 (default "warn"). If set to "warn", it acts as 0, but a warning is also raised.
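For example (assuming scikit-learn 0.22 or later, where zero_division was introduced):

from sklearn.metrics import precision_score

y_true = [1, 1, 1]
y_pred = [0, 0, 0]  # no positive predictions, so tp + fp == 0

# default zero_division="warn": returns 0.0 and raises UndefinedMetricWarning
print(precision_score(y_true, y_pred))

# passing zero_division explicitly returns 0.0 without the warning
print(precision_score(y_true, y_pred, zero_division=0))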
I'm working on a binary classification model; the classifier is naive Bayes. I have an almost balanced dataset, but I get the following warning when I predict:
UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples.
'precision', 'predicted', average, warn_for)
I'm using grid search with 10-fold CV. The test set and the predictions contain both classes, so I don't understand the message. I'm using the same dataset, train/test split, CV, and random seed for six other models, and those work perfectly. Data is ingested externally into a DataFrame and shuffled with a fixed seed. The naive Bayes classifier class is imported at the top of the file, before this code snippet.
# imports assumed at the top of the file
from sklearn.model_selection import train_test_split, GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import confusion_matrix, classification_report

X_train, X_test, y_train, y_test, len_train, len_test = \
    train_test_split(data['X'], data['y'], data['len'], test_size=0.4)

pipeline = Pipeline([
    ('classifier', MultinomialNB())
])

# StratifiedKFold takes the number of splits, not the data;
# GridSearchCV stratifies on y_train when given the splitter object
cv = StratifiedKFold(n_splits=10)

# pandas Series have no reshape; convert to a NumPy column vector first
len_train = len_train.to_numpy().reshape(-1, 1)
len_test = len_test.to_numpy().reshape(-1, 1)

params = [
    {'classifier__alpha': [0, 0.0001, 0.001, 0.01]}  # alpha=0 disables smoothing and itself triggers a warning
]

grid = GridSearchCV(
    pipeline,
    param_grid=params,
    refit=True,
    n_jobs=-1,
    scoring='accuracy',
    cv=cv,
)

nb_fit = grid.fit(len_train, y_train)
preds = nb_fit.predict(len_test)

print(confusion_matrix(y_test, preds, labels=['1', '0']))  # labels must match the dtype of y_test (use [1, 0] for integer labels)
print(classification_report(y_test, preds))
I was 'forced' by Python to alter the shape of the series; maybe that is the culprit?
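One way to diagnose this, sketched below, is to compare the labels present in y_test against those in preds: the warning fires whenever some label never appears in the predictions. On scikit-learn 0.22+, classification_report also accepts zero_division to make the behaviour explicit:

import numpy as np
from sklearn.metrics import classification_report

# labels present in the test set vs. labels the model actually predicted
print(np.unique(y_test), np.unique(preds))

# if a label is missing from preds, precision/F1 are undefined for it;
# zero_division=0 reports 0.0 for those entries without warning
print(classification_report(y_test, preds, zero_division=0))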