Super similar to this post: ValueError: 'balanced_accuracy' is not a valid scoring value in scikit-learn
I am using:
scoring = ['precision_macro', 'recall_macro', 'balanced_accuracy_score']
clf = DecisionTreeClassifier(random_state=0)
scores = cross_validate(clf, X, y, scoring=scoring, cv=10, return_train_score=True)
And I receive the error:
ValueError: 'balanced_accuracy_score' is not a valid scoring value. Use sorted(sklearn.metrics.SCORERS.keys()) to get valid options.
I followed the recommended solution and upgraded scikit-learn (in the environment).
When I check the possible scorers:
sklearn.metrics.SCORERS.keys()
dict_keys(['explained_variance', 'r2', 'max_error', 'neg_median_absolute_error', 'neg_mean_absolute_error', 'neg_mean_squared_error', 'neg_mean_squared_log_error', 'neg_root_mean_squared_error', 'neg_mean_poisson_deviance', 'neg_mean_gamma_deviance', 'accuracy', 'roc_auc', 'roc_auc_ovr', 'roc_auc_ovo', 'roc_auc_ovr_weighted', 'roc_auc_ovo_weighted', 'balanced_accuracy', 'average_precision', 'neg_log_loss', 'neg_brier_score', 'adjusted_rand_score', 'homogeneity_score', 'completeness_score', 'v_measure_score', 'mutual_info_score', 'adjusted_mutual_info_score', 'normalized_mutual_info_score', 'fowlkes_mallows_score', 'precision', 'precision_macro', 'precision_micro', 'precision_samples', 'precision_weighted', 'recall', 'recall_macro', 'recall_micro', 'recall_samples', 'recall_weighted', 'f1', 'f1_macro', 'f1_micro', 'f1_samples', 'f1_weighted', 'jaccard', 'jaccard_macro', 'jaccard_micro', 'jaccard_samples', 'jaccard_weighted'])
I still cannot find it. Where is the problem?
Balanced accuracy is used in binary and multiclass classification problems to deal with imbalanced datasets. It is defined as the average of the recall obtained on each class. The best value is 1 and the worst value is 0 when adjusted=False.
For plain accuracy, the best performance is 1 with normalize=True and the number of correctly classified samples with normalize=False.
Think of score as a shorthand for calculating accuracy, since it is such a common metric. It also saves you the extra steps of computing accuracy manually: from sklearn.metrics import accuracy_score; preds = clf.predict(X_test); accuracy_score(y_test, preds).
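To illustrate the shorthand, here is a minimal sketch on a tiny synthetic dataset (the data and the X_test/y_test names are illustrative, not from the question):

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Tiny illustrative dataset
X_train = [[0], [1], [2], [3]]
y_train = [0, 0, 1, 1]
X_test = [[0], [3]]
y_test = [0, 1]

clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# The long way: predict, then score the predictions
preds = clf.predict(X_test)
acc_long = accuracy_score(y_test, preds)

# The shorthand: score() predicts and computes accuracy in one call
acc_short = clf.score(X_test, y_test)

assert acc_long == acc_short
```

Both paths give the same number; score() is just the two-step version folded into one call.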
According to the docs for valid scorers, the value of the scoring parameter corresponding to the balanced_accuracy_score scorer function is "balanced_accuracy", as in my other answer:
Change:
scoring = ['precision_macro', 'recall_macro', 'balanced_accuracy_score']
to:
scoring = ['precision_macro', 'recall_macro', 'balanced_accuracy']
and it should work.
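Putting it together, here is a minimal runnable sketch of the corrected call. The make_classification dataset is a stand-in for your own X and y:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier

# Stand-in data; replace with your own X, y
X, y = make_classification(n_samples=100, random_state=0)

# 'balanced_accuracy', not 'balanced_accuracy_score'
scoring = ['precision_macro', 'recall_macro', 'balanced_accuracy']
clf = DecisionTreeClassifier(random_state=0)
scores = cross_validate(clf, X, y, scoring=scoring, cv=10,
                        return_train_score=True)

# Each scorer produces test_ and train_ entries in the result dict
print(sorted(k for k in scores if k.startswith('test_')))
```

The result dict then contains keys such as test_balanced_accuracy and train_balanced_accuracy, one array of 10 fold scores each.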
I do find the documentation a bit lacking in this respect, and this convention of removing the _score suffix is not applied consistently either, as all the clustering metrics still keep _score in their scoring parameter values.