Sklearn: Evaluate performance of each classifier of OneVsRestClassifier inside GridSearchCV

I am dealing with multi-label classification using OneVsRestClassifier and SVC:

from sklearn.datasets import make_multilabel_classification
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC
from sklearn.grid_search import GridSearchCV

L=3
X, y = make_multilabel_classification(n_classes=L, n_labels=2,
                                  allow_unlabeled=True,
                                  random_state=1, return_indicator=True)    
model_to_set = OneVsRestClassifier(SVC())

parameters = {
    "estimator__C": [1,2,4,8],
    "estimator__kernel": ["poly","rbf"],
    "estimator__degree":[1, 2, 3, 4],
}

model_tunning = GridSearchCV(model_to_set, param_grid=parameters,
                             scoring='f1')

model_tunning.fit(X, y)

print(model_tunning.best_score_)
print(model_tunning.best_params_)

#0.855175822314
#{'estimator__kernel': 'poly', 'estimator__C': 1, 'estimator__degree': 3}

1st question

What does the number 0.85 represent? Is it the best score among the L classifiers, or the averaged one? Similarly, does the reported set of parameters correspond to the best scorer among the L classifiers?

2nd question

If I understand correctly, OneVsRestClassifier literally builds L classifiers, one per label, so one would expect to be able to access or observe the performance of EACH label. But how, in the example above, can one obtain L scores from the GridSearchCV object?

EDIT

To simplify the problem and help myself understand OneVsRestClassifier better, I fit the model directly before tuning it:

model_to_set.fit(X,y)
gp = model_to_set.predict(X) # the "global" prediction
fp = model_to_set.estimators_[0].predict(X) # the first-class prediction
sp = model_to_set.estimators_[1].predict(X) # the second-class prediction
tp = model_to_set.estimators_[2].predict(X) # the third-class prediction

It can be shown that gp.T[0]==fp, gp.T[1]==sp and gp.T[2]==tp. So the "global" prediction is simply the L individual predictions stacked column by column, and the 2nd question is solved.
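
A quick check of that equivalence (a small verification sketch, not part of the original post; it assumes numpy is available):

import numpy as np

# Each column of the multilabel prediction should equal the prediction of
# the corresponding per-class estimator.
print(np.array_equal(gp.T[0], fp))  # True
print(np.array_equal(gp.T[1], sp))  # True
print(np.array_equal(gp.T[2], tp))  # True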

But it is still confusing to me: if the meta-classifier OneVsRestClassifier contains L classifiers, how can GridSearchCV return only ONE best score, corresponding to one of the 4*2*4 sets of parameters, for a meta-classifier that has L classifiers inside?

Any comment would be greatly appreciated.

asked by Francis on Nov 18 '15



3 Answers

GridSearchCV creates a grid from your parameter values and evaluates your OneVsRestClassifier as an atomic classifier (i.e. GridSearchCV doesn't know what is inside this meta-classifier).

First: 0.85 is the best score of the OneVsRestClassifier among all possible combinations of the parameters ("estimator__C", "estimator__kernel", "estimator__degree"), which is 32 combinations in your case (4*2*4). That means GridSearchCV evaluates 32 (again, only in this particular case) possible OneVsRestClassifiers, each of which contains L SVCs. All L classifiers inside one OneVsRestClassifier share the same parameter values (but each of them learns to recognize its own class out of the L possible ones),

i.e. from the set of

{OneVsRestClassifier(SVC(C=1, kernel="poly", degree=1)),
 OneVsRestClassifier(SVC(C=1, kernel="poly", degree=2)),
 ...,
 OneVsRestClassifier(SVC(C=8, kernel="rbf", degree=3)),
 OneVsRestClassifier(SVC(C=8, kernel="rbf", degree=4))}

it chooses the one with the best score.

model_tunning.best_params_ here represents the parameter set for OneVsRestClassifier(SVC()) with which it achieves model_tunning.best_score_. You can get that best OneVsRestClassifier from the model_tunning.best_estimator_ attribute.
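
As an aside (not part of the original answer), you can also list every evaluated combination; depending on the scikit-learn version the per-combination results are exposed as grid_scores_ (old sklearn.grid_search) or cv_results_ (newer sklearn.model_selection). A rough sketch:

# Older scikit-learn (sklearn.grid_search):
for params, mean_score, cv_scores in model_tunning.grid_scores_:
    print(mean_score, params)

# Newer scikit-learn (sklearn.model_selection) exposes the same information
# through cv_results_:
# for mean_score, params in zip(model_tunning.cv_results_['mean_test_score'],
#                               model_tunning.cv_results_['params']):
#     print(mean_score, params)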

Second: there is no ready-to-use code to obtain separate scores for the L classifiers from OneVsRestClassifier, but you can look at the implementation of the OneVsRestClassifier.fit method, or use this (it should work :) ):

# Here X, y - your dataset
one_vs_rest = model_tunning.best_estimator_
yT = one_vs_rest.label_binarizer_.transform(y).toarray().T
# Iterate through all L classifiers
for classifier, is_ith_class in zip(one_vs_rest.estimators_, yT):
    print(classifier.score(X, is_ith_class))
answered by Ibraim Ganiev


Inspired by @Olologin's answer, I realized that 0.85 is the best weighted average of the f1 scores (in this example) obtained from the L per-label predictions. In the following code, I evaluate the model by an inner test (on the training data), using the macro average of the f1 score:

from sklearn.metrics import f1_score
import numpy as np

# Case A, inspect the F1 score using the meta-classifier
F_A = f1_score(y, model_tunning.best_estimator_.predict(X), average='macro')

# Case B, inspect the F1 score of each label (binary task) and combine them by macro average
F_B = []
for label, clf in zip(y.T, model_tunning.best_estimator_.estimators_):
    F_B.append(f1_score(label, clf.predict(X)))
F_B = np.mean(F_B)

F_A == F_B  # True

So it implies that GridSearchCV applies one of the 4*2*4 sets of parameters to build the meta-classifier, which in turn makes a prediction for each label with one of the L classifiers. The outcome is L f1 scores for the L labels, each measuring the performance of one binary task. Finally, a single score is obtained by averaging the L f1 scores (macro or weighted average, as specified by the average parameter of f1_score).

GridSearchCV then chooses the best averaged f1 score among the 4*2*4 sets of parameters, which is 0.85 in this example.

Though it is convenient to use the wrapper for a multi-label problem, it can only maximize the averaged f1 score with the same set of parameters used to build all L classifiers. If one wants to optimize the performance of each label separately, one apparently has to build L classifiers without using the wrapper, as sketched below.
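
A minimal sketch of such per-label tuning (an illustration, not part of the original answer), assuming X and y from the question and one independent grid search per label:

from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

svc_parameters = {
    "C": [1, 2, 4, 8],
    "kernel": ["poly", "rbf"],
    "degree": [1, 2, 3, 4],
}

per_label_models = []
for i in range(y.shape[1]):
    # Tune an independent SVC on the binary task for label i.
    search = GridSearchCV(SVC(), param_grid=svc_parameters, scoring='f1')
    search.fit(X, y[:, i])
    per_label_models.append(search.best_estimator_)
    print(i, search.best_score_, search.best_params_)

Each label then gets its own best parameter set and its own best score, at the cost of L separate searches.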

answered by Francis


As for your second question, you might want to use GridSearchCV with scikit-multilearn's BinaryRelevance classifier. Like OneVsRestClassifier, Binary Relevance creates L single-label classifiers, one per label. For each label, the training target is 1 if the label is present and 0 if it is not. The best selected classifier set is the BinaryRelevance instance in the best_estimator_ property of GridSearchCV. To predict probabilities, use the predict_proba method of the BinaryRelevance object. An example can be found in the scikit-multilearn docs for model selection.

In your case I would run the following code:

from skmultilearn.problem_transform import BinaryRelevance
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
from sklearn import metrics

model_to_set = BinaryRelevance(SVC())

parameters = {
    "classifier__estimator__C": [1,2,4,8],
    "classifier__estimator__kernel": ["poly","rbf"],
    "classifier__estimator__degree":[1, 2, 3, 4],
}

model_tunning = GridSearchCV(model_to_set, param_grid=parameters,
                             scoring='f1')

model_tunning.fit(X, y)

# for some X_test testing set
predictions = model_tunning.best_estimator_.predict(X_test)

# average=None gives a per-label score
metrics.f1_score(y_test, predictions, average=None)

Please note that there are much better methods for multi-label classification than Binary Relevance :) You can find them in Madjarov's comparison or in my recent paper.

answered by niedakh