Sklearn: Evaluate performance of each classifier of OneVsRestClassifier inside GridSearchCV

Tags:

python

scikit-learn

multilabel-classification

grid-search

I am working on multi-label classification with OneVsRestClassifier and SVC:

from sklearn.datasets import make_multilabel_classification
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC
from sklearn.grid_search import GridSearchCV

L=3
X, y = make_multilabel_classification(n_classes=L, n_labels=2,
                                  allow_unlabeled=True,
                                  random_state=1, return_indicator=True)    
model_to_set = OneVsRestClassifier(SVC())

parameters = {
    "estimator__C": [1,2,4,8],
    "estimator__kernel": ["poly","rbf"],
    "estimator__degree":[1, 2, 3, 4],
}

model_tunning = GridSearchCV(model_to_set, param_grid=parameters,
                             scoring='f1')

model_tunning.fit(X, y)

print model_tunning.best_score_
print model_tunning.best_params_

#0.855175822314
#{'estimator__kernel': 'poly', 'estimator__C': 1, 'estimator__degree': 3}

1st question

What does the number 0.85 represent? Is it the best score among the L classifiers, or an average over them? Similarly, does the set of parameters belong to the best scorer among the L classifiers?

2nd question

If I am right, OneVsRestClassifier builds L classifiers, one per label, so one would expect to be able to inspect the performance of EACH LABEL. But how, in the example above, can I obtain the L scores from the GridSearchCV object?

EDIT

To simplify the problem and better understand OneVsRestClassifier, I fitted the model before tuning it:

model_to_set.fit(X,y)
gp = model_to_set.predict(X) # the "global" prediction
fp = model_to_set.estimators_[0].predict(X) # the first-class prediction
sp = model_to_set.estimators_[1].predict(X) # the second-class prediction
tp = model_to_set.estimators_[2].predict(X) # the third-class prediction

It can be shown that gp.T[0]==fp, gp.T[1]==sp and gp.T[2]==tp. So the "global" prediction is simply the L individual predictions placed side by side, one column per label, and the 2nd question is solved.

But I am still confused: if one meta-classifier OneVsRestClassifier contains L classifiers, how can GridSearchCV return only ONE best score, corresponding to one of the 4*2*4 sets of parameters, for the whole meta-classifier?

Any comment would be much appreciated.


asked Nov 18 '15 15:11

Francis

3 Answers

GridSearchCV builds a grid from your parameter values and evaluates your OneVsRestClassifier as an atomic classifier (i.e. GridSearchCV doesn't know what is inside this meta-classifier).

First: 0.85 is the best score of OneVsRestClassifier among all possible combinations of the parameters ("estimator__C", "estimator__kernel", "estimator__degree"); in your case there are 4*2*4 = 32 combinations. This means that GridSearchCV evaluates 32 (again, only in this particular case) candidate OneVsRestClassifiers, each of which contains L SVCs. All L classifiers inside one OneVsRestClassifier share the same parameter values, but each of them learns to recognize its own class out of the L possible.

That is, from the set of

{OneVsRestClassifier(SVC(C=1, kernel="poly", degree=1)),
 OneVsRestClassifier(SVC(C=1, kernel="poly", degree=2)),
 ...,
 OneVsRestClassifier(SVC(C=8, kernel="rbf", degree=3)),
 OneVsRestClassifier(SVC(C=8, kernel="rbf", degree=4))}

it chooses one with the best score.
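As a rough illustration of how that candidate set arises, the grid can be enumerated by hand with the standard library. This is only a sketch of the idea (a Cartesian product over the value lists), not GridSearchCV's actual implementation; the names `names` and `candidates` are made up for the example.

```python
from itertools import product

# Parameter grid copied from the question.
parameters = {
    "estimator__C": [1, 2, 4, 8],
    "estimator__kernel": ["poly", "rbf"],
    "estimator__degree": [1, 2, 3, 4],
}

# One dict per candidate: the Cartesian product of all value lists.
names = sorted(parameters)
candidates = [
    dict(zip(names, values))
    for values in product(*(parameters[name] for name in names))
]

print(len(candidates))  # 32 = 4 * 2 * 4
```

Each of these dicts is applied to the whole OneVsRestClassifier, so every one of the L inner SVCs receives the same C, kernel and degree.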

model_tunning.best_params_ holds the parameters with which OneVsRestClassifier(SVC()) achieves model_tunning.best_score_. You can get that best OneVsRestClassifier from the model_tunning.best_estimator_ attribute.

Second: there is no ready-to-use code to obtain separate scores for the L classifiers inside OneVsRestClassifier, but you can look at the implementation of the OneVsRestClassifier.fit method, or take this (should work :) ):

# Here X, y - your dataset
one_vs_rest = model_tunning.best_estimator_
yT = one_vs_rest.label_binarizer_.transform(y).toarray().T
# Iterate through all L classifiers
for classifier, is_ith_class in zip(one_vs_rest.estimators_, yT):
    print(classifier.score(X, is_ith_class))


answered Dec 12 '22 20:12

Ibraim Ganiev

Inspired by @Olologin's answer, I realized that 0.85 is the best weighted average of the f1 scores (in this example) obtained from the L predictions. In the following code, I evaluate the model on the training data (an in-sample test), using the macro average of the f1 score:

import numpy as np
from sklearn.metrics import f1_score

# Case A, inspect F1 score using the meta-classifier
F_A = f1_score(y, model_tunning.best_estimator_.predict(X), average='macro')

# Case B, inspect F1 scores of each label (binary task) and collect them by macro average
F_B = []
for label, clf in zip(y.T, model_tunning.best_estimator_.estimators_):
    F_B.append(f1_score(label, clf.predict(X)))
F_B = np.mean(F_B)

# Compare with a tolerance rather than ==, since both are floats
np.isclose(F_A, F_B) # True

So this implies that GridSearchCV applies each of the 4*2*4 sets of parameters in turn to build the meta-classifier, which then predicts each label with one of its L classifiers. The outcome is L f1 scores for the L labels, each measuring performance on a binary task. Finally, a single score is obtained by averaging the L f1 scores (macro or weighted average, as specified by the average parameter of f1_score).
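To make the averaging step concrete, here is a small pure-Python sketch that computes one F1 score per label and then combines them by macro and by support-weighted averaging. The toy labels and the helper `f1_binary` are invented for illustration; this is not scikit-learn's code, only the arithmetic those two averages stand for.

```python
def f1_binary(y_true, y_pred):
    """F1 score of a single binary label, from TP/FP/FN counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Toy data (made up): 4 samples, 3 labels, one row per sample.
Y_true = [[1, 0, 1], [1, 1, 0], [0, 1, 0], [1, 0, 0]]
Y_pred = [[1, 0, 0], [1, 1, 0], [1, 1, 0], [1, 0, 1]]

# Transpose so that each entry is the column of one label.
cols_true = list(zip(*Y_true))
cols_pred = list(zip(*Y_pred))

# One binary F1 score per label.
per_label = [f1_binary(t, p) for t, p in zip(cols_true, cols_pred)]

# Macro: plain mean. Weighted: mean weighted by each label's positive count.
support = [sum(t) for t in cols_true]
macro = sum(per_label) / len(per_label)
weighted = sum(f * s for f, s in zip(per_label, support)) / sum(support)

print(per_label, macro, weighted)
```

With these toy labels the two averages differ noticeably, because the worst-scoring label is also the rarest one and so counts for less in the weighted average.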

GridSearchCV then chooses the best averaged f1 score among the 4*2*4 sets of parameters, which is 0.85 in this example.

Though the wrapper is convenient for multi-label problems, it can only maximize the averaged f1 score with the same set of parameters shared by all L classifiers. To optimize the performance of each label separately, one seems to have to build the L classifiers without the wrapper.
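One possible way to do that, sketched here under some assumptions, is to run a separate grid search on each label column, so that every label gets its own best parameters and its own score. The sketch assumes a scikit-learn version that provides sklearn.model_selection; the reduced grid, cv=3 and the name `per_label` are choices made for this example, not part of the original setup.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

L = 3
X, y = make_multilabel_classification(n_classes=L, n_labels=2,
                                      allow_unlabeled=True, random_state=1)

# No wrapper here, so the SVC parameters need no "estimator__" prefix.
# The grid is smaller than in the question to keep the run short.
parameters = {"C": [1, 2, 4, 8], "kernel": ["poly", "rbf"]}

per_label = []
for j in range(L):
    # Column j of y is an ordinary binary target for label j.
    search = GridSearchCV(SVC(), param_grid=parameters, scoring="f1", cv=3)
    search.fit(X, y[:, j])
    per_label.append((search.best_params_, search.best_score_))

for j, (params, score) in enumerate(per_label):
    print(j, params, round(score, 3))
```

The trade-off is L independent searches instead of one, and the resulting L models have to be combined by hand at prediction time.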

answered Dec 12 '22 18:12

Francis

As for your second question, you might want to use GridSearchCV with scikit-multilearn's BinaryRelevance classifier. Like OneVsRestClassifier, Binary Relevance creates L single-label classifiers, one per label. For each label, the training target is 1 if the label is present and 0 if it is not. The best selected classifier set is the BinaryRelevance instance in the best_estimator_ attribute of GridSearchCV. To predict probabilities as floats, use the predict_proba method of the BinaryRelevance object. An example can be found in the scikit-multilearn docs for model selection.

In your case I would run the following code:

from skmultilearn.problem_transform import BinaryRelevance
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
from sklearn import metrics

model_to_set = BinaryRelevance(SVC())

# BinaryRelevance exposes the wrapped SVC as "classifier", so its
# parameters are addressed as classifier__<name>
parameters = {
    "classifier__C": [1,2,4,8],
    "classifier__kernel": ["poly","rbf"],
    "classifier__degree":[1, 2, 3, 4],
}

# plain 'f1' is the binary scorer; multilabel targets need an averaged variant
model_tunning = GridSearchCV(model_to_set, param_grid=parameters,
                             scoring='f1_weighted')

model_tunning.fit(X, y)

# for some X_test testing set
predictions = model_tunning.best_estimator_.predict(X_test)

# average=None gives per label score
metrics.f1_score(y_test, predictions, average = None)

Please note that there are much better methods for multi-label classification than Binary Relevance :) You can find them in madjarov's comparison or my recent paper.

answered Dec 12 '22 18:12

niedakh
