From my understanding, One-Class SVMs are trained without target/label data.
One answer at Use of OneClassSVM with GridSearchCV suggests passing target/label data to GridSearchCV's fit method when the classifier is the OneClassSVM.
How does GridSearchCV's fit method handle this data? Does it actually train the OneClassSVM without the target/label data, and just use the target/label data for evaluation?
I tried following the GridSearchCV source code, but I couldn't find the answer.
In general, an estimator's fit() method takes the training data as arguments: a single array in the unsupervised case, or two arrays in the supervised case. X holds the data samples, one row per datapoint (an N-dimensional feature vector); y holds the labels, one per datapoint.
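As a minimal sketch of the two call signatures (LogisticRegression is just an illustrative supervised estimator here, not something from the question):

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.svm import OneClassSVM

X, y = load_iris(return_X_y=True)

# unsupervised: fit() needs only the feature matrix X
OneClassSVM(gamma='scale').fit(X)

# supervised: fit() needs features X and labels y
LogisticRegression(max_iter=1000).fit(X, y)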
"Does it actually train the OneClassSVM without the target/label data, and just use the target/label data for evaluation?"
Yes to both.
GridSearchCV does actually send the labels to OneClassSVM in the fit call, but OneClassSVM simply ignores them. Notice in the second link how an array of ones is passed to the main SVM trainer instead of the given label array y. Parameters like y in fit exist only so that meta-estimators like GridSearchCV can work in a consistent way without worrying about whether an estimator is supervised or unsupervised.
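To see this directly, here is a quick check of my own (not from the original answer): fitting OneClassSVM with and without y should produce an identical model, since y is discarded before training.

import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import OneClassSVM

X, y = load_iris(return_X_y=True)

# fit once without labels and once with (arbitrary) labels
clf_a = OneClassSVM(gamma='scale').fit(X)
clf_b = OneClassSVM(gamma='scale').fit(X, y)

# identical support vectors and coefficients: y never influenced training
print(np.array_equal(clf_a.support_vectors_, clf_b.support_vectors_))  # True
print(np.array_equal(clf_a.dual_coef_, clf_b.dual_coef_))              # True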
To actually test this, let's first detect outliers using GridSearchCV:
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.svm import OneClassSVM

X, y = load_iris(return_X_y=True)
yd = np.where(y == 0, -1, 1)  # map to -1 (outlier) / 1 (inlier), matching OneClassSVM's output
cv = KFold(n_splits=4, random_state=42, shuffle=True)
# note: iid=False from the original snippet was removed in scikit-learn 0.24
model = GridSearchCV(OneClassSVM(), {'gamma': ['scale']}, cv=cv, scoring=make_scorer(f1_score))
model = model.fit(X, yd)
print(model.cv_results_)
Note all the split<i>_test_score entries in cv_results_.
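As a small convenience (my own addition, continuing from the fitted model above), the per-fold scores can be collected like this:

# one split{i}_test_score entry per fold; [0] picks our single parameter combination
grid_scores = [model.cv_results_[f'split{i}_test_score'][0] for i in range(cv.get_n_splits())]
print(grid_scores)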
Now let's do it manually, without sending the labels yd during the fit call:
for train, test in cv.split(X, yd):
    clf = OneClassSVM(gamma='scale').fit(X[train])  # just the features
    print(f1_score(yd[test], clf.predict(X[test])))
Both should yield exactly the same scores.
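As a final sanity check (my own addition, reusing grid_scores from the snippet above), the match can be confirmed programmatically:

# recompute the manual fold scores and compare against GridSearchCV's
manual_scores = [f1_score(yd[test], OneClassSVM(gamma='scale').fit(X[train]).predict(X[test]))
                 for train, test in cv.split(X, yd)]
print(np.allclose(manual_scores, grid_scores))  # expected: True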