From my understanding, One-Class SVMs are trained without target/label data.
One answer at Use of OneClassSVM with GridSearchCV suggests passing target/label data to GridSearchCV's fit method when the classifier is the OneClassSVM.
How does GridSearchCV's fit method handle this data? Does it actually train the OneClassSVM without the target/label data, and just use the target/label data for evaluation?
I tried following the GridSearchCV source code, but I couldn't find the answer.
In general, an estimator's fit() method takes the training data as arguments: a single array in the unsupervised case, or two arrays in the supervised case. X holds the data samples, one row per datapoint (an N-dimensional feature vector); y holds the labels, one per datapoint.
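As a minimal sketch of the two call signatures (LogisticRegression is just an illustrative supervised estimator here, not something from the question):

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.svm import OneClassSVM

X, y = load_iris(return_X_y=True)

# unsupervised: fit() needs only the feature matrix X
OneClassSVM(gamma='scale').fit(X)

# supervised: fit() needs features X and labels y
LogisticRegression(max_iter=1000).fit(X, y)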
"Does it actually train the OneClassSVM without the target/label data, and just use the target/label data for evaluation?"
Yes to both.
GridSearchCV does actually send the labels to OneClassSVM in the fit call, but OneClassSVM simply ignores them. Notice in the second link how an array of ones is passed to the main SVM trainer instead of the given label array y. Parameters like y in fit exist only so that meta-estimators like GridSearchCV can work in a consistent way without worrying about whether an estimator is supervised or unsupervised.
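To see this directly, here is a quick check of my own (not from the original answer): fitting OneClassSVM with and without y should produce an identical model, since y is discarded before training.

import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import OneClassSVM

X, y = load_iris(return_X_y=True)

# fit once without labels and once with (arbitrary) labels
clf_a = OneClassSVM(gamma='scale').fit(X)
clf_b = OneClassSVM(gamma='scale').fit(X, y)

# identical support vectors and coefficients: y never influenced training
print(np.array_equal(clf_a.support_vectors_, clf_b.support_vectors_))  # True
print(np.array_equal(clf_a.dual_coef_, clf_b.dual_coef_))              # True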
To actually test this, let's first detect outliers using GridSearchCV:
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.svm import OneClassSVM

X, y = load_iris(return_X_y=True)
yd = np.where(y == 0, -1, 1)  # map to -1 (outlier) / 1 (inlier), matching OneClassSVM's output
cv = KFold(n_splits=4, random_state=42, shuffle=True)
# note: iid=False from the original snippet was removed in scikit-learn 0.24
model = GridSearchCV(OneClassSVM(), {'gamma': ['scale']}, cv=cv, scoring=make_scorer(f1_score))
model = model.fit(X, yd)
print(model.cv_results_)
Note all the split<i>_test_score entries in cv_results_.
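As a small convenience (my own addition, continuing from the fitted model above), the per-fold scores can be collected like this:

# one split{i}_test_score entry per fold; [0] picks our single parameter combination
grid_scores = [model.cv_results_[f'split{i}_test_score'][0] for i in range(cv.get_n_splits())]
print(grid_scores)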
Now let's do it manually, without sending the labels yd during the fit call:
for train, test in cv.split(X, yd):
    clf = OneClassSVM(gamma='scale').fit(X[train])  # just the features
    print(f1_score(yd[test], clf.predict(X[test])))
Both should yield exactly the same scores.
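As a final sanity check (my own addition, reusing grid_scores from the snippet above), the match can be confirmed programmatically:

# recompute the manual fold scores and compare against GridSearchCV's
manual_scores = [f1_score(yd[test], OneClassSVM(gamma='scale').fit(X[train]).predict(X[test]))
                 for train, test in cv.split(X, yd)]
print(np.allclose(manual_scores, grid_scores))  # expected: True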