
Cross validation and model selection

I am using sklearn to train an SVM, and I am using cross-validation to evaluate the estimator and avoid overfitting.

I split the data into two parts: training data and test data. Here is the code:

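import numpy as np
# In scikit-learn 0.18+ these helpers live in sklearn.model_selection;
# the old sklearn.cross_validation module was removed in 0.20.
from sklearn import model_selection
from sklearn import datasets
from sklearn import svm

iris = datasets.load_iris()  # load the dataset used below

X_train, X_test, y_train, y_test = model_selection.train_test_split(
    iris.data, iris.target, test_size=0.4, random_state=0
)
clf = svm.SVC(kernel='linear', C=1)
scores = model_selection.cross_val_score(clf, X_train, y_train, cv=5)
print(scores)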

Now I need to evaluate the estimator clf on X_test.

clf.score(X_test, y_test)

Here I get an error saying that the model has not been fitted with fit(). But isn't the model fitted inside the cross_val_score function? What is the problem?

asked Feb 15 '16 14:02 by Jeanne


1 Answer

cross_val_score is basically a convenience wrapper around the sklearn cross-validation iterators. You give it a classifier and your whole (training + validation) dataset, and it automatically performs one or more rounds of cross-validation: it splits your data into training/validation folds, fits on each training fold, and computes the score on the corresponding validation fold. See the scikit-learn documentation on cross-validation for an example and more explanation.
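To make that description concrete, here is a rough sketch of the loop that cross_val_score runs for a classifier with cv=5. It assumes a recent scikit-learn where the helpers live in sklearn.model_selection, and that an integer cv on a classifier means unshuffled StratifiedKFold; the names manual_scores, library_scores and fold_clf are only illustrative. It is a simplification, not the library's actual implementation.

```python
import numpy as np
from sklearn import datasets, svm
from sklearn.base import clone
from sklearn.model_selection import StratifiedKFold, cross_val_score

iris = datasets.load_iris()
X, y = iris.data, iris.target
clf = svm.SVC(kernel='linear', C=1)

# Hand-written approximation of cross_val_score(clf, X, y, cv=5).
manual_scores = []
for train_idx, val_idx in StratifiedKFold(n_splits=5).split(X, y):
    fold_clf = clone(clf)                      # fresh, unfitted copy of clf
    fold_clf.fit(X[train_idx], y[train_idx])   # fit on the training fold
    # score on the held-out validation fold (accuracy for classifiers)
    manual_scores.append(fold_clf.score(X[val_idx], y[val_idx]))
manual_scores = np.array(manual_scores)

# The library call, for comparison.
library_scores = cross_val_score(clf, X, y, cv=5)

print(manual_scores)
print(library_scores)
```

Every fold works on its own clone, so the original clf is never fitted by either version.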

clf.score(X_test, y_test) raises an exception because cross_val_score fits a copy of the estimator rather than the original (see the use of clone(estimator) in the cross_val_score source code). As a result, clf is unchanged after the function call and is still unfitted when you call clf.score. To evaluate on the test set, fit clf yourself with clf.fit(X_train, y_train) first.
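One way to get a test-set score, sketched under the same assumption of a recent scikit-learn with sklearn.model_selection: run the cross-validation for model assessment, then fit clf explicitly on the training data before scoring. The variables fitted_before, fitted_after and test_accuracy are illustrative names; support_ is an attribute that SVC only has once it has been fitted.

```python
from sklearn import datasets, svm
from sklearn.model_selection import cross_val_score, train_test_split

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.4, random_state=0
)

clf = svm.SVC(kernel='linear', C=1)

# Cross-validation fits clones of clf, one per fold.
scores = cross_val_score(clf, X_train, y_train, cv=5)

# clf itself is still unfitted here: it has no support vectors yet.
fitted_before = hasattr(clf, "support_")

# Fit the original estimator explicitly, then score on held-out data.
clf.fit(X_train, y_train)
fitted_after = hasattr(clf, "support_")
test_accuracy = clf.score(X_test, y_test)

print("CV scores:", scores)
print("Fitted before explicit fit:", fitted_before)
print("Fitted after explicit fit:", fitted_after)
print("Test accuracy:", test_accuracy)
```

The cross-validation scores estimate how well this model configuration generalizes. The final clf.fit on all of X_train produces the model that is actually scored on X_test.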

answered Sep 28 '22 07:09 by ali_m