Cross validation and model selection

Question

I am using sklearn to train an SVM. I am using cross-validation to evaluate the estimator and to avoid overfitting.

I split the data into two parts: training data and test data. Here is the code:

import numpy as np
from sklearn import cross_validation
from sklearn import datasets
from sklearn import svm

# Load the iris dataset used below
iris = datasets.load_iris()

# Hold out 40% of the samples as a test set
X_train, X_test, y_train, y_test = cross_validation.train_test_split(
    iris.data, iris.target, test_size=0.4, random_state=0
)
clf = svm.SVC(kernel='linear', C=1)
# 5-fold cross-validation on the training portion only
scores = cross_validation.cross_val_score(clf, X_train, y_train, cv=5)
print(scores)

Now I need to evaluate the estimator clf on X_test.

clf.score(X_test, y_test)

Here I get an error saying that the model has not been fitted with fit(). But isn't the model normally fitted inside the cross_val_score function? What is the problem?

ali_m · Accepted Answer

cross_val_score is basically a convenience wrapper for the sklearn cross-validation iterators. You give it a classifier and your whole (training + validation) dataset, and it automatically performs one or more rounds of cross-validation by splitting your data into training/validation sets, fitting on the training set, and computing the score on the validation set. See the scikit-learn documentation for an example and more explanation.
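To make the wrapper behaviour concrete, here is a rough sketch of a hand-written loop that should mirror what cross_val_score does for a classifier with cv=5. It assumes a recent scikit-learn release, where the helpers live in sklearn.model_selection rather than the older sklearn.cross_validation module used in the question, and it assumes the default splitter for a classifier is an unshuffled StratifiedKFold. The names fold_clf, manual_scores and wrapper_scores are only illustrative.

```python
import numpy as np
from sklearn import datasets, svm
from sklearn.base import clone
from sklearn.model_selection import StratifiedKFold, cross_val_score

iris = datasets.load_iris()
X, y = iris.data, iris.target
clf = svm.SVC(kernel="linear", C=1)

# Manual version: one fit and one score per fold
manual_scores = []
for train_idx, val_idx in StratifiedKFold(n_splits=5).split(X, y):
    fold_clf = clone(clf)  # fresh, unfitted copy; clf itself is not modified
    fold_clf.fit(X[train_idx], y[train_idx])
    manual_scores.append(fold_clf.score(X[val_idx], y[val_idx]))
manual_scores = np.array(manual_scores)

# Wrapper version: same idea in a single call
wrapper_scores = cross_val_score(clf, X, y, cv=5)

print(manual_scores)
print(wrapper_scores)
```

Under those assumptions the two arrays should agree, since both fit a clone of clf on each training fold and score it on the matching validation fold.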

clf.score(X_test, y_test) raises an exception because cross_val_score performs the fitting on a copy of the estimator rather than on the original (see the use of clone(estimator) in the source code). Because of this, clf remains unchanged outside of the function call, and is therefore still unfitted when you call clf.score.
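A minimal sketch of one way to get a test-set score, assuming a recent scikit-learn release (sklearn.model_selection instead of sklearn.cross_validation): use cross_val_score for the cross-validated estimate, then fit clf explicitly on the training data before scoring it on the held-out test data. The variable names still_unfitted and test_accuracy are only illustrative.

```python
from sklearn import datasets, svm
from sklearn.exceptions import NotFittedError
from sklearn.model_selection import cross_val_score, train_test_split

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.4, random_state=0
)

clf = svm.SVC(kernel="linear", C=1)

# Cross-validation fits clones of clf, so clf itself stays unfitted
cv_scores = cross_val_score(clf, X_train, y_train, cv=5)

try:
    clf.score(X_test, y_test)
    still_unfitted = False
except NotFittedError:
    still_unfitted = True  # this is the error described in the question

# Fit the original estimator explicitly, then evaluate on the test set
clf.fit(X_train, y_train)
test_accuracy = clf.score(X_test, y_test)

print(cv_scores.mean(), test_accuracy)
```

The cross-validation scores estimate how well this model configuration generalizes, while the final fit on X_train produces the actual estimator that can be scored on X_test.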

Tags: python, machine-learning, numpy, scikit-learn, cross-validation

Jeanne
