My question: How do I obtain the training error in the svm module (SVC class)?
I am trying to plot the error on the training set and on the test set against the number of training samples used (or against other parameters such as C / gamma). However, according to the SVM documentation, there is no exposed attribute or method that returns such data. I did find that RandomForestClassifier exposes an oob_score_, though.
This is called the training error; it is the same as the 1/n × sum of squared residuals we studied earlier. Of course, based on our discussion of bias and variance, we should expect the training error to be too optimistic relative to the error on a new test set, E[(Y − f̂(X))² | X, Y, X = xᵢ].
It is very important to understand the difference between a training error and a test error. The training error is calculated by evaluating the model on the same data it was trained on. For the test error, the data used to fit the model and the data used to evaluate it are completely disjoint.
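To make this concrete, here is a minimal sketch (using a toy dataset and an SVC; the dataset and variable names are my own choices for illustration, not from the original question) that computes both errors as misclassification rates:

from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Toy data purely for illustration
X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = svm.SVC(C=1.0, gamma='scale').fit(X_train, y_train)

# Training error: evaluated on the same data the model was fit on
train_error = 1.0 - clf.score(X_train, y_train)

# Test error: evaluated on held-out data the model never saw during fitting
test_error = 1.0 - clf.score(X_test, y_test)

print("training error: %.3f, test error: %.3f" % (train_error, test_error))

Repeating this for different training-set sizes (or for different values of C / gamma) gives the points for the plot described in the question.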
"cross_val_score" splits the data into say 5 folds. Then for each fold it fits the data on 4 folds and scores the 5th fold. Then it gives you the 5 scores from which you can calculate a mean and variance for the score. You crossval to tune parameters and get an estimate of the score.
Computing cross-validated metrics. The simplest way to use cross-validation is to call the cross_val_score helper function on the estimator and the dataset:

>>> from sklearn.model_selection import cross_val_score
>>> clf = svm.SVC(kernel='linear', C=1)
>>> scores = cross_val_score(clf, X, y, cv=5)
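As a self-contained sketch of the same idea (the Iris dataset and the variable names here are my own choices for illustration):

from sklearn import datasets, svm
from sklearn.model_selection import cross_val_score

# Toy dataset purely for illustration
X, y = datasets.load_iris(return_X_y=True)

# C and gamma can be varied here to study their effect on the score
clf = svm.SVC(C=1.0, gamma='scale')

# 5-fold cross-validation: each score is computed on the held-out fold
scores = cross_val_score(clf, X, y, cv=5)
print("CV accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))

Note that these are validation scores on held-out folds, i.e. estimates of the test error, not the training error.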
Just compute the score on the training data:
>>> model.fit(X_train, y_train).score(X_train, y_train)
You can also use any other performance metric from the sklearn.metrics module. The documentation is here:
http://scikit-learn.org/stable/modules/model_evaluation.html
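For instance, a short sketch using explicit metrics from sklearn.metrics (accuracy_score and zero_one_loss; the data and variable names are placeholders):

from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, zero_one_loss
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = svm.SVC().fit(X_train, y_train)

# For a classifier this is equivalent to model.score(X_train, y_train)
train_accuracy = accuracy_score(y_train, model.predict(X_train))

# zero_one_loss is 1 - accuracy, i.e. the misclassification (training) error
train_error = zero_one_loss(y_train, model.predict(X_train))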
Also: oob_score_ is an estimate of the test / validation score, not the training score.