When I train an SVC with cross-validation,

y_pred = cross_val_predict(svc, X, y, cv=5, method='predict')

cross_val_predict returns one class prediction for each element in X, so that y_pred.shape = (1000,) when m = 1000. This makes sense, since cv=5 means the SVC was trained and validated 5 times on different parts of X. In each of the five validation rounds, predictions were made for one fifth of the instances (m/5 = 200). Afterwards the 5 vectors, containing 200 predictions each, were merged into y_pred.

With all of this in mind, it seems reasonable to calculate the overall accuracy of the SVC from y_pred and y:

score = accuracy_score(y, y_pred)
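For concreteness, a minimal, self-contained version of this setup might look as follows (the dataset is synthetic and the SVC settings are assumptions, not taken from the question):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import accuracy_score

# Synthetic stand-in for the data described above (m = 1000 samples).
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
svc = SVC()  # hyperparameters are an assumption

# One out-of-fold prediction per sample, hence y_pred.shape == (1000,).
y_pred = cross_val_predict(svc, X, y, cv=5, method='predict')
print(y_pred.shape)

# Pooled accuracy over all out-of-fold predictions.
print(accuracy_score(y, y_pred))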
But (!) the documentation of cross_val_predict states:
The result of cross_val_predict may be different from those obtained using cross_val_score as the elements are grouped in different ways. The function cross_val_score takes an average over cross-validation folds, whereas cross_val_predict simply returns the labels (or probabilities) from several distinct models undistinguished. Thus, cross_val_predict is not an appropriate measure of generalisation error.
Could someone please explain, in other words, why cross_val_predict is not appropriate for measuring the generalisation error, e.g. via accuracy_score(y, y_pred)?
Edit:

I first assumed that with cv=5, predictions would be made for all instances of X in each of the 5 validation rounds. But this is wrong: per validation round, predictions are only made for 1/5 of the instances of X.
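A quick sketch of that point, assuming plain KFold splitting for illustration (for classifiers, cross_val_predict actually uses stratified folds by default): each of the 5 rounds produces out-of-fold predictions for a disjoint fifth of the indices, and together the test folds cover every sample exactly once.

import numpy as np
from sklearn.model_selection import KFold

X = np.arange(1000).reshape(-1, 1)  # placeholder data, m = 1000
kf = KFold(n_splits=5)

test_folds = []
for fold, (train_idx, test_idx) in enumerate(kf.split(X)):
    print(f"round {fold}: {len(test_idx)} out-of-fold predictions")  # 200 each
    test_folds.append(test_idx)

# The five test folds partition the 1000 indices exactly once.
assert np.array_equal(np.sort(np.concatenate(test_folds)), np.arange(1000))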
The differences between cross_val_predict and cross_val_score are described really clearly here, and there is another link in there, so you can follow the rabbit hole.
In essence:

- cross_val_score returns a score for each fold.
- cross_val_predict makes out-of-fold predictions for each data point.

Now, you have no way of knowing which predictions in cross_val_predict came from which fold, hence you cannot calculate a per-fold average the way cross_val_score does. You could compare the mean of the cross_val_score results with the accuracy_score of the cross_val_predict output, but an average of per-fold averages is not in general equal to the overall average, hence the results can differ. If one fold is smaller than the others and has a very low accuracy, it drags the mean of the fold scores down more than it drags down the pooled accuracy of all predictions.
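A minimal sketch of that comparison, on synthetic data (the model and settings are assumptions): the mean of the five fold scores versus the single pooled accuracy of the out-of-fold predictions. For the accuracy metric with equal-sized folds the two numbers typically coincide; they diverge once the folds have unequal sizes or the metric is not a simple per-sample average.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score, cross_val_predict
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=1000, random_state=0)
svc = SVC()

# Per-fold accuracies, then their mean (what cross_val_score gives you).
fold_scores = cross_val_score(svc, X, y, cv=5, scoring='accuracy')
print(fold_scores, fold_scores.mean())

# One pooled accuracy over all out-of-fold predictions (the cross_val_predict route).
pooled = accuracy_score(y, cross_val_predict(svc, X, y, cv=5))
print(pooled)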
Furthermore, grouping the same data points differently gives different results; that is why the documentation stresses that the elements are grouped in different ways. A toy example:

Let's imagine cross_val_predict uses 3 folds for 7 data points, the out-of-fold predictions are [0, 1, 1, 0, 1, 0, 1], and the true targets are [0, 1, 1, 0, 1, 1, 0]. The accuracy score would be calculated as 5/7 (only the last two were predicted badly).
Now take those same predictions and split them into the following 3 folds:

- [0, 1, 1] predicted vs. [0, 1, 1] target -> accuracy of 1 for the first fold
- [0, 1] predicted vs. [0, 1] target -> perfect accuracy again
- [0, 1] predicted vs. [1, 0] target -> accuracy of 0

This is what cross_val_score does; it would return an array of accuracies, namely [1, 1, 0]. Now you can average this array, and the total accuracy is 2/3.
See? With the same data you would get two different measures of accuracy (one being 5/7 and the other 2/3).
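The arithmetic can be checked directly; this small snippet just reproduces the two numbers from the example above.

import numpy as np

y_true = np.array([0, 1, 1, 0, 1, 1, 0])
y_pred = np.array([0, 1, 1, 0, 1, 0, 1])

# cross_val_predict style: pool all predictions, then score once.
print((y_true == y_pred).mean())  # 5/7 ≈ 0.714

# cross_val_score style: score each fold, then average the fold scores.
folds = [slice(0, 3), slice(3, 5), slice(5, 7)]
fold_acc = [(y_true[f] == y_pred[f]).mean() for f in folds]
print(fold_acc, np.mean(fold_acc))  # [1.0, 1.0, 0.0] and 2/3 ≈ 0.667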
In both cases, the grouping changed the total accuracy you would obtain. Classifier errors weigh more heavily with cross_val_score, as each error influences its fold's accuracy more than it influences the pooled accuracy over all predictions (you can check this yourself). Both can be used for evaluating your model's performance on the validation data, though; I see no contraindication, just different behaviour (fold errors not being as severe with cross_val_predict).
If you tune your algorithm against the cross-validation results, you are leaking information (fine-tuning it to the training and validation data). To get a sense of the generalisation error, you would have to leave a part of your data out of both cross-validation and training. You may want to perform nested (double) cross-validation, or simply hold out a test set, to see how well your model actually generalises.
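A minimal sketch of the hold-out variant, assuming a simple 80/20 split and an untuned SVC: cross-validation is run only on the training portion, and the test set is touched exactly once at the end.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

svc = SVC()

# Model assessment / tuning uses only the training portion.
print(cross_val_score(svc, X_train, y_train, cv=5).mean())

# Final generalisation estimate: fit on all training data, score once on the test set.
svc.fit(X_train, y_train)
print(accuracy_score(y_test, svc.predict(X_test)))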