
R: k-fold cross-validation for train data set

I am doing some classification tasks on a heart disease dataset using C5.0 in R. In the most common case the data is divided into 80% for training and 20% for testing. I want to use k-fold cross-validation (k=10), but I am confused about one point: as we know, 10-fold cross-validation divides the whole data set into 10 subsets, and in each round 9 of them are used for training and the remaining one for testing.

Is it possible to divide the data into 80% for training and 20% for testing, and then apply k-fold cross-validation to the training data? Or do I have to apply k-fold cross-validation to the whole data set?

Noor asked Dec 14 '25 23:12


1 Answer

One option would be k=5. In that case each round trains on 80% of the data and tests on the remaining 20%. If all you want is a single 80/20 split, though, you don't need k-fold cross-validation at all.

In its basic form, k-fold cross-validation runs on the whole data set. With k=5 the data is split into 5 folds, each fold serves once as the 20% test set, and the 5 resulting train/test scenarios are evaluated and compared.
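The other setup raised in the question, holding out 20% first and cross-validating only on the remaining 80%, is also commonly used, typically when the cross-validation is meant for tuning and the held-out part for a final check. Below is a minimal base R sketch of how the row indices could be split in that case. The row count `n <- 300` and all variable names are illustrative assumptions, not taken from the heart disease data, and no model is fitted here.

```r
set.seed(42)                      # reproducible shuffling

n <- 300                          # hypothetical number of rows in the data set
k <- 10                           # number of folds

# Step 1: hold out 20% of the rows as a final test set.
shuffled  <- sample(n)
n_test    <- round(0.2 * n)
test_idx  <- shuffled[seq_len(n_test)]
train_idx <- shuffled[-seq_len(n_test)]

# Step 2: assign each training row to one of k folds of (nearly) equal size.
# train_idx is already shuffled, so a repeating 1..k pattern is a random assignment.
fold_id <- rep(seq_len(k), length.out = length(train_idx))

# Step 3: rotate through the folds; each fold is the validation set exactly once.
for (i in seq_len(k)) {
  val_rows <- train_idx[fold_id == i]
  fit_rows <- train_idx[fold_id != i]
  # Fit the model on fit_rows and score it on val_rows here,
  # e.g. with C50::C5.0() and predict(), assuming the C50 package is installed.
  cat(sprintf("fold %2d: fit on %d rows, validate on %d rows\n",
              i, length(fit_rows), length(val_rows)))
}
```

With these assumed numbers, 60 rows are held out, and each of the 10 rounds fits on 216 rows and validates on 24. The rows in `test_idx` are never touched during cross-validation, so they can be used once at the end to estimate how the chosen model performs on unseen data.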

Dan answered Dec 16 '25 16:12


