
10 fold cross validation

In k-fold cross-validation we have this: you divide the data into k subsets of (approximately) equal size. You train the net k times, each time leaving out one of the subsets from training and using only the omitted subset to compute whatever error criterion interests you. If k equals the sample size, this is called "leave-one-out" cross-validation. "Leave-v-out" is a more elaborate and expensive version of cross-validation that involves leaving out all possible subsets of v cases.

What do the terms "training" and "testing" mean? I can't understand them.

Would you please point me to some references where I can learn this algorithm with an example?

Train classifier on folds: 2 3 4 5 6 7 8 9 10; Test against fold: 1
Train classifier on folds: 1 3 4 5 6 7 8 9 10; Test against fold: 2
Train classifier on folds: 1 2 4 5 6 7 8 9 10; Test against fold: 3
Train classifier on folds: 1 2 3 5 6 7 8 9 10; Test against fold: 4
Train classifier on folds: 1 2 3 4 6 7 8 9 10; Test against fold: 5
Train classifier on folds: 1 2 3 4 5 7 8 9 10; Test against fold: 6
Train classifier on folds: 1 2 3 4 5 6 8 9 10; Test against fold: 7
Train classifier on folds: 1 2 3 4 5 6 7 9 10; Test against fold: 8
Train classifier on folds: 1 2 3 4 5 6 7 8 10; Test against fold: 9
Train classifier on folds: 1 2 3 4 5 6 7 8 9;  Test against fold: 10  
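The fold schedule above can be sketched in plain Python (a minimal illustration; in practice you would usually reach for a library routine such as scikit-learn's `KFold`):

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Each of the k folds is used exactly once as the test set,
    with the remaining k-1 folds forming the training set.
    """
    # Distribute any remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    indices = list(range(n_samples))
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

# With 10 folds over 10 samples this degenerates into leave-one-out CV.
for fold_no, (train, test) in enumerate(k_fold_splits(10, 10), start=1):
    print(f"Train on {train}; test against fold: {fold_no}")
```

Each iteration reproduces one line of the schedule: one fold held out for testing, the other nine concatenated for training.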
asked Oct 01 '11 by Nickool


People also ask

What is 10 folds cross-validation?

10-fold cross validation would perform the fitting procedure a total of ten times, with each fit being performed on a training set consisting of 90% of the total training set selected at random, with the remaining 10% used as a hold out set for validation.

Why do we use 10-fold cross-validation?

In training machine learning models, k-fold cross-validation is believed to offer a better estimate of model performance on small datasets. It is also computationally inexpensive compared to other evaluation techniques.

How many folds should I use for cross-validation?

When performing cross-validation, it is common to use 10 folds.

What is folds in cross-validation?

k-fold cross-validation is a procedure used to estimate the skill of a model on new data. There are common tactics that you can use to select the value of k for your dataset, and there are commonly used variations on cross-validation, such as stratified and repeated, that are available in scikit-learn.


2 Answers

In short: Training is the process of providing feedback to the algorithm in order to adjust the predictive power of the classifier(s) it produces.

Testing is the process of determining the realistic accuracy of the classifier(s) which were produced by the algorithm. During testing, the classifier(s) are given never-before-seen instances of data to do a final confirmation that the classifier's accuracy is not drastically different from that during training.

However, you're missing a key step in the middle: the validation (which is what you're referring to in the 10-fold/k-fold cross validation).

Validation is (usually) performed after each training step and it is performed in order to help determine if the classifier is being overfitted. The validation step does not provide any feedback to the algorithm in order to adjust the classifier, but it helps determine if overfitting is occurring and it signals when the training should be terminated.

Think about the process in the following manner:

1. Train on the training data set.
2. Validate on the validation data set.
if(change in validation accuracy > 0)
   3. repeat step 1 and 2
else
   3. stop training
4. Test on the testing data set.
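The loop above can be sketched as follows (a toy illustration of the control flow only: `train_one_epoch`, `validate`, and the accuracy numbers are hypothetical stand-ins, not a real model):

```python
def train_with_early_stopping(train_one_epoch, validate, max_epochs=100):
    """Repeat train/validate until validation accuracy stops improving.

    `train_one_epoch` and `validate` are caller-supplied callables;
    this function only implements the stopping logic described above.
    """
    best_accuracy = float("-inf")
    epochs_run = 0
    for _ in range(max_epochs):
        train_one_epoch()              # step 1: train on the training set
        accuracy = validate()          # step 2: validate on the validation set
        epochs_run += 1
        if accuracy > best_accuracy:   # change in validation accuracy > 0
            best_accuracy = accuracy   # step 3: repeat steps 1 and 2
        else:
            break                      # step 3: stop training
    return best_accuracy, epochs_run   # step 4 (testing) would follow here

# Simulated validation accuracies: improve, then dip (overfitting begins).
history = iter([0.60, 0.72, 0.80, 0.79, 0.85])
best, epochs = train_with_early_stopping(lambda: None, lambda: next(history))
print(best, epochs)  # training stops at the first non-improving epoch
```

Note that the validation accuracy is only inspected, never fed back into training, which matches the point that validation provides no feedback to the algorithm itself.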
answered Sep 22 '22 by Kiril


In the k-fold method, you divide the data into k segments: k-1 of them are used for training, while the one left out is used for testing. This is done k times. The first time, the first segment is used for testing and the remaining segments for training; the second time, the second segment is used for testing and the rest for training; and so on. This should be clear from your 10-fold example above.

Now about what training is and what testing is:

Training in classification is the part where a classification model is created using some algorithm; popular algorithms for building classification models include ID3, C4.5, etc.

Testing means evaluating the classification model by running it over the test data, building a confusion matrix, and then calculating the accuracy and error rate of the model.

In the k-fold method, k models are created (as is clear from the description above), and the most accurate model is then selected.
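The evaluation step can be illustrated with a small sketch (a minimal example with made-up labels; real work would use a library such as scikit-learn's `confusion_matrix`):

```python
def confusion_matrix(actual, predicted, labels):
    """Build a confusion matrix: rows = actual class, cols = predicted class."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix

def accuracy(matrix):
    """Accuracy = correct predictions (the diagonal) / all predictions."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

# Hypothetical test-fold labels vs. a classifier's predictions.
actual    = ["spam", "spam", "ham", "ham", "ham", "spam"]
predicted = ["spam", "ham",  "ham", "ham", "spam", "spam"]
m = confusion_matrix(actual, predicted, ["spam", "ham"])
print(m)                # [[2, 1], [1, 2]]
print(accuracy(m))      # 4 of 6 correct
print(1 - accuracy(m))  # error rate
```

In k-fold cross-validation this calculation is repeated once per held-out fold, giving k accuracy estimates.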

answered Sep 26 '22 by SpeedBirdNine