Cross Validation in Weka

I've always thought, from what I've read, that cross-validation is performed like this:

In k-fold cross-validation, the original sample is randomly partitioned into k subsamples. Of the k subsamples, a single subsample is retained as the validation data for testing the model, and the remaining k − 1 subsamples are used as training data. The cross-validation process is then repeated k times (the folds), with each of the k subsamples used exactly once as the validation data. The k results from the folds then can be averaged (or otherwise combined) to produce a single estimation

So k models are built, and the final one is the average of those. But in the Weka guide it is written that each model is always built using ALL of the data set. So how does cross-validation in Weka work? Is the model built from all of the data, and does "cross-validation" just mean that k folds are created, the model is evaluated on each of them, and the final output is simply the averaged result over the folds?

asked May 03 '12 by Titus Pullo

People also ask

Why we do cross-validation in Weka?

It helps reduce the variance in the estimate a little bit more. Then, once we've done the cross-validation, what Weka does is run the algorithm an eleventh time on the whole dataset. That will produce a classifier that we might deploy in practice.
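
For illustration, here is a minimal sketch of that behaviour using the Weka Java API. The J48 classifier and the data.arff file name are just placeholders, not anything prescribed above: crossValidateModel() only produces the averaged performance estimate, while the classifier that is actually kept is built separately on the full dataset.

    import java.util.Random;
    import weka.classifiers.Classifier;
    import weka.classifiers.Evaluation;
    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class CvSketch {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("data.arff");   // placeholder file name
            data.setClassIndex(data.numAttributes() - 1);

            // 10-fold cross-validation: Weka builds 10 temporary models internally,
            // discards them, and keeps only the averaged performance statistics.
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(new J48(), data, 10, new Random(1));
            System.out.println(eval.toSummaryString("=== 10-fold CV estimate ===", false));

            // The "eleventh" run: the classifier you would actually deploy
            // is trained on the whole dataset.
            Classifier deployed = new J48();
            deployed.buildClassifier(data);
        }
    }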

What is cross-validation used for?

Cross-validation is primarily used in applied machine learning to estimate the skill of a machine learning model on unseen data. That is, to use a limited sample in order to estimate how the model is expected to perform in general when used to make predictions on data not used during the training of the model.

What is meant by cross-validation in data mining?

Cross-validation is a standard tool in analytics and is an important feature for helping you develop and fine-tune data mining models. You use cross-validation after you have created a mining structure and related mining models to ascertain the validity of the model.

What is cross-validation method?

Cross-validation is a technique for evaluating ML models by training several ML models on subsets of the available input data and evaluating them on the complementary subset of the data. Use cross-validation to detect overfitting, i.e., failing to generalize a pattern.


1 Answer

So, here is the scenario again: you have 100 labeled instances.

Use training set

  • Weka takes the 100 labeled instances
  • it applies an algorithm to build a classifier from those 100 instances
  • it applies that classifier AGAIN to the same 100 instances
  • it reports the performance of the classifier, measured on the very same 100 instances it was built from (see the sketch after this list)
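
As a rough illustration of "Use training set" with the Weka Java API (same imports as the sketch above; J48 and the file name are again placeholders), the classifier is built and then evaluated on the very same Instances object:

    Instances data = DataSource.read("data.arff");    // placeholder file name
    data.setClassIndex(data.numAttributes() - 1);

    Classifier cls = new J48();
    cls.buildClassifier(data);                        // train on all 100 instances

    Evaluation eval = new Evaluation(data);
    eval.evaluateModel(cls, data);                    // evaluate on the same 100 instances
    System.out.println(eval.pctCorrect() + "% correct on the training data");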

Use 10 fold CV

  • Weka takes the 100 labeled instances

  • it splits them into 10 equal-sized folds of 10 instances each. Each fold in turn defines a train/test split: the 90 instances outside the fold are used for training and the 10 instances inside it are used for testing.

  • for split 1, it builds a classifier from the 90 training instances with the chosen algorithm and applies it to the 10 test instances.

  • it does the same thing for splits 2 to 10, producing 9 more classifiers

  • it averages the performance of the 10 classifiers produced from the 10 splits (90 training / 10 testing each); a hand-rolled version of this loop is sketched below
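
If you want to see those 10 splits written out, here is a rough hand-rolled equivalent using Instances.trainCV()/testCV() (same imports and placeholders as above); Weka's crossValidateModel() does essentially this internally:

    Instances data = DataSource.read("data.arff");              // placeholder file name
    data.setClassIndex(data.numAttributes() - 1);

    Random rand = new Random(1);
    Instances randomized = new Instances(data);
    randomized.randomize(rand);                                  // shuffle before folding
    randomized.stratify(10);                                     // keep class proportions per fold (assumes a nominal class)

    double sumPctCorrect = 0;
    for (int fold = 0; fold < 10; fold++) {
        Instances train = randomized.trainCV(10, fold, rand);    // 90 instances
        Instances test  = randomized.testCV(10, fold);           // 10 instances

        Classifier c = new J48();                                // a fresh classifier per fold
        c.buildClassifier(train);

        Evaluation foldEval = new Evaluation(train);
        foldEval.evaluateModel(c, test);
        sumPctCorrect += foldEval.pctCorrect();
    }
    System.out.println("Average accuracy over the 10 folds: " + (sumPctCorrect / 10) + "%");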

Let me know if that answers your question.

answered Oct 05 '22 by Rushdi Shams