
How to use training, test and validation sets to build a prediction model

I have a really large dataset and I'm trying to build a classification model in R. However, I need to use a training, a test and a validation set, and I'm a bit confused about how to do this. For example, I built a tree using the training set and then computed predictions on the test set. But I believe I should be using the training and test sets to tune the tree, and then use the validation set to validate it. How can I do this?

library(rpart)

# Fit a classification tree on the training set
part.installed <- rpart(TARGET ~ RS_DESC + SAP_STATUS + ACTIVATION_STATUS +
                          ROTUL_STATUS + SIM_STATUS + RATE_PLAN_SEGMENT_NORM,
                        data = trainSet, method = "class")

# Predict class labels for the test set
part.predictions <- predict(part.installed, testSet, type = "class")

(P.S. The tree is only an example; it could be any other classification algorithm.)

Carolina Leana Santos asked Dec 04 '25 04:12



1 Answer

The usual terminology is as follows:

  1. The training set is used to fit the classifier.
  2. The validation set is used to tune the algorithm's hyperparameters. Because you evaluate on it repeatedly while tuning, the chosen model will overfit it to some degree, which is why there is a third stage.
  3. The test set must not be touched until the classifier is final, so that it stays free of that overfitting. It provides an estimate of the accuracy you can expect if you put the model into production.
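
The three roles above can be sketched in R roughly as follows. This is only a minimal illustration, not the asker's actual setup. It assumes the built-in iris data as a stand-in for the real data frame, a 60/20/20 random split, and a small made-up grid of values for rpart's complexity parameter cp. The names validSet, cp_grid and best_cp are introduced here for the example.

```r
library(rpart)

set.seed(42)  # make the random split reproducible

# Stand-in data (assumption): iris replaces the asker's data frame.
dat <- iris
n <- nrow(dat)

# Shuffle row indices and cut them into 60% train, 20% validation, 20% test.
idx <- sample(n)
n_train <- floor(0.6 * n)
n_valid <- floor(0.2 * n)
trainSet <- dat[idx[1:n_train], ]
validSet <- dat[idx[(n_train + 1):(n_train + n_valid)], ]
testSet  <- dat[idx[(n_train + n_valid + 1):n], ]

# Hypothetical grid of complexity-parameter values to try.
cp_grid <- c(0.001, 0.01, 0.05, 0.1)

# Fit one tree per cp on the training set and score it on the validation set.
valid_acc <- sapply(cp_grid, function(cp) {
  fit <- rpart(Species ~ ., data = trainSet, method = "class",
               control = rpart.control(cp = cp))
  mean(predict(fit, validSet, type = "class") == validSet$Species)
})
best_cp <- cp_grid[which.max(valid_acc)]

# Refit with the chosen cp, then use the test set exactly once.
final_fit <- rpart(Species ~ ., data = trainSet, method = "class",
                   control = rpart.control(cp = best_cp))
test_acc <- mean(predict(final_fit, testSet, type = "class") == testSet$Species)

print(valid_acc)  # validation accuracy for each candidate cp
print(best_cp)    # the cp that won on the validation set
print(test_acc)   # final, untouched estimate of accuracy
```

With a really large dataset a single held-out validation set like this is usually enough. On smaller data, cross-validation on the training portion is a common substitute for the validation step, with the test set still held back until the end.
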
Has QUIT--Anony-Mousse answered Dec 05 '25 21:12



