Is it common practice to augment data (programmatically adding samples, e.g. random crops in the case of an image dataset) on both the training and test sets, or just on the training set?
Not having a single validation fold, if anything, lets us gauge how variable our learner's performance is. So yes, I would absolutely use data augmentation on the validation set, as I see the validation and training sets as natural extensions of one another in the case of repeated CV.
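The idea above can be sketched with a minimal k-fold loop where the same augmentation is applied to both the training fold and the validation fold. The horizontal-flip augmentation and the toy image array are assumptions; substitute whatever transforms and data you actually use.

```python
import numpy as np

rng = np.random.default_rng(42)

def augment(images):
    # Horizontal flips as a stand-in augmentation (assumption:
    # replace with the transforms used at training time).
    return np.concatenate([images, images[:, :, ::-1]], axis=0)

def kfold_indices(n_samples, k):
    """Yield (train, val) index arrays for k contiguous folds."""
    folds = np.array_split(np.arange(n_samples), k)
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, val

images = rng.random((12, 8, 8))  # 12 toy "images"
for train_idx, val_idx in kfold_indices(len(images), 3):
    train_aug = augment(images[train_idx])  # augmented training fold
    val_aug = augment(images[val_idx])      # validation fold augmented too
```

Each fold's training and validation sets are doubled in the same way, so the per-fold scores stay comparable across repetitions.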
Abstract: Empirically, data augmentation sometimes improves and sometimes hurts test error, even when only adding points with labels from the true conditional distribution that the hypothesis class is expressive enough to fit.
Generally, the term “validation set” is used interchangeably with “test set” and refers to a sample of the dataset held back from training the model. Evaluating a model's skill on the training dataset would yield a biased score.
Test time augmentation (TTA) is a popular technique in computer vision. TTA aims to boost model accuracy by using data augmentation at inference time. The idea behind TTA is simple: for each test image, we create multiple versions that differ slightly from the original (e.g., cropped or flipped) and aggregate the model's predictions over them.
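A minimal sketch of TTA with NumPy: predictions for the original image and a few flipped variants are averaged. The `predict` function here is a toy stand-in, not a real model; in practice you would call your trained model instead.

```python
import numpy as np

def predict(image):
    # Stand-in for a trained model's prediction (assumption:
    # replace with your real model's inference call).
    return float(image[:, : image.shape[1] // 2].mean())

def tta_predict(image):
    """Average predictions over simple augmentations of one test image."""
    variants = [
        image,             # original
        np.fliplr(image),  # horizontal flip
        np.flipud(image),  # vertical flip
    ]
    return float(np.mean([predict(v) for v in variants]))

img = np.arange(16, dtype=float).reshape(4, 4)
single = predict(img)        # prediction on the raw image
averaged = tta_predict(img)  # TTA-averaged prediction
```

For flip-invariant tasks the averaged prediction tends to be more stable than any single-view prediction, at the cost of running inference once per variant.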
Only on training. Data augmentation is used to increase the size of the training set and to get more varied images. Technically, you could apply data augmentation to the test set to see how the model behaves on such images, but people usually don't.
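The training-only convention can be sketched as follows: the training set is enlarged with flipped copies, while the test set keeps its original size. The flip transforms and the random arrays are placeholders for real augmentations and real images.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment_training_set(images):
    """Enlarge the training set with flipped copies; the test set
    is left untouched (a hypothetical, minimal augmentation)."""
    flipped_lr = images[:, :, ::-1]  # horizontal flips
    flipped_ud = images[:, ::-1, :]  # vertical flips
    return np.concatenate([images, flipped_lr, flipped_ud], axis=0)

train_images = rng.random((10, 8, 8))  # toy stand-in for real images
test_images = rng.random((4, 8, 8))

train_augmented = augment_training_set(train_images)
# Training set is tripled; the test set is not augmented at all.
```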