
How to split folder of images into test/training/validation sets with stratified sampling?

I have a very large folder of images, as well as a CSV file containing the class labels for each of those images. Because it's all in one giant folder, I'd like to split them up into training/test/validation sets; maybe create three new folders and move images into each based on a Python script of some kind. I'd like to do stratified sampling so I can keep the % of classes the same across all three sets.

How would I go about writing a script that does this?

Yuerno asked Oct 31 '18 00:10


People also ask

How does stratify work in train test split?

To keep class proportions the same in both splits, set the “stratify” argument of train_test_split() to the y component (the labels) of the original dataset. The function then ensures that the train and test sets each have the proportion of examples per class that is present in the provided “y” array.
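Since the question asks for three sets rather than two, one common way to apply the stratify argument is to call train_test_split() twice: once to carve off a held-out portion, and once to divide that portion into validation and test. The following is a minimal sketch under assumptions, not a definitive implementation: the helper name stratified_three_way, the 80/10/10 fractions, and the toy file names and labels are all made up for illustration, and scikit-learn is assumed to be installed.

```python
from collections import Counter

from sklearn.model_selection import train_test_split


def stratified_three_way(items, labels, val_frac=0.1, test_frac=0.1, seed=1337):
    """Split items into train/val/test, keeping class proportions in each.

    Hypothetical helper: the name and default fractions are illustrative.
    """
    # First split off the held-out portion (val + test), stratified by label.
    holdout = val_frac + test_frac
    x_train, x_hold, y_train, y_hold = train_test_split(
        items, labels, test_size=holdout, stratify=labels, random_state=seed
    )
    # Then divide the held-out portion into val and test, again stratified.
    x_val, x_test, y_val, y_test = train_test_split(
        x_hold, y_hold, test_size=test_frac / holdout, stratify=y_hold,
        random_state=seed,
    )
    return (x_train, y_train), (x_val, y_val), (x_test, y_test)


# Toy data: 100 hypothetical file names, 70% "cat" and 30% "dog".
names = [f"img_{i}.jpg" for i in range(100)]
labels = ["cat"] * 70 + ["dog"] * 30

train, val, test = stratified_three_way(names, labels)
for split_name, (_, y) in [("train", train), ("val", val), ("test", test)]:
    print(split_name, Counter(y))
```

With real data, the names and labels lists would come from the two columns of the CSV file, and the returned file names could then be copied or moved into train, val, and test folders.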


1 Answer

Use the Python library split-folders.

pip install split-folders

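Store all the images in a folder named Data, with one subfolder per class (this is the layout split-folders expects). Because the ratio is applied within each class subfolder, the class proportions stay roughly the same across the resulting sets. Then run: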

import splitfolders

# Split Data/ into output/train, output/val and output/test (80/10/10).
splitfolders.ratio("Data", output="output", seed=1337, ratio=(0.8, 0.1, 0.1))

Running this snippet creates three folders in the output directory:

  • train
  • val
  • test

The share of images that goes into each folder is set by the ratio argument, given in the order (train, val, test).
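The question describes a single flat folder plus a CSV of labels, so the images first need to be arranged into one subfolder per class before split-folders can be applied. A minimal sketch of that preparation step follows, using only the standard library. The function name arrange_by_class and the column names "filename" and "label" are assumptions, since the actual CSV header is not given; adjust them to match the real file.

```python
import csv
import shutil
from pathlib import Path


def arrange_by_class(csv_path, image_dir, out_dir,
                     file_col="filename", label_col="label"):
    """Copy each image listed in the CSV into out_dir/<label>/.

    file_col and label_col are assumed column names; change them to match
    the real CSV header. Returns the number of files copied.
    """
    image_dir, out_dir = Path(image_dir), Path(out_dir)
    copied = 0
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            src = image_dir / row[file_col]
            if not src.is_file():
                continue  # skip rows whose image is missing on disk
            dest = out_dir / row[label_col]
            dest.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dest / src.name)  # copy, leaving the original
            copied += 1
    return copied
```

For example, arrange_by_class("labels.csv", "images", "Data") (hypothetical paths) would produce Data/<label>/... folders, after which the splitfolders.ratio call can be pointed at Data. The sketch copies rather than moves, so the original folder stays untouched until the result has been checked.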

AVISHEK GARAIN answered Sep 20 '22 06:09