How to split folder of images into test/training/validation sets with stratified sampling?

Tags:

I have a very large folder of images, as well as a CSV file containing the class labels for each of those images. Because it's all in one giant folder, I'd like to split them up into training/test/validation sets; maybe create three new folders and move images into each based on a Python script of some kind. I'd like to do stratified sampling so I can keep the % of classes the same across all three sets.

What would be the approach to go about making a script that can do this?

623

asked Oct 31 '18 00:10

Yuerno

1 Answers

Use the python library split-folder.

pip install split-folders

Let all the images be stored in Data folder. Then apply as follows:

import splitfolders
splitfolders.ratio('Data', output="output", seed=1337, ratio=(.8, 0.1,0.1))

On running the above code snippet, it will create 3 folders in the output directory:

train
val
test

The number of images in each folder can be varied using the values in the ratio argument(train:val:test).

124

answered Sep 20 '22 06:09

AVISHEK GARAIN

Related questions
                            
                                Possible to alter worksheet order in xlsxwriter?
                            
                                How to store argparse values in variables?
                            
                                tabular legend layout for matplotlib
                            
                                How to return all list elements of a given length?
                            
                                Renaming PyCharm project does not change name at top of window
                            
                                Python - ERROR - failed to write data to stream: <open file '<stdout>', mode 'w' at 0x104c8f150>
                            
                                Django form returns is_valid() = False and no errors
                            
                                Is it possible to modify the behavior of len()?
                            
                                Convert int to double Python [duplicate]
                            
                                How to filter for multiple ids from a query param on a GET request with django rest framework?
                            
                                How to find all comments with Beautiful Soup
                            
                                Gensim: TypeError: doc2bow expects an array of unicode tokens on input, not a single string
                            
                                How to decrypt password from JavaScript CryptoJS.AES.encrypt(password, passphrase) in Python
                            
                                How to break numpy array into smaller chunks/batches, then iterate through them
                            
                                Replace apostrophe/short words in python
                            
                                Pandas: assign category based on where value falls in range
                            
                                Installing pycurl with 'fatal error: gnutls/gnutls.h: No such file or directory'
                            
                                Why does max() sometimes return nan and sometimes ignores it?
                            
                                Convert list of arrays to pandas dataframe
                            
                                Adding a data file in Pyinstaller using the onefile option

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

How to split folder of images into test/training/validation sets with stratified sampling?

Tags:

python

python-3.x

Yuerno

People also ask

1 Answers

AVISHEK GARAIN

Recent Activity

Donate For Us