How to create train, test and validation splits in tensorflow 2.0

Tags: python, tensorflow, tensorflow2.0, tensorflow-datasets

I am new to TensorFlow and have started using TensorFlow 2.0.

I have built a TensorFlow dataset for a multi-class classification problem; let's call it labeled_ds. I prepared it by loading all the image files from their class-wise directories, following the tutorial here: tensorflow guide to load image dataset

Now I need to split labeled_ds into three disjoint pieces: train, validation and test. I went through the TensorFlow API but found no example that lets me specify the split percentages. I found something in the load method, but I am not sure how to use it. Further, how can I make the splits stratified?

# labeled_ds contains multi class data, which is unbalanced.
train_ds, val_ds, test_ds = tf.data.Dataset.tfds.load(labeled_ds, split=["train", "validation", "test"])

I am stuck here and would appreciate any advice on how to proceed. Thanks in advance.


asked Oct 15 '19 21:10

Swaroop

2 Answers

The code below creates train, validation and test splits from the TensorFlow dataset "oxford_flowers102" using take and skip:

!pip install tensorflow==2.0.0

import tensorflow as tf
print(tf.__version__)
import tensorflow_datasets as tfds

labeled_ds, summary = tfds.load('oxford_flowers102', split='train+test+validation', with_info=True)

# Count the examples by iterating over the dataset once.
labeled_all_length = sum(1 for _ in labeled_ds)

# 80% train, 10% test; validation gets whatever remains (about 10%).
train_size = int(0.8 * labeled_all_length)
val_test_size = int(0.1 * labeled_all_length)

# No shuffle is applied here, so each piece is a contiguous run of the loaded order.
df_train = labeled_ds.take(train_size)
df_test = labeled_ds.skip(train_size)
df_val = df_test.skip(val_test_size)
df_test = df_test.take(val_test_size)

df_train_length = sum(1 for _ in df_train)
df_val_length = sum(1 for _ in df_val)
df_test_length = sum(1 for _ in df_test)

print('Original: ', labeled_all_length)
print('Train: ', df_train_length)
print('Validation :', df_val_length)
print('Test :', df_test_length)
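
A variation on the take/skip approach above, as a minimal sketch rather than a definitive implementation: shuffle once before splitting so the pieces are not contiguous slices of the original order. It assumes the whole dataset fits in the shuffle buffer and that the dataset reports a known cardinality. The helper name split_dataset and the toy tf.data.Dataset.range(100) dataset are illustrative assumptions, not part of the answer.

```python
import tensorflow as tf


def split_dataset(ds, n, train_frac=0.8, val_frac=0.1, seed=42):
    """Split ds (n elements) into disjoint train/val/test datasets (hypothetical helper)."""
    # reshuffle_each_iteration=False keeps the shuffled order identical on every
    # pass, so take/skip always select the same, non-overlapping elements.
    ds = ds.shuffle(n, seed=seed, reshuffle_each_iteration=False)
    train_size = int(train_frac * n)
    val_size = int(val_frac * n)
    train_ds = ds.take(train_size)
    val_ds = ds.skip(train_size).take(val_size)
    test_ds = ds.skip(train_size + val_size)  # remainder goes to test
    return train_ds, val_ds, test_ds


# Toy stand-in for labeled_ds (assumption: 100 integer elements).
toy_ds = tf.data.Dataset.range(100)

# cardinality returns a negative sentinel when the size is unknown or infinite;
# in that case count by iterating instead.
n = int(tf.data.experimental.cardinality(toy_ds).numpy())

train_ds, val_ds, test_ds = split_dataset(toy_ds, n)
train_ids = {int(x.numpy()) for x in train_ds}
val_ids = {int(x.numpy()) for x in val_ds}
test_ids = {int(x.numpy()) for x in test_ds}
print(len(train_ids), len(val_ids), len(test_ids))
```

If the shuffle is left at its default (reshuffling on every iteration), the three pieces can overlap across epochs, which is why the flag is set explicitly here.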


answered Sep 28 '22 20:09

bsquare

I had the same problem.

It depends on the dataset; most of them ship with a train and a test split. In that case you can do the following (assuming an 80-10-10 split):

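# A slice binds only to the split it follows, and a bare number is an absolute index,
# so each split is sliced separately here and '%' marks the bounds as percentages.
# filePath is the directory where the dataset is stored or downloaded to.
splits, info = tfds.load('fashion_mnist', with_info=True, as_supervised=True,
split=['train[:80%]+test[:80%]', 'train[80%:90%]+test[80%:90%]', 'train[90%:]+test[90%:]'],
data_dir=filePath)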

answered Sep 28 '22 20:09

Francesco Boi
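
The question also asks about stratified splits, which neither snippet covers. One possible approach, sketched here under assumptions: since labeled_ds was built from class-wise directories, the labels are known before the dataset is created, so the example indices can be split per class first. The function stratified_split and the toy label list are hypothetical and use only the Python standard library.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.8, 0.1, 0.1), seed=0):
    """Return (train_idx, val_idx, test_idx), splitting each class separately."""
    rng = random.Random(seed)

    # Group example indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, val_idx, test_idx = [], [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)  # random assignment within the class
        n_train = int(fractions[0] * len(indices))
        n_val = int(fractions[1] * len(indices))
        train_idx += indices[:n_train]
        val_idx += indices[n_train:n_train + n_val]
        test_idx += indices[n_train + n_val:]  # remainder goes to test
    return train_idx, val_idx, test_idx


# Hypothetical unbalanced labels: 100 examples of class "a", 20 of class "b".
labels = ["a"] * 100 + ["b"] * 20
train_idx, val_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))
```

The resulting index lists can then select the file paths and labels for each piece before three separate datasets are built, for example with tf.data.Dataset.from_tensor_slices.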

