
How to create train, test and validation splits in TensorFlow 2.0

I am new to TensorFlow and have started using TensorFlow 2.0.

I have built a TensorFlow dataset for a multi-class classification problem; let's call it labeled_ds. I prepared it by loading all the image files from their class-wise directories, following the TensorFlow guide to loading an image dataset.
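For reference, here is a minimal sketch of how such a labeled dataset can be built from class-wise directories, using the idea from the TensorFlow image-loading tutorial of reading the label from the parent directory name. The class names and file paths are hypothetical placeholders. The sketch assumes "/" as the path separator and omits image decoding. It returns the label as an integer class index, which is convenient for splitting later.

```python
import tensorflow as tf

# Hypothetical class names: in practice, the names of the class sub-directories.
CLASS_NAMES = tf.constant(["birds", "cats", "dogs"])

def get_label(file_path):
    # "data/cats/img_001.jpg" -> ["data", "cats", "img_001.jpg"]; the class is the parent directory.
    parts = tf.strings.split(file_path, "/")
    # Compare the directory name with every class name and return the index of the match.
    return tf.argmax(tf.cast(tf.equal(parts[-2], CLASS_NAMES), tf.int32))

# Hypothetical paths; with real files these would come from tf.data.Dataset.list_files(...).
paths = ["data/cats/img_001.jpg", "data/birds/img_002.jpg", "data/dogs/img_003.jpg"]
labeled_ds = tf.data.Dataset.from_tensor_slices(paths).map(lambda p: (p, get_label(p)))

for path, label in labeled_ds:
    print(path.numpy().decode(), int(label.numpy()))
```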

Now I need to split labeled_ds into three disjoint pieces: train, validation and test. I went through the TensorFlow API but could not find an example that lets me specify the split percentages. I found something in the load method, but I am not sure how to use it. Also, how can I make the splits stratified?

# labeled_ds contains multi class data, which is unbalanced.
train_ds, val_ds, test_ds = tf.data.Dataset.tfds.load(labeled_ds, split=["train", "validation", "test"])

I am stuck here and would appreciate any advice on how to proceed. Thanks in advance.

asked Oct 15 '19 21:10 by Swaroop
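The answers below split by position, which does not preserve class proportions in an unbalanced dataset. Below is a minimal sketch of one way to stratify using only plain tf.data operations. It assumes that each element is a (features, label) pair with a scalar integer label and that the number of classes is known. For one-hot labels like the tutorial's, apply tf.argmax first. The helper stratified_split is hypothetical, not a TensorFlow API. It filters the dataset once per class and shuffles each class with a fixed seed. It then cuts 80/10/10 slices from every class and concatenates the per-class slices.

```python
import functools

import tensorflow as tf

def _class_filter(c):
    # Predicate that keeps only the examples whose label equals class c.
    return lambda features, label: tf.equal(label, c)

def stratified_split(ds, num_classes, train_frac=0.8, val_frac=0.1, seed=42):
    """Split a (features, label) dataset so that every class is divided in the same proportions."""
    train_parts, val_parts, test_parts = [], [], []
    for c in range(num_classes):
        class_ds = ds.filter(_class_filter(c))
        # Count this class's examples (one full pass over the data per class).
        n = int(class_ds.reduce(tf.constant(0, tf.int64), lambda count, _: count + 1).numpy())
        if n == 0:
            continue
        # Fix one shuffled order so that take() and skip() see the same sequence.
        class_ds = class_ds.shuffle(n, seed=seed, reshuffle_each_iteration=False)
        n_train, n_val = int(train_frac * n), int(val_frac * n)
        train_parts.append(class_ds.take(n_train))
        val_parts.append(class_ds.skip(n_train).take(n_val))
        test_parts.append(class_ds.skip(n_train + n_val))

    def concat(parts):
        return functools.reduce(lambda a, b: a.concatenate(b), parts)

    # The results are grouped by class, so shuffle the training set again before batching.
    return concat(train_parts), concat(val_parts), concat(test_parts)
```

Counting by filtering costs one pass over the data per class, which can be slow for large image datasets. If the labels are available separately (for example, from the directory listing), counting them there is cheaper.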



2 Answers

The code below creates train, validation and test splits from the TensorFlow Datasets dataset "oxford_flowers102":

!pip install tensorflow==2.0.0

import tensorflow as tf
import tensorflow_datasets as tfds
print(tf.__version__)

# Load all three published splits of oxford_flowers102 as a single dataset.
labeled_ds, summary = tfds.load('oxford_flowers102', split='train+test+validation', with_info=True)

def count_examples(ds):
    # One pass over the dataset; unlike [i for i, _ in enumerate(ds)][-1] + 1,
    # this returns 0 for an empty dataset instead of raising IndexError.
    return sum(1 for _ in ds)

labeled_all_length = count_examples(labeled_ds)

# 80% train, 10% test, and the remainder (about 10%) validation.
train_size = int(0.8 * labeled_all_length)
val_test_size = int(0.1 * labeled_all_length)

df_train = labeled_ds.take(train_size)   # first 80% of the examples
df_test = labeled_ds.skip(train_size)    # the remaining 20%
df_val = df_test.skip(val_test_size)     # everything after the test slice
df_test = df_test.take(val_test_size)    # the next 10%

df_train_length = count_examples(df_train)
df_val_length = count_examples(df_val)
df_test_length = count_examples(df_test)

print('Original:', labeled_all_length)
print('Train:', df_train_length)
print('Validation:', df_val_length)
print('Test:', df_test_length)
answered Sep 28 '22 20:09 by bsquare
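The take/skip split above keeps the dataset's original order. Each slice is therefore a contiguous block of the concatenated train+test+validation sequence rather than a random sample. Below is a minimal sketch of a variant that counts with Dataset.reduce and shuffles once with a fixed seed before cutting. The helper split_dataset is hypothetical, and its 80/10/10 defaults mirror the answer above.

```python
import tensorflow as tf

def split_dataset(ds, train_frac=0.8, val_frac=0.1, seed=42):
    """Shuffle once, then cut ds into disjoint train/validation/test datasets."""
    # Count the elements in one pass without building a Python list.
    n = int(ds.reduce(tf.constant(0, tf.int64), lambda count, _: count + 1).numpy())
    # reshuffle_each_iteration=False keeps one fixed order, so take() and skip() never overlap.
    # max(n, 1) avoids a zero-sized shuffle buffer for an empty dataset.
    ds = ds.shuffle(max(n, 1), seed=seed, reshuffle_each_iteration=False)
    n_train, n_val = int(train_frac * n), int(val_frac * n)
    train_ds = ds.take(n_train)
    val_ds = ds.skip(n_train).take(n_val)
    test_ds = ds.skip(n_train + n_val)
    return train_ds, val_ds, test_ds

train_ds, val_ds, test_ds = split_dataset(tf.data.Dataset.range(100))
print([len(list(d)) for d in (train_ds, val_ds, test_ds)])  # [80, 10, 10]
```

A shuffle buffer of n elements holds the whole dataset in memory. For large image datasets, it is cheaper to shuffle and split the file paths before decoding the images.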


I had the same problem.

It depends on the dataset; most TFDS datasets come with predefined train and test splits. In that case you can combine and slice the splits as follows (assuming an 80/10/10 split):

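import tensorflow_datasets as tfds
# '%' makes the bounds percentages, and a slice applies only to the split it follows, so slice each split.
splits, info = tfds.load('fashion_mnist', with_info=True, as_supervised=True,
                         split=['train[:80%]+test[:80%]', 'train[80%:90%]+test[80%:90%]', 'train[90%:]+test[90%:]'],
                         data_dir=filePath)  # filePath: your local TFDS data directory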
answered Sep 28 '22 20:09 by Francesco Boi
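If other percentages or other splits are needed, the split strings can be generated instead of typed by hand. Below is a small sketch. The helper percent_split_specs is hypothetical and not part of TFDS. It applies the same percentage slice to every named split and joins the slices with "+". The output uses the TFDS percent-slicing syntax (for example 'train[:80%]').

```python
def percent_split_specs(split_names, percents):
    """Return one TFDS split string per percentage, slicing every named split the same way."""
    if sum(percents) != 100:
        raise ValueError("percents must add up to 100")
    specs, start = [], 0
    for p in percents:
        end = start + p
        # Empty bounds mean "from the beginning" / "to the end" of each split.
        lo = f"{start}%" if start > 0 else ""
        hi = f"{end}%" if end < 100 else ""
        specs.append("+".join(f"{name}[{lo}:{hi}]" for name in split_names))
        start = end
    return specs

print(percent_split_specs(["train", "test"], [80, 10, 10]))
# ['train[:80%]+test[:80%]', 'train[80%:90%]+test[80%:90%]', 'train[90%:]+test[90%:]']
```

The resulting list can be passed as the split argument of tfds.load, which then returns one dataset per string.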