I'm trying to run the following Colab project, but when I want to split the training data into validation and train parts I get this error:
KeyError: "Invalid split train[:70%]. Available splits are: ['train']"
I use the following code:
(training_set, validation_set), dataset_info = tfds.load(
    'tf_flowers',
    split=['train[:70%]', 'train[70%:]'],
    with_info=True,
    as_supervised=True,
)
How can I fix this error?
A dataset can be split into training and test sets using train_test_split(). It shuffles the input data, X and y, and divides it in random order, for example into an 80/20 train/test split. The test_size parameter sets the fraction of the data used for the test set; the training set gets whatever remains.
We can use train_test_split to first split the original dataset into train and test sets. Then, to get the validation set, we can apply the same function to the train set. In the first call, test_size is the fraction of the original data to use as the test set; in the second call, it is the fraction of the remaining train set to use for validation.
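The two-stage split described above can be sketched in plain Python. This is only an illustration: split_fraction is a hypothetical helper that imitates the basic idea of sklearn's train_test_split for a single list (its rounding and shuffling are assumptions and may differ from sklearn), and the 100-item list stands in for a real dataset.

```python
import random


def split_fraction(items, test_size, seed=0):
    """Shuffle a copy of `items` and hold out a `test_size` fraction.

    Hypothetical helper, not part of sklearn; it only imitates the
    basic idea of train_test_split for one list.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_size)
    cut = len(shuffled) - n_test
    return shuffled[:cut], shuffled[cut:]


data = list(range(100))  # stand-in for the real examples

# Stage 1: hold out 20% of the original data as the test set.
train_val, test = split_fraction(data, test_size=0.2, seed=42)

# Stage 2: hold out 25% of the remaining 80 items for validation,
# which corresponds to 20% of the original data.
train, val = split_fraction(train_val, test_size=0.25, seed=42)

print(len(train), len(val), len(test))  # 60 20 20
```

Note that the second test_size is relative to what is left after the first split, not to the original data, so 0.25 here yields a 60/20/20 split overall.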
TensorFlow itself doesn't currently provide a tool for this. You could use sklearn.model_selection.train_test_split to generate the train/eval/test sets and then create a tf.data.Dataset from each. Note that sklearn requires the data to fit in memory, whereas tf.data does not.
According to the TensorFlow Datasets docs, the approach you presented is now supported. Splitting is possible by passing the split parameter to tfds.load, like so: split="train[:70%]".
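import tensorflow_datasets as tfds

# First 70% of 'train' for training, the remaining 30% for validation.
(training_set, validation_set), dataset_info = tfds.load(
    'tf_flowers',
    split=['train[:70%]', 'train[70%:]'],
    with_info=True,
    as_supervised=True,
)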
With the above code, training_set has 2569 entries, while validation_set has 1101.
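As a quick sanity check on those numbers, the arithmetic can be reproduced in plain Python. This is a sketch under assumptions: the total of 3670 examples is inferred from the two counts quoted above, percent_slice_sizes is a hypothetical helper, and rounding the boundary to the nearest example is an assumption about how the percentage boundary is chosen.

```python
def percent_slice_sizes(total, percent):
    """Return the sizes of '[:percent%]' and '[percent%:]' for `total` examples.

    Hypothetical helper; the boundary is rounded to the nearest example,
    which is an assumption made for this sketch.
    """
    boundary = round(total * percent / 100)
    return boundary, total - boundary


total_examples = 2569 + 1101  # total inferred from the counts quoted above
train_size, val_size = percent_slice_sizes(total_examples, 70)

print(train_size, val_size)  # 2569 1101
```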
Thank you, Saman, for the comment on the API deprecation:
In previous TensorFlow versions it was possible to use the tfds.Split API, which is now deprecated:
(training_set, validation_set), dataset_info = tfds.load(
    'tf_flowers',
    split=[
        tfds.Split.TRAIN.subsplit(tfds.percent[:70]),
        tfds.Split.TRAIN.subsplit(tfds.percent[70:])
    ],
    with_info=True,
    as_supervised=True,
)