Let's say I have defined a dataset in this way:
filename_dataset = tf.data.Dataset.list_files("{}/*.png".format(dataset))
How can I get the number of elements in the dataset (that is, the number of individual elements that make up a single epoch)?
I know that tf.data.Dataset already knows the size of the dataset, because the repeat() method allows repeating the input pipeline for a specified number of epochs. So there must be a way to get this information.
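For example, repeat() can take an explicit epoch count (the 10 here is just a placeholder value):

# Repeat the file list for a fixed number of epochs.
filename_dataset = filename_dataset.repeat(10)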
len(list(dataset))
works in eager mode, although that's obviously not a good general solution.
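For example, in TF 2.x with eager execution this materializes every element just to count them (the glob pattern below is a placeholder):

import tensorflow as tf

# Placeholder pattern; point it at your own directory of PNG files.
filename_dataset = tf.data.Dataset.list_files("images/*.png")

# Works in eager mode, but iterates the whole dataset just to count it.
num_elements = len(list(filename_dataset))
print(num_elements)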
Take a look here: https://github.com/tensorflow/tensorflow/issues/26966
It doesn't work for TFRecord datasets, but it works fine for other types.
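When it doesn't work, one possible fallback (just a sketch, not from the linked issue) is to count elements with a full pass over the data using Dataset.reduce, where dataset is any tf.data.Dataset:

import tensorflow as tf

# Counts elements by iterating the whole dataset once; slow, but works even
# when the size is not statically known (e.g. TFRecord datasets).
count = dataset.reduce(tf.constant(0, dtype=tf.int64), lambda acc, _: acc + 1)
print(count.numpy())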
TL;DR:
num_elements = tf.data.experimental.cardinality(dataset).numpy()
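A slightly fuller sketch, assuming TF 2.x in eager mode (the .numpy() call requires eager execution); note that cardinality can also report unknown or infinite, which is worth checking for:

import tensorflow as tf

filename_dataset = tf.data.Dataset.list_files("images/*.png")  # placeholder pattern

cardinality = tf.data.experimental.cardinality(filename_dataset)
if cardinality == tf.data.experimental.UNKNOWN_CARDINALITY:
    print("Cardinality could not be determined statically")
elif cardinality == tf.data.experimental.INFINITE_CARDINALITY:
    print("Dataset is infinite (e.g. repeat() without a count)")
else:
    print("Number of elements:", cardinality.numpy())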