How to get the shape of a tf.data.Dataset?

I know a dataset has output_shapes, but it only shows something like this:

data_set: DatasetV1Adapter shapes: {item_id_hist: (?, ?), tags: (?, ?), client_platform: (?,), entrance: (?,), item_id: (?,), lable: (?,), mode: (?,), time: (?,), user_id: (?,)}, types: {item_id_hist: tf.int64, tags: tf.int64, client_platform: tf.string, entrance: tf.string, item_id: tf.int64, lable: tf.int64, mode: tf.int64, time: tf.int64, user_id: tf.int64}

How can I get the total number of elements in my dataset?

701

asked May 20 '19 09:05

cao xiangyu

3 Answers

Where the length is known, you can call:

tf.data.experimental.cardinality(dataset)

If the length cannot be determined up front, this call returns a negative sentinel value rather than a count. In that case, it's important to know that a TensorFlow Dataset is (in general) lazily evaluated, so we may need to iterate over every record before we can find the length of the dataset.

For example, assuming you have eager execution enabled and it's a small 'toy' dataset that fits comfortably in memory, you could just enumerate it into a new list and grab the last index (then add 1 because lists are zero-indexed):

dataset_length = [i for i,_ in enumerate(dataset)][-1] + 1

Of course this is inefficient at best: it reads every record and builds a list with one entry per record just to look at the last one. For large datasets this becomes impractical, and in such circumstances I can't see any alternative other than to iterate through the records keeping a manual count.
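A minimal sketch of the manual count mentioned above, assuming TF 2.x with eager execution. The helper name count_elements and the toy dataset are hypothetical and only for illustration. It uses Dataset.reduce to carry a running total, so no list of records or indices is built:

```python
import tensorflow as tf


def count_elements(dataset):
    """Count elements in one pass, keeping only a running total."""
    # reduce() threads an int64 counter through the dataset;
    # each element is read and then discarded.
    total = dataset.reduce(tf.constant(0, tf.int64), lambda count, _: count + 1)
    return int(total)


# Hypothetical toy dataset: filter() means the length is not known up front.
toy = tf.data.Dataset.range(10).filter(lambda x: x % 2 == 0)
n = count_elements(toy)
print(n)
```

This still reads the whole dataset once, so it costs as much time as one full pass, but memory use stays constant.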

167

answered Oct 19 '22 21:10

Stewart_R

You can use code like the following:

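# Materialize every element as a NumPy array (reads the whole dataset into memory)
dataset_to_numpy = list(dataset.as_numpy_iterator())
# Assumes all elements are arrays with the same shape, so the list can be stacked
shape = tf.shape(dataset_to_numpy)
print(shape)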

It produces output like this:

tf.Tensor([1080   64   64    3], shape=(4,), dtype=int32)

The code is simple to write, but it still takes time to iterate over the whole dataset, and every element has to fit in memory at once. For more information about tf.data.Dataset, see its API documentation.
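The approach above relies on every element being a plain array of the same shape. The dataset in the question yields dictionaries of batched features, so it would likely not apply directly. A rough sketch for that case, assuming eager execution and using a hypothetical stand-in dataset (the feature values and the batch size of 4 are made up), is to sum the leading dimension of one feature per batch:

```python
import tensorflow as tf

# Hypothetical stand-in for the question's dataset: dict features, batched.
features = {
    "item_id": tf.range(10, dtype=tf.int64),
    "lable": tf.zeros(10, dtype=tf.int64),  # key spelled as in the question
}
batched = tf.data.Dataset.from_tensor_slices(features).batch(4)

num_batches = 0
num_examples = 0
for batch in batched:
    num_batches += 1
    # The leading dimension of any feature is this batch's size.
    num_examples += int(tf.shape(batch["item_id"])[0])

print(num_batches, num_examples)
```

Counting per batch this way handles a smaller final batch correctly, which multiplying the number of batches by the batch size would not.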

answered Oct 19 '22 21:10

Damon Roux

As of 4/15/2022, with TF v2.8, you can get the result using:

dataset.cardinality().numpy()

ref: https://www.tensorflow.org/api_docs/python/tf/data/Dataset#cardinality
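One caveat worth illustrating: cardinality() does not always return a count. A short sketch, assuming a TF 2.x version that has the Dataset.cardinality method, and using made-up example datasets. When the length is infinite or cannot be determined without iterating, the result is a negative sentinel, which can be compared against the constants in tf.data.experimental:

```python
import tensorflow as tf

# Made-up example datasets covering the three cases.
known = tf.data.Dataset.range(100).batch(32)
unknown = tf.data.Dataset.range(100).filter(lambda x: x > 50)
infinite = tf.data.Dataset.range(100).repeat()

known_n = known.cardinality().numpy()
unknown_n = unknown.cardinality().numpy()
infinite_n = infinite.cardinality().numpy()

# Compare against the sentinels before treating the result as a length.
is_unknown = unknown_n == tf.data.experimental.UNKNOWN_CARDINALITY
is_infinite = infinite_n == tf.data.experimental.INFINITE_CARDINALITY

print(known_n, is_unknown, is_infinite)
```

Note that for a batched dataset the value is the number of batches, not the number of individual examples.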

answered Oct 19 '22 20:10

Vincent Yuan

Tags:

machine-learning

tensorflow

deep-learning

tensorflow-datasets