There are at least two other questions like this on SO, but not a single one has been answered.
I have a dataset of the form:
<TensorSliceDataset shapes: ((512,), (512,), (512,), ()), types: (tf.int32, tf.int32, tf.int32, tf.int32)>
and another of the form:
<BatchDataset shapes: ((None, 512), (None, 512), (None, 512), (None,)), types: (tf.int32, tf.int32, tf.int32, tf.int32)>
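For reference, a dataset pair with this structure can be reproduced from stand-in data like the following (the array names and contents are hypothetical placeholders):
import numpy as np
import tensorflow as tf

# Hypothetical stand-in data: three (N, 512) int32 arrays plus one (N,)
# int32 array (e.g. token ids, attention masks, type ids, and labels).
N = 8
ids = np.zeros((N, 512), dtype=np.int32)
masks = np.ones((N, 512), dtype=np.int32)
types = np.zeros((N, 512), dtype=np.int32)
labels = np.zeros((N,), dtype=np.int32)

dataset = tf.data.Dataset.from_tensor_slices((ids, masks, types, labels))
# -> <TensorSliceDataset shapes: ((512,), (512,), (512,), ()), ...>
batched = dataset.batch(4)
# -> <BatchDataset shapes: ((None, 512), (None, 512), (None, 512), (None,)), ...>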
I have looked and looked but I can't find the code to save these datasets to files that can be loaded later. The closest I got was this page in the TensorFlow docs, which suggests serializing the tensors using tf.io.serialize_tensor and then writing them to a file using tf.data.experimental.TFRecordWriter.
However, when I tried this using the code:
dataset.map(tf.io.serialize_tensor)
writer = tf.data.experimental.TFRecordWriter('mydata.tfrecord')
writer.write(dataset)
I get an error on the first line:
TypeError: serialize_tensor() takes from 1 to 2 positional arguments but 4 were given
How can I modify the above (or do something else) to accomplish my goal?
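For context, the TypeError occurs because the dataset yields 4-tuples, so map() calls tf.io.serialize_tensor with four positional arguments instead of one. A minimal sketch of one possible workaround, assuming each component is serialized separately (serialize_example is a hypothetical helper, not part of the TensorFlow API):
import tensorflow as tf

def serialize_example(a, b, c, d):
    # Serialize each component to a scalar tf.string, stack the four
    # strings, then serialize the stack so every example becomes one
    # scalar string that TFRecordWriter can write.
    parts = [tf.io.serialize_tensor(t) for t in (a, b, c, d)]
    return tf.io.serialize_tensor(tf.stack(parts))

serialized = dataset.map(serialize_example)
writer = tf.data.experimental.TFRecordWriter('mydata.tfrecord')
writer.write(serialized)
Reading the file back would then need two rounds of tf.io.parse_tensor: once with out_type=tf.string to recover the stack, then once per component with its original dtype.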
One way would be to do np.save('file.npy', a.numpy()) and then convert back to a tensor after loading.
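A minimal sketch of that round trip, assuming the data fits in memory as a single tensor (a is a stand-in):
import numpy as np
import tensorflow as tf

a = tf.constant([[1, 2], [3, 4]], dtype=tf.int32)  # stand-in tensor
np.save('file.npy', a.numpy())  # write the tensor's values to disk
restored = tf.convert_to_tensor(np.load('file.npy'))  # back to a tf.Tensor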
Normally when you use TensorFlow Datasets (tfds), the downloaded and prepared data is cached in a local directory (by default ~/tensorflow_datasets).
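For example (a sketch; the dataset name and data_dir path are placeholders):
import tensorflow_datasets as tfds

# The first call downloads and prepares the data; later calls reuse the cache.
ds = tfds.load('mnist', split='train', data_dir='/tmp/my_tfds_cache')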
An issue was opened on GitHub, and it appears there's a new feature available in TF 2.3 to write datasets to disk:
https://www.tensorflow.org/api_docs/python/tf/data/experimental/save
https://www.tensorflow.org/api_docs/python/tf/data/experimental/load
I haven't tested these features yet, but they seem to do what you want.
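A minimal sketch of how the pair is meant to be used, going by the docs above (untested; 'saved_data' is a hypothetical directory, and in TF 2.3 load() requires the dataset's element_spec):
import tensorflow as tf

tf.data.experimental.save(dataset, 'saved_data')

# load() in TF 2.3 needs the element_spec of the saved dataset; capture it
# before saving (or reconstruct it by hand when loading in another program).
spec = dataset.element_spec
restored = tf.data.experimental.load('saved_data', element_spec=spec)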