I am a PyTorch user, and I am used to the data.Dataset and data.DataLoader APIs in PyTorch. I am trying to build the same model with TensorFlow 2.0, and I wonder whether there is an API that works similarly to these PyTorch APIs.
If there is no such API, could anyone tell me how people usually implement the data loading part in TensorFlow? I've used TensorFlow 1, but I have never worked with the Dataset API; I have always hard-coded the data loading before. I hope there is something like overriding __getitem__ with only an index as input.
Thanks much in advance.
TensorFlow's tf.data pipelines can use multiple threads to load data into memory, and they can prefetch data beforehand so that your training loop doesn't get blocked while the data is loading.
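This kind of parallel loading and prefetching can be sketched with tf.data roughly as follows. It assumes TF 2.4 or later for tf.data.AUTOTUNE (older 2.x releases expose it as tf.data.experimental.AUTOTUNE); the random tensors and the preprocess function are made up for illustration:

```python
import tensorflow as tf

# Hypothetical per-element preprocessing; replace with your own decoding/augmentation.
def preprocess(x, y):
    x = tf.cast(x, tf.float32) / 255.0
    return x, y

# Toy data standing in for real images and labels.
features = tf.random.uniform((100, 28, 28), maxval=256, dtype=tf.int32)
labels = tf.random.uniform((100,), maxval=10, dtype=tf.int32)

AUTOTUNE = tf.data.AUTOTUNE  # tf.data.experimental.AUTOTUNE on TF < 2.4

ds = (
    tf.data.Dataset.from_tensor_slices((features, labels))
    # Run preprocess on several elements in parallel; TF picks the degree of parallelism.
    .map(preprocess, num_parallel_calls=AUTOTUNE)
    .batch(32)
    # Prepare upcoming batches in the background while the current one is consumed.
    .prefetch(AUTOTUNE)
)
```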
In PyTorch, Dataset stores the samples and their corresponding labels, and DataLoader wraps an iterable around the Dataset to enable easy access to the samples.
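For comparison, a rough tf.data counterpart of wrapping a dataset in DataLoader(dataset, batch_size=4, shuffle=True) might look like the sketch below. The toy arrays are invented, and a shuffle buffer as large as the dataset is used to approximate full shuffling:

```python
import numpy as np
import tensorflow as tf

# Toy in-memory samples and labels (illustrative only).
samples = np.arange(10, dtype=np.float32).reshape(10, 1)
labels = np.arange(10, dtype=np.int64)

# from_tensor_slices plays the "Dataset" role: element i is (samples[i], labels[i]).
dataset = tf.data.Dataset.from_tensor_slices((samples, labels))

# shuffle + batch play the "DataLoader" role (like shuffle=True, batch_size=4).
loader = dataset.shuffle(buffer_size=10).batch(4)

for epoch in range(2):
    # Iterating again re-shuffles, because reshuffle_each_iteration defaults to True.
    for x, y in loader:
        print(epoch, y.numpy())
```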
When using the tf.data API, you will usually also make use of the map function.
In PyTorch, your __getitem__ call basically fetches an element from the data structure set up in __init__ and transforms it if necessary.
In TF 2.0, you do the same by initializing a Dataset using one of the Dataset.from_... functions (see from_generator, from_tensor_slices, from_tensors); this is essentially the __init__ part of a PyTorch Dataset. Then, you can call map to do the element-wise manipulations you would have in __getitem__.
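A minimal sketch of that split, with made-up toy tensors and a hypothetical transform function standing in for whatever your __getitem__ does:

```python
import tensorflow as tf

# "__init__" part: build the dataset from the raw data (toy tensors here).
images = tf.random.uniform((8, 32, 32, 3), maxval=255.0)
labels = tf.constant([0, 1, 2, 0, 1, 2, 0, 1])
ds = tf.data.Dataset.from_tensor_slices((images, labels))

# "__getitem__" part: a per-element transform, applied lazily by map.
def transform(image, label):
    image = tf.image.resize(image, (16, 16)) / 255.0
    label = tf.one_hot(label, depth=3)
    return image, label

ds = ds.map(transform)
```

Nothing is transformed when map is called; transform runs on each element as the dataset is iterated, much like __getitem__ runs only when an index is requested.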
TensorFlow datasets are essentially fancy iterators, so by design you don't access their elements by index, but rather by traversing them.
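If you want to keep an index-based __getitem__ anyway, one common workaround is to wrap it in a Python generator and pass that to Dataset.from_generator, so the dataset traverses the indices for you. A sketch, assuming TF 2.4+ for the output_signature argument (older versions use output_types/output_shapes) and a hypothetical MyIndexedData class:

```python
import numpy as np
import tensorflow as tf

class MyIndexedData:
    """Hypothetical PyTorch-style container: __len__ plus __getitem__(index)."""

    def __init__(self, n):
        self.x = np.arange(n, dtype=np.float32)
        self.y = (np.arange(n) % 2).astype(np.int64)

    def __len__(self):
        return len(self.x)

    def __getitem__(self, index):
        # Any Python-side loading/transformation can happen here.
        return np.float32(self.x[index] * 2), self.y[index]

data = MyIndexedData(6)

def gen():
    # Traverse the indices in order, calling __getitem__ for each one.
    for i in range(len(data)):
        yield data[i]

ds = tf.data.Dataset.from_generator(
    gen,
    output_signature=(
        tf.TensorSpec(shape=(), dtype=tf.float32),
        tf.TensorSpec(shape=(), dtype=tf.int64),
    ),
).batch(3)
```

Because the generator runs ordinary Python code, it is usually slower than native tf.data ops and parallelizes less well; shuffling can be done by iterating over a permuted range of indices inside gen.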
The guide on tf.data is very useful and provides a wide variety of examples.
I am not familiar with PyTorch, but TensorFlow implements the Keras API, which has the Sequence class. It is described as:
Base object for fitting to a sequence of data, such as a dataset
https://www.tensorflow.org/api_docs/python/tf/keras/utils/Sequence
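This class requires you to implement __getitem__, which takes an index and returns the corresponding batch, along with __len__, which returns the number of batches.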
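A minimal sketch of such a Sequence, using a hypothetical ArraySequence class that serves NumPy arrays. Note that index here is a batch index, not a sample index:

```python
import math

import numpy as np
import tensorflow as tf

class ArraySequence(tf.keras.utils.Sequence):
    """Hypothetical Sequence that serves (x, y) batches by batch index."""

    def __init__(self, x, y, batch_size):
        super().__init__()
        self.x, self.y = x, y
        self.batch_size = batch_size

    def __len__(self):
        # Number of batches per epoch (the last one may be smaller).
        return math.ceil(len(self.x) / self.batch_size)

    def __getitem__(self, index):
        # Return the batch at position `index`.
        lo = index * self.batch_size
        hi = lo + self.batch_size
        return self.x[lo:hi], self.y[lo:hi]
```

An instance can be passed directly to model.fit; overriding on_epoch_end is the usual place to reshuffle between epochs.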