With the recent upgrade to version 1.4, TensorFlow included `tf.data` in the library core. One "major new feature" described in the version 1.4 release notes is `tf.data.Dataset.apply()`, which is a "method for applying custom transformation functions". How is this different from the already existing `tf.data.Dataset.map()`?
The difference is that `map` will execute one function on every element of the `Dataset` separately, whereas `apply` will execute one function on the whole `Dataset` at once (such as `group_by_window`, which is given as an example in the documentation).

The argument of `apply` is a function that takes a `Dataset` and returns a `Dataset`, whereas the argument of `map` is a function that takes one element and returns one transformed element.
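A minimal sketch of the contrast (`drop_even_elements` is an illustrative helper, not a library function, and the `group_by_window` call assumes the TF 1.4-era `tf.contrib.data` API):

```python
import tensorflow as tf

dataset = tf.data.Dataset.range(10)

# map(): the function receives ONE element and returns one
# transformed element.
squares = dataset.map(lambda x: x * x)

# apply(): the function receives the WHOLE Dataset and returns a
# new Dataset, so it can express cross-element transformations.
def drop_even_elements(ds):  # illustrative helper, not part of tf.data
    return ds.filter(lambda x: x % 2 > 0)

odds = dataset.apply(drop_even_elements)

# group_by_window (in tf.contrib.data as of TF 1.4) is such a
# whole-Dataset transformation: it buckets elements by key and
# re-emits them window by window.
grouped = dataset.apply(
    tf.contrib.data.group_by_window(
        key_func=lambda x: x % 2,
        reduce_func=lambda key, window: window.batch(5),
        window_size=5))
```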
Sunreef's answer is absolutely correct. You might still be wondering why we introduced `Dataset.apply()`, and I thought I'd offer some background.
The `tf.data` API has a set of core transformations, like `Dataset.map()` and `Dataset.filter()`, that are generally useful across a wide range of datasets, unlikely to change, and implemented as methods on the `tf.data.Dataset` object. In particular, they are subject to the same backwards-compatibility guarantees as other core APIs in TensorFlow.
However, the core approach is a bit restrictive. We also want the freedom to experiment with new transformations before adding them to the core, and to allow other library developers to create their own reusable transformations. Therefore, in TensorFlow 1.4 we split out a set of custom transformations that live in `tf.contrib.data`. The custom transformations include some that have very specific functionality (like `tf.contrib.data.sloppy_interleave()`), and some where the API is still in flux (like `tf.contrib.data.group_by_window()`). Originally we implemented these custom transformations as functions from `Dataset` to `Dataset`, which had an unfortunate effect on the syntactic flow of a pipeline. For example:
```python
dataset = tf.data.TFRecordDataset(...).map(...)

# Method chaining breaks when we apply a custom transformation.
dataset = custom_transformation(dataset, x, y, z)

dataset = dataset.shuffle(...).repeat(...).batch(...)
```
Since this seemed to be a common pattern, we added `Dataset.apply()` as a way to chain core and custom transformations in a single pipeline:
```python
dataset = (tf.data.TFRecordDataset(...)
           .map(...)
           .apply(custom_transformation(x, y, z))
           .shuffle(...)
           .repeat(...)
           .batch(...))
```
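Note that in the chained version, `custom_transformation(x, y, z)` must return a function from `Dataset` to `Dataset`, rather than taking the dataset as its first argument. A hedged sketch of that factory pattern (`skip_and_shuffle` and its parameters are hypothetical, not part of `tf.data`):

```python
import tensorflow as tf

def skip_and_shuffle(num_skip, buffer_size):
    # Hypothetical factory: returns the Dataset-to-Dataset function
    # that Dataset.apply() expects.
    def _apply_fn(dataset):
        return dataset.skip(num_skip).shuffle(buffer_size)
    return _apply_fn

dataset = (tf.data.Dataset.range(100)
           .apply(skip_and_shuffle(num_skip=10, buffer_size=32))
           .batch(8))
```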
It's a minor feature in the grand scheme of things, but hopefully it helps to make `tf.data` programs easier to read, and the library easier to extend.