 

Difference between tf.data.Dataset.map() and tf.data.Dataset.apply()

With the recent upgrade to version 1.4, TensorFlow included tf.data in the library core. One "major new feature" described in the version 1.4 release notes is tf.data.Dataset.apply(), which is a "method for applying custom transformation functions". How is this different from the already existing tf.data.Dataset.map()?

asked Nov 03 '17 by GPhilo



2 Answers

The difference is that map will execute one function on every element of the Dataset separately, whereas apply will execute one function on the whole Dataset at once (such as group_by_window, which is given as an example in the documentation).

The argument of apply is a function that takes a Dataset and returns a Dataset, whereas the argument of map is a function that takes one element and returns one transformed element.
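To make the distinction concrete, here is a minimal sketch (the function name take_even_then_batch and the specific transformations are illustrative, not from the original answer):

import tensorflow as tf

dataset = tf.data.Dataset.range(10)

# map(): the function receives ONE element at a time and returns one transformed element.
doubled = dataset.map(lambda x: x * 2)

# apply(): the function receives the WHOLE Dataset and returns a new Dataset.
def take_even_then_batch(ds):
    # Any Dataset-to-Dataset transformation can go here.
    return ds.filter(lambda x: tf.equal(x % 2, 0)).batch(2)

batched_evens = dataset.apply(take_even_then_batch)

Because map never sees more than one element, anything that needs to look across elements (batching, grouping, windowing) has to go through apply.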

answered Sep 22 '22 by Sunreef


Sunreef's answer is absolutely correct. You might still be wondering why we introduced Dataset.apply(), and I thought I'd offer some background.

The tf.data API has a set of core transformations—like Dataset.map() and Dataset.filter()—that are generally useful across a wide range of datasets, unlikely to change, and implemented as methods on the tf.data.Dataset object. In particular, they are subject to the same backwards compatibility guarantees as other core APIs in TensorFlow.

However, the core approach is a bit restrictive. We also want the freedom to experiment with new transformations before adding them to the core, and to allow other library developers to create their own reusable transformations. Therefore, in TensorFlow 1.4 we split out a set of custom transformations that live in tf.contrib.data. The custom transformations include some that have very specific functionality (like tf.contrib.data.sloppy_interleave()), and some where the API is still in flux (like tf.contrib.data.group_by_window()). Originally we implemented these custom transformations as functions from Dataset to Dataset, which had an unfortunate effect on the syntactic flow of a pipeline. For example:

dataset = tf.data.TFRecordDataset(...).map(...)

# Method chaining breaks when we apply a custom transformation.
dataset = custom_transformation(dataset, x, y, z)

dataset = dataset.shuffle(...).repeat(...).batch(...)

Since this seemed to be a common pattern, we added Dataset.apply() as a way to chain core and custom transformations in a single pipeline:

dataset = (tf.data.TFRecordDataset(...)
           .map(...)
           .apply(custom_transformation(x, y, z))
           .shuffle(...)
           .repeat(...)
           .batch(...))

It's a minor feature in the grand scheme of things, but hopefully it helps to make tf.data programs easier to read, and the library easier to extend.
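As an illustration of that extension point, a custom transformation is typically written as a factory that returns a Dataset-to-Dataset function, so the result can be passed straight to apply(). This is only a sketch; the name shuffle_and_batch is hypothetical and not part of tf.data:

import tensorflow as tf

def shuffle_and_batch(buffer_size, batch_size):
    # Returns a function from Dataset to Dataset, suitable for Dataset.apply().
    def _apply_fn(dataset):
        return dataset.shuffle(buffer_size).batch(batch_size)
    return _apply_fn

# The custom transformation now chains like any core method:
dataset = (tf.data.Dataset.range(100)
           .apply(shuffle_and_batch(buffer_size=10, batch_size=4)))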

answered Sep 24 '22 by mrry