TensorFlow Dataset .map() API

I have a couple of questions about this.

On occasion I'd like to do something like the following in TensorFlow (assume I'm creating training examples by loading WAV files):

import tensorflow as tf
from tensorflow.contrib.framework.python.ops import audio_ops as contrib_audio
from tensorflow.python.ops import io_ops

def _some_audio_preprocessing_func(filename):
    # ... some logic here which mostly uses TensorFlow ops ...
    # Builds a fresh graph and session on every call (the pattern in question).
    with tf.Session(graph=tf.Graph()) as sess:
        wav_filename_placeholder = tf.placeholder(tf.string, [])
        wav_loader = io_ops.read_file(wav_filename_placeholder)
        wav_decoder = contrib_audio.decode_wav(wav_loader, desired_channels=1)
        data = sess.run(
            [wav_decoder],
            feed_dict={wav_filename_placeholder: filename})
        return data

dataset = tf.data.Dataset.list_files('*.wav')
dataset = dataset.map(_some_audio_preprocessing_func)

If I have a parse_image() function that uses tensor ops, should it be part of the main graph? Following the example set in Google's own audio TF tutorial, it looks like they create a separate graph! Doesn't this defeat the point of using TensorFlow to make things faster?
Should I use tf.py_func() whenever a single line isn't from the TensorFlow library? Again, I wonder what the performance implications are and when I should use it.

Thanks!

asked Mar 14 '18 05:03

lollercoaster

1 Answer

When you use Dataset.map(map_func), TensorFlow defines a subgraph for all the ops created in the function map_func, and arranges to execute it efficiently in the same session as the rest of your graph. There is almost never any need to create a tf.Graph or tf.Session inside map_func: if your parsing function is made up of TensorFlow ops, these ops can be embedded directly in the graph that defines the input pipeline.

The modified version of the code using tf.data would look like this:

import tensorflow as tf
from tensorflow.contrib.framework.python.ops import audio_ops as contrib_audio

def _some_audio_preprocessing_func(filename):
    # `filename` is a scalar tf.string tensor supplied by Dataset.map().
    wav_loader = tf.read_file(filename)
    # decode_wav returns an (audio, sample_rate) pair of tensors.
    return contrib_audio.decode_wav(wav_loader, desired_channels=1)

dataset = tf.data.Dataset.list_files('*.wav')
dataset = dataset.map(_some_audio_preprocessing_func)

If your map_func contains non-TensorFlow operations that you want to apply to each element, you should wrap them in a tf.py_func() (or Dataset.from_generator(), if the data generation process is defined in Python logic). The main performance implication is that any code running in a tf.py_func() is subject to the Global Interpreter Lock, so I would generally recommend trying to find a native TensorFlow implementation for anything that is performance critical.
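As a rough sketch of the tf.py_func() case (not taken from the question): suppose each file name encodes its label, as in 'yes_0001.wav', and you want to extract that label with ordinary Python string handling. The file-naming scheme and the helper names label_from_path, add_label and build_labeled_dataset are hypothetical, and the TensorFlow 1.x API is assumed to match the rest of this thread.

```python
import os


def label_from_path(path):
    """Plain-Python step: 'clips/yes_0001.wav' -> b'yes'.

    tf.py_func() hands tf.string values to Python as bytes, so both
    bytes and str inputs are accepted here.
    """
    if isinstance(path, bytes):
        path = path.decode("utf-8")
    stem = os.path.splitext(os.path.basename(path))[0]
    return stem.split("_")[0].encode("utf-8")


def add_label(filename):
    """map_func that wraps the Python step in tf.py_func() (TF 1.x assumed)."""
    # Imported here so the pure-Python helper above stays usable without TensorFlow.
    import tensorflow as tf

    label = tf.py_func(label_from_path, [filename], tf.string)
    # py_func() cannot infer static shapes, so declare the scalar shape.
    label.set_shape([])
    return filename, label


def build_labeled_dataset(pattern="*.wav"):
    """Hypothetical wiring: list matching files, then attach labels via add_label."""
    import tensorflow as tf

    dataset = tf.data.Dataset.list_files(pattern)
    return dataset.map(add_label)
```

Only label_from_path runs in the Python interpreter (and therefore under the GIL); everything else in the pipeline stays as graph ops, so it is worth keeping the wrapped function as small as possible.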

answered Sep 19 '22 01:09

mrry

Tags: python, tensorflow, tensorflow-datasets
