How do I profile a tf.data.Dataset?

I'm trying to find the bottlenecks in my input_fn, which is built with tf.data.Dataset, so I figured I'd use tf.profiler. However, it only shows the iterator ops. How can I get the profiler to report the relevant ops in my Dataset pipeline instead?

Example

import tensorflow as tf

# input_fn is my own function; it returns a tf.data.Dataset of (features, labels).
dataset = input_fn()
iterator = dataset.make_one_shot_iterator()
minibatch = iterator.get_next()
run_metadata = tf.RunMetadata()
with tf.Session() as session:
    # Record a full trace of this single step into run_metadata.
    features, labels = session.run(
        minibatch,
        options=tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE),
        run_metadata=run_metadata)

tf.profiler.advise(tf.get_default_graph(), run_metadata)

Output:

checkers {
  key: "AcceleratorUtilizationChecker"
  value {
  }
}
checkers {
  key: "ExpensiveOperationChecker"
  value {
    reports: "top 1 operation type: IteratorGetNext, cpu: 79.89sec, accelerator: 0us, total: 79.89sec (99.96%)\ntop 2 operation type: OneShotIterator, cpu: 27.92ms, accelerator: 0us, total: 27.92ms (0.03%)\ntop 3 operation type: _retval_IteratorGetNext_3_3, cpu: 57us, accelerator: 0us, total: 57us (0.00%)"
    reports: "top 1 graph node: IteratorGetNext, cpu: 79.89sec, accelerator: 0us, total: 79.89sec\ntop 2 graph node: OneShotIterator, cpu: 27.92ms, accelerator: 0us, total: 27.92ms"
    reports: "<ipython-input-2-c5f67ba0356f>:49:<module>, cpu: 79.89sec, accelerator: 0us, total: 79.89sec\n<ipython-input-2-c5f67ba0356f>:48:<module>, cpu: 27.92ms, accelerator: 0us, total: 27.92ms"
  }
}
checkers {
  key: "OperationChecker"
  value {
  }
}
Carl Thomé asked Jan 19 '18 20:01


1 Answer

It looks like tf.data profiling wasn't implemented at that point; it seems to have been added in TensorFlow 1.14. This snippet:

import tensorflow as tf

dataset = tf.data.Dataset.range(100)
dataset = dataset.shuffle(30)
dataset = dataset.repeat()

iterator = dataset.make_one_shot_iterator()
minibatch = iterator.get_next()
run_metadata = tf.RunMetadata()
options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
with tf.Session() as session:
    session.run(minibatch, options=options, run_metadata=run_metadata)

tf.profiler.advise(tf.get_default_graph(), run_metadata)

Outputs:

Parsing Inputs...

ExpensiveOperationChecker:
top 1 operation type: OneShotIterator, cpu: 3.01ms, accelerator: 0us, total: 3.01ms (87.19%)
top 2 operation type: IteratorGetNext, cpu: 440us, accelerator: 0us, total: 440us (12.75%)
top 3 operation type: _retval_IteratorGetNext_0_0, cpu: 2us, accelerator: 0us, total: 2us (0.06%)
top 1 graph node: OneShotIterator, cpu: 3.01ms, accelerator: 0us, total: 3.01ms
top 2 graph node: IteratorGetNext, cpu: 440us, accelerator: 0us, total: 440us
test.py:7:<module>, cpu: 3.01ms, accelerator: 0us, total: 3.01ms

OperationChecker:

AcceleratorUtilizationChecker:
McAngus answered Sep 24 '22 07:09
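
Since both traces above attribute nearly all the time to IteratorGetNext rather than to individual Dataset ops, one coarse workaround is to time each prefix of the pipeline separately and compare the per-element cost. The sketch below is not part of either post and uses no TensorFlow API: measure_throughput and compare_stages are hypothetical helper names, and the source and slow_map generators are stand-ins for real pipeline stages.

```python
import time


def measure_throughput(iterable, num_elements=100):
    """Pull up to num_elements items and return (count, seconds elapsed)."""
    start = time.perf_counter()
    count = 0
    for _ in iterable:
        count += 1
        if count >= num_elements:
            break
    elapsed = time.perf_counter() - start
    return count, elapsed


def compare_stages(stages, num_elements=100):
    """Time each named pipeline prefix; return {name: seconds per element}."""
    results = {}
    for name, iterable in stages:
        count, elapsed = measure_throughput(iterable, num_elements)
        results[name] = elapsed / max(count, 1)
    return results


# Stand-in stages built from plain generators (hypothetical, for illustration).
def source(n):
    for i in range(n):
        yield i


def slow_map(items, delay=0.001):
    for item in items:
        time.sleep(delay)  # simulates an expensive per-element transformation
        yield item * 2


timings = compare_stages(
    [
        ("source", source(50)),
        ("source+map", slow_map(source(50))),
    ],
    num_elements=50,
)

for name, per_element in timings.items():
    print(f"{name}: {per_element * 1e6:.1f} us/element")
```

A large jump in per-element time between two consecutive prefixes points at the stage added in between. In TF 2.x eager mode a tf.data.Dataset is a Python iterable, so each prefix of a real pipeline (for example the dataset before and after a map call) could be passed to the same helper. In TF 1.x graph mode, as in the question, the equivalent would be timing repeated session.run(minibatch) calls for each prefix.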