
IDE breakpoint in TensorFlow Dataset API mapped py_function?

I'm using the TensorFlow Dataset API to prepare my data for input into my network. During this process, I have some custom Python functions which are mapped to the dataset using tf.py_function. I want to be able to debug the data going into these functions and what happens to that data inside these functions. When a py_function is called, this calls back to the main Python process (according to this answer). Since this function is in Python, and in the main process, I would expect a regular IDE breakpoint to be able to stop in this process. However, this doesn't seem to be the case (example below where the breakpoint does not halt execution). Is there a way to drop into a breakpoint within a py_function used by the Dataset map?

Example where the breakpoint does not halt execution:

import tensorflow as tf

def add_ten(example, label):
    example_plus_ten = example + 10  # Breakpoint here.
    return example_plus_ten, label

examples = [10, 20, 30, 40, 50, 60, 70, 80]
labels =   [ 0,  0,  1,  1,  1,  1,  0,  0]

examples_dataset = tf.data.Dataset.from_tensor_slices(examples)
labels_dataset = tf.data.Dataset.from_tensor_slices(labels)
dataset = tf.data.Dataset.zip((examples_dataset, labels_dataset))
dataset = dataset.map(map_func=lambda example, label: tf.py_function(func=add_ten, inp=[example, label],
                                                                     Tout=[tf.int32, tf.int32]))
dataset = dataset.batch(2)
example_and_label = next(iter(dataset))
asked Dec 10 '19 by golmschenk

People also ask

What is TensorFlow Data API?

The tf.data API enables you to build complex input pipelines from simple, reusable pieces. For example, the pipeline for an image model might aggregate data from files in a distributed file system, apply random perturbations to each image, and merge randomly selected images into a batch for training.
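
For example, a minimal sketch of such a pipeline (the in-memory integer data and the transformations here are made up purely for illustration):

import tensorflow as tf

# Build a pipeline from simple, reusable pieces: a source,
# a per-element transformation, shuffling, and batching.
dataset = tf.data.Dataset.from_tensor_slices(tf.range(100))
dataset = dataset.map(lambda x: x * 2)     # transform each element
dataset = dataset.shuffle(buffer_size=10)  # randomize element order
dataset = dataset.batch(8)                 # group elements into batches

for batch in dataset.take(2):
    print(batch)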

What is tf.data AUTOTUNE?

tf.data builds a performance model of the input pipeline and runs an optimization algorithm to find a good allocation of its CPU budget across all parameters specified as AUTOTUNE.
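
For example, you opt in by passing tf.data.AUTOTUNE (tf.data.experimental.AUTOTUNE in older TensorFlow 2.x releases) wherever a parallelism or buffer-size argument is accepted; a minimal sketch:

import tensorflow as tf

dataset = tf.data.Dataset.from_tensor_slices(tf.range(100))
# Let tf.data choose the degree of parallelism for the map ...
dataset = dataset.map(lambda x: x + 1, num_parallel_calls=tf.data.AUTOTUNE)
# ... and the prefetch buffer size, based on its runtime performance model.
dataset = dataset.prefetch(tf.data.AUTOTUNE)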


1 Answer

TensorFlow 2.0's implementation of tf.data.Dataset runs mapped functions on C-level threads that it creates without notifying your debugger. Use pydevd's settrace to manually set a tracing function in those threads; it will connect to your default debugger server and start feeding it the debug data.

import pydevd
pydevd.settrace()

Example with your code:

import tensorflow as tf
import pydevd

def add_ten(example, label):
    pydevd.settrace(suspend=False)  # Register this thread with the IDE's debugger without pausing here.
    example_plus_ten = example + 10  # Breakpoint here.
    return example_plus_ten, label

examples = [10, 20, 30, 40, 50, 60, 70, 80]
labels =   [ 0,  0,  1,  1,  1,  1,  0,  0]

examples_dataset = tf.data.Dataset.from_tensor_slices(examples)
labels_dataset = tf.data.Dataset.from_tensor_slices(labels)
dataset = tf.data.Dataset.zip((examples_dataset, labels_dataset))
dataset = dataset.map(map_func=lambda example, label: tf.py_function(func=add_ten, inp=[example, label],
                                                                     Tout=[tf.int32, tf.int32]))
dataset = dataset.batch(2)
example_and_label = next(iter(dataset))
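
With the settrace call in place, a breakpoint set inside add_ten halts execution as expected; because suspend=False is passed, execution does not pause at the settrace line itself, only at breakpoints you set yourself. (Since the dataset is not shuffled, the first batch should be the examples [20 30] with labels [0 0].)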

Note: If you are using an IDE that already bundles pydevd (such as PyDev or PyCharm), you do not have to install pydevd separately; it will be picked up during the debug session. Otherwise, pydevd can be installed from PyPI (pip install pydevd).

answered Oct 21 '22 by Daniel Braun