TensorFlow Dataset `.map` - Is it possible to ignore errors?

Short version:

When using Dataset map operations, is it possible to specify that any 'rows' where the map invocation results in an error are quietly filtered out rather than having the error bubble up and kill the whole session?

Specifics:

I have an input pipeline set up that (more or less) does the following:

  1. reads a set of file paths of images stored locally (images of varying dimensions)
  2. reads a suggested set of 'bounding boxes' from a csv
  3. produces the set of all image path to bounding box combinations
  4. reads and decodes each image, then produces the 'cropped' image for each of these combinations using tf.image.crop_to_bounding_box (see the sketch just after this list)
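
For reference, a minimal sketch of such a pipeline in TF 2 style (the file names, CSV column layout and helper names are placeholders, not my actual code):

import tensorflow as tf

# Hypothetical inputs: locally stored JPEGs plus a CSV of candidate boxes,
# one box per row as offset_height, offset_width, target_height, target_width.
image_paths = tf.data.Dataset.list_files("images/*.jpg", shuffle=False)
boxes = tf.data.experimental.CsvDataset(
    "boxes.csv", record_defaults=[tf.int32, tf.int32, tf.int32, tf.int32])

# Step 3: every (image path, bounding box) combination.
combos = image_paths.flat_map(
    lambda path: boxes.map(lambda oh, ow, th, tw: (path, oh, ow, th, tw)))

# Step 4: read, decode and crop. This is where the assertion can fail.
def load_and_crop(path, offset_h, offset_w, target_h, target_w):
    image = tf.image.decode_jpeg(tf.io.read_file(path), channels=3)
    return tf.image.crop_to_bounding_box(
        image, offset_h, offset_w, target_h, target_w)

crops = combos.map(load_and_crop)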

My issue is that there are (very rare) instances where my suggested bounding boxes fall outside the bounds of a given image, so (understandably) tf.image.crop_to_bounding_box throws an error like this:

tensorflow.python.framework.errors_impl.InvalidArgumentError: assertion failed: [width must be >= target + offset.]

which kills the session.

I'd prefer it if these errors were simply ignored and that the pipeline moved onto the next combination.

(I understand that the correct fix for this specific issue would be to commit the time to checking that each bounding box fits within its image's dimensions in the step before, and to filter out the bad combinations with a filter operation before they reach the map with the cropping operation. I was wondering if there is an easy way to just ignore an error and move on to the next case, both for ease of implementation in this specific case and also in more general cases.)
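
That pre-filter might look something like this, reusing the placeholder names from the sketch above (tf.image.extract_jpeg_shape reads just the image header, so it avoids a full decode):

def box_fits(path, offset_h, offset_w, target_h, target_w):
    # [height, width, channels] of the actual image on disk.
    shape = tf.image.extract_jpeg_shape(tf.io.read_file(path))
    return tf.logical_and(offset_h + target_h <= shape[0],
                          offset_w + target_w <= shape[1])

crops = combos.filter(box_fits).map(load_and_crop)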

Stewart_R asked Aug 22 '18 12:08



2 Answers

For TensorFlow 2:

dataset = dataset.apply(tf.data.experimental.ignore_errors())
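
A minimal toy example of the behaviour (hypothetical values): the element that produces Inf inside the map would normally raise InvalidArgumentError, but with ignore_errors it is silently dropped when the dataset is iterated.

import tensorflow as tf

ds = tf.data.Dataset.from_tensor_slices([1., 2., 0., 4.])
ds = ds.map(lambda x: tf.debugging.check_numerics(1. / x, "bad element"))
ds = ds.apply(tf.data.experimental.ignore_errors())

print(list(ds.as_numpy_iterator()))  # [1.0, 0.5, 0.25] -- the 0. entry is skipped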
fsan answered Nov 15 '22 08:11

In TensorFlow 1.x there is tf.contrib.data.ignore_errors. I've never tried this myself, but according to the docs the usage is simply:

dataset = dataset.map(some_map_function)
dataset = dataset.apply(tf.contrib.data.ignore_errors())

It should simply pass through the inputs (i.e. return the same dataset) but ignore any elements that throw an error.

xdurch0 answered Nov 15 '22 09:11