
TensorFlow: How to skip broken data

Tags: tensorflow

I am playing with TensorFlow 1.0. My input data is a large set of JPEG images. Some of them are broken for various reasons, and I just want to skip them at input.

The image-loading part of the graph is the following:

filename_queue = tf.train.string_input_producer(tf.train.match_filenames_once(filename_list), capacity=1000, num_epochs=1)
whole_file_reader = tf.WholeFileReader()
_, image_binary = whole_file_reader.read(filename_queue)
image_tensor = tf.cast(tf.image.decode_jpeg(image_binary), tf.float32)

The model-running part is as usual:

with sv.managed_session() as sess:
        sess.run(init_local)
        sess.run(init_all)

        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(coord=coord, sess=sess)

        try:
                while not coord.should_stop() and not sv.should_stop():
                        sess.run(accumulator)
        except tf.errors.OutOfRangeError:
                print('Done training -- epoch limit reached')
        except Exception as e:
                # Report exceptions to the coordinator.
                coord.request_stop(e)
        finally:
                # Always ask the queue-runner threads to stop.
                coord.request_stop()

        coord.join(threads)

When I run this code I see the following, and I cannot figure out how to catch this exception correctly.

Traceback (most recent call last):
  File "/home/matwey/venv/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1022, in _do_call
    return fn(*args)
  File "/home/matwey/venv/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1004, in _run_fn
    status, run_metadata)
  File "/usr/lib/python3.5/contextlib.py", line 66, in __exit__
    next(self.gen)
  File "/home/matwey/venv/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 469, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Invalid JPEG data, size 0
         [[Node: DecodeJpeg = DecodeJpeg[acceptable_fraction=1, channels=0, dct_method="", fancy_upscaling=true, ratio=1, try_recover_truncated=false, _device="/job:localhost/replica:0/task:0/cpu:0"](ReaderReadV2:1)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "calculate_mean.py", line 67, in <module>
    coord.join(threads)
  File "/usr/lib/python3.5/contextlib.py", line 66, in __exit__
    next(self.gen)
  File "/home/matwey/venv/lib/python3.5/site-packages/tensorflow/python/training/supervisor.py", line 973, in managed_session
    self.stop(close_summary_writer=close_summary_writer)
  File "/home/matwey/venv/lib/python3.5/site-packages/tensorflow/python/training/supervisor.py", line 801, in stop
    stop_grace_period_secs=self._stop_grace_secs)
  File "/home/matwey/venv/lib/python3.5/site-packages/tensorflow/python/training/coordinator.py", line 386, in join
    six.reraise(*self._exc_info_to_raise)
  File "/usr/lib/python3/dist-packages/six.py", line 686, in reraise
    raise value
  File "/home/matwey/venv/lib/python3.5/site-packages/tensorflow/python/training/queue_runner_impl.py", line 234, in _run
    sess.run(enqueue_op)
  File "/home/matwey/venv/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 767, in run
    run_metadata_ptr)
  File "/home/matwey/venv/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 965, in _run
    feed_dict_string, options, run_metadata)
  File "/home/matwey/venv/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1015, in _do_run
    target_list, options, run_metadata)
  File "/home/matwey/venv/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1035, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Invalid JPEG data, size 0
         [[Node: DecodeJpeg = DecodeJpeg[acceptable_fraction=1, channels=0, dct_method="", fancy_upscaling=true, ratio=1, try_recover_truncated=false, _device="/job:localhost/replica:0/task:0/cpu:0"](ReaderReadV2:1)]]

Caused by op 'DecodeJpeg', defined at:
  File "calculate_mean.py", line 19, in <module>
    image_tensor = tf.cast(tf.image.decode_jpeg(image_binary), tf.float32)
  File "/home/matwey/venv/lib/python3.5/site-packages/tensorflow/python/ops/gen_image_ops.py", line 345, in decode_jpeg
    dct_method=dct_method, name=name)
  File "/home/matwey/venv/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 763, in apply_op
    op_def=op_def)
  File "/home/matwey/venv/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2395, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/home/matwey/venv/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1264, in __init__
    self._traceback = _extract_stack()

InvalidArgumentError (see above for traceback): Invalid JPEG data, size 0
         [[Node: DecodeJpeg = DecodeJpeg[acceptable_fraction=1, channels=0, dct_method="", fancy_upscaling=true, ratio=1, try_recover_truncated=false, _device="/job:localhost/replica:0/task:0/cpu:0"](ReaderReadV2:1)]]

Unfortunately, the answer given in "Skipping nonexistent or corrupt files in Tensorflow" doesn't work for me. In my case the exception seems to be raised by coord.join(threads), which is too late to handle it.

0x2207 asked Oct 18 '22 16:10


1 Answer

Sorry for the late response. The answer may be contained in your error message:

tensorflow.python.framework.errors_impl.InvalidArgumentError: Invalid JPEG data, size 0
         [[Node: DecodeJpeg = DecodeJpeg[acceptable_fraction=1, channels=0, dct_method="", fancy_upscaling=true, ratio=1, try_recover_truncated=false, _device="/job:localhost/replica:0/task:0/cpu:0"](ReaderReadV2:1)]]

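A JPEG file may be corrupted for any number of reasons. However, you have used the default settings of tf.image.decode_jpeg, which require the whole image to decode cleanly (the node definition in the message shows try_recover_truncated=false and acceptable_fraction=1). Instead, you may want to tolerate some error by setting try_recover_truncated=True and acceptable_fraction=0.5 (or whatever value suits your data). See this link for more. Note, though, that the message reports size 0, i.e. an empty file, and these options are aimed at truncated input, so they may not help with a file that contains no data at all.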
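Since the error reports size 0, the failing file appears to be empty, and the decoder has nothing to recover from. A complementary option, which is not what the settings above do, is to drop obviously broken files from the file list before it reaches the input queue. The following is only a minimal sketch under that assumption. The helper names looks_like_complete_jpeg and filter_jpeg_paths are hypothetical and not part of TensorFlow. The check is deliberately cheap: it only tests for a non-empty file that starts with the JPEG start-of-image marker (FF D8) and ends with the end-of-image marker (FF D9). It does not guarantee that the file decodes, and it would reject otherwise valid files that carry trailing bytes after the end marker.

```python
import os

JPEG_SOI = b"\xff\xd8"  # start-of-image marker
JPEG_EOI = b"\xff\xd9"  # end-of-image marker


def looks_like_complete_jpeg(path):
    """Hypothetical helper: cheap sanity check on a JPEG file.

    Returns True only if the file exists, is at least 4 bytes long,
    starts with the SOI marker and ends with the EOI marker.
    """
    try:
        if os.path.getsize(path) < 4:
            return False  # empty or too short to hold both markers
        with open(path, "rb") as f:
            head = f.read(2)
            f.seek(-2, os.SEEK_END)
            tail = f.read(2)
    except OSError:
        return False  # missing or unreadable file
    return head == JPEG_SOI and tail == JPEG_EOI


def filter_jpeg_paths(paths):
    """Hypothetical helper: keep only paths that pass the check above."""
    return [p for p in paths if looks_like_complete_jpeg(p)]
```

The filtered list of paths could then be handed to tf.train.string_input_producer instead of the unfiltered match, so empty or cut-off files never reach decode_jpeg. Files that pass this check but are damaged in the middle would still need the decoder settings described above or some other handling.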

RobR answered Oct 21 '22 08:10