I have set up a CNN in Tensorflow where I read my data with a TFRecordReader. It works well but I would like to do some more preprocessing and data augmentation than offered by the tf.image functions. I would specifically like to do some randomized scaling.
Is it possible to process a Tensorflow tensor in Numpy? Or do I need to drop the TFRecordReader and rather do all my preprocessing in Numpy and feed data using the feed_dict? I suspect that the feed_dict method is slow when training on images, but I might be wrong?
You could create a custom I/O pipeline that fetches intermediate results back from TensorFlow using one or more threads, applies arbitrary Python logic, and then feeds the results into a queue for subsequent processing. The resulting program would be somewhat more complicated, but I suggest you look at the threading and queues HOWTO for information on how to get started.
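To make the shape of such a pipeline concrete, here is a minimal sketch of the producer/consumer pattern using only the Python standard library. It is not TensorFlow code: `queue.Queue` stands in for a TensorFlow queue such as `tf.FIFOQueue`, the consumer loop stands in for the training step, and `python_preprocess` and `run_pipeline` are hypothetical names made up for illustration.

```python
import queue
import threading


def python_preprocess(item):
    # Hypothetical stand-in for arbitrary Python/NumPy preprocessing.
    return item * 2


def producer(source, out_queue):
    # Runs in a background thread: preprocess each item and enqueue it.
    for item in source:
        out_queue.put(python_preprocess(item))
    out_queue.put(None)  # Sentinel telling the consumer there is no more data.


def run_pipeline(source, capacity=4):
    # A bounded queue keeps the producer from running far ahead of the consumer.
    q = queue.Queue(maxsize=capacity)
    worker = threading.Thread(target=producer, args=(source, q), daemon=True)
    worker.start()

    results = []
    while True:
        item = q.get()  # Blocks until the producer has something ready.
        if item is None:
            break
        results.append(item)  # In a real program this would be a training step.
    worker.join()
    return results
```

In a real TensorFlow program the producer thread would typically run an enqueue op and the model would read from the corresponding dequeue op, which is what the threading and queues HOWTO describes.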
There is an experimental feature that might make this easier, if you install from source.
If you have already built a preprocessing pipeline using TensorFlow ops, the easiest way to add some custom Python code is to use the tf.py_func() operator, which takes a list of Tensor objects, and a Python function that maps one or more NumPy arrays to one or more NumPy arrays.
For example, let's say you have a pipeline like this:
reader = tf.TFRecordReader(...)
image_t = tf.image.decode_png(tf.parse_single_example(reader.read(), ...))
...you could use tf.py_func() to apply some custom NumPy processing as follows:
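from scipy import ndimage

def preprocess(array):
    # `array` is a NumPy array containing the decoded image.
    # Cast the result so it matches the tf.float32 output type declared below.
    return ndimage.rotate(array, 45).astype('float32')

# With a list of output types, tf.py_func() returns a list of tensors, so unpack it.
[image_t] = tf.py_func(preprocess, [image_t], [tf.float32])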
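Since the question asks about randomized scaling specifically, here is a rough NumPy-only sketch of a function that could be used in place of `preprocess`. The name `random_scale`, its default scale range, the nearest-neighbour resampling, and the zero padding are all assumptions chosen to keep the example short, not a recommended augmentation recipe.

```python
import numpy as np


def random_scale(image, min_scale=0.8, max_scale=1.2):
    """Rescale `image` (H, W) or (H, W, C) about its centre by a random factor.

    The output has the same shape as the input: zooming in crops the image,
    zooming out pads the border with zeros.
    """
    height, width = image.shape[:2]
    scale = np.random.uniform(min_scale, max_scale)

    # For each output pixel, find the nearest source pixel it maps back to.
    rows = np.round((np.arange(height) - height / 2.0) / scale + height / 2.0).astype(int)
    cols = np.round((np.arange(width) - width / 2.0) / scale + width / 2.0).astype(int)

    # Remember which output pixels map outside the source (happens when scale < 1).
    valid = ((rows >= 0) & (rows < height))[:, None] & ((cols >= 0) & (cols < width))[None, :]
    rows = np.clip(rows, 0, height - 1)
    cols = np.clip(cols, 0, width - 1)

    # Gather the source pixels and return float32 to match a tf.float32 output.
    scaled = image[rows[:, None], cols[None, :]].astype(np.float32)
    scaled[~valid] = 0.0
    return scaled
```

It would be wired in the same way as the rotation example, by passing `random_scale` to tf.py_func() with `[tf.float32]` as the output type. Note that the tensor returned by tf.py_func() has no static shape information, so you may need to call `set_shape()` on it before feeding it to ops that require a known shape.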