
Preprocess a Tensorflow tensor in Numpy

I have set up a CNN in Tensorflow where I read my data with a TFRecordReader. It works well but I would like to do some more preprocessing and data augmentation than offered by the tf.image functions. I would specifically like to do some randomized scaling.

Is it possible to process a Tensorflow tensor in Numpy? Or do I need to drop the TFRecordReader, do all my preprocessing in Numpy instead, and feed the data using feed_dict? I suspect that the feed_dict method is slow when training on images, but I might be wrong.

burk asked Jan 22 '16 09:01


1 Answer

You could create a custom I/O pipeline that fetches intermediate results back from TensorFlow using one or more threads, applies arbitrary Python logic, and then feeds them into a queue for subsequent processing. The resulting program would be somewhat more complicated, but I suggest you look at the threading and queues HOWTO for information on how to get started.
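
As a rough illustration of that thread-plus-queue pattern, here is a framework-agnostic sketch that uses only Python's standard threading and queue modules together with NumPy. It is not TensorFlow code: the `augment` function, the queue sizes, and the synthetic images are all assumptions made for the example, and in a real pipeline the raw arrays would come from a session run while the results would be enqueued into a TensorFlow queue.

```python
import queue
import threading

import numpy as np


def augment(image):
    # Hypothetical stand-in for arbitrary Python/NumPy preprocessing:
    # flip the image horizontally.
    return image[:, ::-1]


def worker(source, sink):
    # Pull raw arrays from `source`, preprocess them, push results to `sink`.
    while True:
        image = source.get()
        if image is None:  # sentinel: no more data
            sink.put(None)
            break
        sink.put(augment(image))


raw_queue = queue.Queue(maxsize=8)
ready_queue = queue.Queue(maxsize=8)
thread = threading.Thread(target=worker, args=(raw_queue, ready_queue), daemon=True)
thread.start()

# Producer side: synthetic images stand in for arrays fetched from TensorFlow.
images = [np.arange(6, dtype=np.float32).reshape(2, 3) + i for i in range(4)]
for image in images:
    raw_queue.put(image)
raw_queue.put(None)

# Consumer side: this is where the training loop would read preprocessed data.
processed = []
while True:
    item = ready_queue.get()
    if item is None:
        break
    processed.append(item)
thread.join()
```

The bounded queues keep the worker from running arbitrarily far ahead of the consumer, which is the same role a capacity-limited TensorFlow queue would play.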


There is an experimental feature that might make this easier, if you install from source.

If you have already built a preprocessing pipeline using TensorFlow ops, the easiest way to add some custom Python code is to use the tf.py_func() operator, which takes a list of Tensor objects, and a Python function that maps one or more NumPy arrays to one or more NumPy arrays.

For example, let's say you have a pipeline like this:

reader = tf.TFRecordReader(...)
image_t = tf.image.decode_png(tf.parse_single_example(reader.read(), ...))

...you could use tf.py_func() to apply some custom NumPy processing as follows:

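import numpy as np
from scipy import ndimage

def preprocess(array):
  # `array` is a NumPy array containing the decoded image.
  # Cast so the result matches the tf.float32 output type declared below.
  return ndimage.rotate(array, 45).astype(np.float32)

# tf.py_func() returns a list of tensors, one per declared output type.
image_t, = tf.py_func(preprocess, [image_t], [tf.float32])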
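
Since the question asks specifically about randomized scaling, here is a minimal NumPy-only sketch of a function that could serve as the Python function passed to tf.py_func(). The name `random_scale`, the default scale range, the nearest-neighbour resampling, and the choice to centre-crop or zero-pad back to the input size are all my assumptions for illustration, not something the question or the TensorFlow API prescribes.

```python
import numpy as np


def random_scale(image, min_scale=0.8, max_scale=1.2, rng=None):
    """Rescale an H x W (x C) image by a random factor, keeping the input shape."""
    rng = np.random.default_rng() if rng is None else rng
    height, width = image.shape[:2]
    scale = rng.uniform(min_scale, max_scale)
    new_h = max(1, int(round(height * scale)))
    new_w = max(1, int(round(width * scale)))

    # Nearest-neighbour resize: map each output pixel back to a source pixel.
    rows = np.minimum((np.arange(new_h) / scale).astype(int), height - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), width - 1)
    scaled = image[rows][:, cols]

    # Centre-crop (if enlarged) or zero-pad (if shrunk) back to the input size.
    out = np.zeros_like(image)
    copy_h, copy_w = min(height, new_h), min(width, new_w)
    src_top, src_left = (new_h - copy_h) // 2, (new_w - copy_w) // 2
    dst_top, dst_left = (height - copy_h) // 2, (width - copy_w) // 2
    out[dst_top:dst_top + copy_h, dst_left:dst_left + copy_w] = (
        scaled[src_top:src_top + copy_h, src_left:src_left + copy_w]
    )
    return out.astype(np.float32)
```

Keeping the output the same shape as the input means every example still has a fixed size after augmentation, which tends to make batching simpler. The final cast to float32 matches the [tf.float32] output type used in the tf.py_func() call above.
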
mrry answered Nov 26 '22 01:11