
Attach a queue to a numpy array in tensorflow for data fetch instead of files?

I have read the CNN tutorial for TensorFlow and I am trying to use the same model for my project. The problem now is data reading. I have around 25,000 images for training and around 5,000 each for testing and validation. The files are in PNG format, and I can read them and convert them into a numpy.ndarray.

The CNN example in the tutorial uses a queue to fetch records from the provided file list. I tried to create my own binary file by reshaping each image into a 1-D array and prepending its label value. So my data looks like this:

[[1,12,34,24,53,...,105,234,102],
 [12,112,43,24,52,...,115,244,98],
....
]

Each row of the above array has length 22501, where the first element is the label.
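As a quick illustration of that layout (pixel values below are made up, and the rows are shortened to 1 label + 4 pixels instead of 1 + 22500), the label can be split from the pixels by slicing:

```python
import numpy as np

# A tiny stand-in for the real array: 2 records, each 1 label + 4 "pixels".
data = np.array([[1, 12, 34, 24, 53],
                 [2, 112, 43, 24, 52]], dtype=np.uint8)

labels = data[:, 0]    # first element of each row is the label
images = data[:, 1:]   # the remaining elements are the pixel values

print(labels.shape, images.shape)  # (2,) (2, 4)
```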

I dumped the array to a file using pickle and then tried to read it back with tf.FixedLengthRecordReader, as demonstrated in the example.

I am doing the same things as in cifar10_input.py to read the binary file and put the records into a record object.

Now when I read from the file, the labels and the image values come out different. I can understand the reason: pickle writes its own metadata into the binary file along with the array data, and those extra bytes change the fixed-length record size.
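This is easy to verify: `pickle` serializes the array with header metadata, so the dump is longer than the raw 22501-byte payload, while `tobytes()` yields exactly the fixed-length record:

```python
import pickle
import numpy as np

record = np.zeros(22501, dtype=np.uint8)  # 1 label byte + 22500 pixel bytes

raw = record.tobytes()          # exactly the fixed-length payload
pickled = pickle.dumps(record)  # payload plus pickle/NumPy metadata

print(len(raw))                  # 22501
print(len(pickled) > len(raw))   # True: the extra bytes break fixed-length reads
```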

The example above passes the filenames to a queue, and the queue is then used to read a single record at a time from the files.

I want to know if I can pass the numpy array defined above, instead of the filenames, to some reader, so that it fetches records one by one from that array instead of from files.

asked Jan 06 '16 by t0mkaka


1 Answer

In your question, you specifically asked:

I want to know if I can pass the numpy array defined above, instead of the filenames, to some reader, so that it fetches records one by one from that array instead of from files.

You can feed the numpy array to a queue directly, but it will be a more invasive change to the cifar10_input.py code than my other answer suggests.

As before, let's assume you have the following array from your question:

import numpy as np
images_and_labels_array = np.array([[...], ...],  # [[1,12,34,24,53,...,102],
                                                  #  [12,112,43,24,52,...,98],
                                                  #  ...]
                                   dtype=np.uint8)

You can then define a queue that contains the entire data as follows:

q = tf.FIFOQueue(capacity=25000, dtypes=[tf.uint8, tf.uint8],
                 shapes=[[], [22500]])
enqueue_op = q.enqueue_many([images_and_labels_array[:, 0],
                             images_and_labels_array[:, 1:]])

...then call sess.run(enqueue_op) to populate the queue.


Another, more memory-efficient approach would be to feed records to the queue, which you could do from a parallel thread (see this answer for more details on how this would work):

# [With q as defined above.]
label_input = tf.placeholder(tf.uint8, shape=[])
image_input = tf.placeholder(tf.uint8, shape=[22500])

enqueue_single_from_feed_op = q.enqueue([label_input, image_input])

# Then, to enqueue a single example `i` from the array.
sess.run(enqueue_single_from_feed_op,
         feed_dict={label_input: images_and_labels_array[i, 0],
                    image_input: images_and_labels_array[i, 1:]})

Alternatively, to enqueue a batch at a time, which will be more efficient:

label_batch_input = tf.placeholder(tf.uint8, shape=[None])
image_batch_input = tf.placeholder(tf.uint8, shape=[None, 22500])

enqueue_batch_from_feed_op = q.enqueue_many([label_batch_input, image_batch_input])

# Then, to enqueue a batch of examples `i` through `j-1` from the array.
sess.run(enqueue_batch_from_feed_op,
         feed_dict={label_batch_input: images_and_labels_array[i:j, 0],
                    image_batch_input: images_and_labels_array[i:j, 1:]})
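For intuition, the producer-thread feeding pattern above can be sketched framework-free with Python's standard library; everything below (data, batch size, the sentinel convention) is illustrative, not part of TensorFlow:

```python
import queue
import threading

import numpy as np

# Made-up data: 10 records of 1 label + 4 pixels each.
data = np.random.randint(0, 255, size=(10, 5)).astype(np.uint8)

q = queue.Queue(maxsize=4)  # bounded, like a capacity-limited FIFOQueue

def producer(batch_size=2):
    # Enqueue (labels, images) batches, then a sentinel to stop the consumer.
    for i in range(0, len(data), batch_size):
        batch = data[i:i + batch_size]
        q.put((batch[:, 0], batch[:, 1:]))
    q.put(None)

threading.Thread(target=producer).start()

# The consumer (the training loop, in the TensorFlow case) drains the queue.
batches = 0
while True:
    item = q.get()
    if item is None:
        break
    labels, images = item
    batches += 1

print(batches)  # 5 (10 records / batch size 2)
```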
answered Nov 05 '22 by mrry