I have a directory of images, and a separate file matching image filenames to labels. So the directory of images has files like 'train/001.jpg' and the labeling file looks like:
train/001.jpg 1
train/002.jpg 2
...
I can easily load images from the image directory in TensorFlow by creating a filename queue from the filenames:
filequeue = tf.train.string_input_producer(filenames)
reader = tf.WholeFileReader()
img = reader.read(filequeue)
But I'm at a loss for how to couple these files with the labels from the labeling file. It seems I need access to the filenames inside the queue at each step. Is there a way to get them? Furthermore, once I have the filename, I need to be able to look up the label keyed by the filename. It seems like a standard Python dictionary wouldn't work because these computations need to happen at each step in the graph.
Given that your data is small enough to supply the list of filenames as a Python list, I'd suggest just doing the preprocessing in Python. Create two lists (in the same order) of the filenames and the labels, and insert those into either a RandomShuffleQueue or a FIFOQueue, and dequeue from that. If you want the "loops infinitely" behavior of string_input_producer, you could re-run the enqueue at the start of every epoch (a sketch of this follows the example below).
A very toy example:
import tensorflow as tf
f = ["f1", "f2", "f3", "f4", "f5", "f6", "f7", "f8"]
l = ["l1", "l2", "l3", "l4", "l5", "l6", "l7", "l8"]
fv = tf.constant(f)
lv = tf.constant(l)
rsq = tf.RandomShuffleQueue(10, 0, [tf.string, tf.string], shapes=[[],[]])
do_enqueues = rsq.enqueue_many([fv, lv])
gotf, gotl = rsq.dequeue()
with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    tf.train.start_queue_runners(sess=sess)
    sess.run(do_enqueues)
    for i in xrange(2):
        one_f, one_l = sess.run([gotf, gotl])
        print "F: ", one_f, "L: ", one_l
The key is that you're effectively enqueueing pairs of filenames/labels when you do the enqueue, and those pairs are returned by the dequeue.
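As mentioned above, if you want the infinite-looping behavior, one option is to re-run the enqueue op at the start of each epoch. A minimal sketch (num_epochs is an assumed value; do_enqueues, gotf, gotl and f come from the toy example above):
# Minimal sketch: refill the queue at the start of every epoch so it never
# runs dry. num_epochs is assumed to be defined.
with tf.Session() as sess:
    tf.train.start_queue_runners(sess=sess)
    for epoch in range(num_epochs):
        sess.run(do_enqueues)              # refill the queue with all pairs
        for step in range(len(f)):         # one pass over the data
            one_f, one_l = sess.run([gotf, gotl])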
Here's what I was able to do.
I first shuffled the filenames and matched the labels to them in Python:
np.random.shuffle(filenames)
labels = [label_dict[f] for f in filenames]
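For completeness, label_dict isn't shown above; assuming the whitespace-separated labeling file from the question (and a hypothetical filename labels.txt), it could be built like this:
# Hypothetical helper: parse "train/001.jpg 1" lines into filename -> label.
label_dict = {}
with open('labels.txt') as label_file:     # assumed filename
    for line in label_file:
        path, label = line.split()
        label_dict[path] = int(label)
filenames = list(label_dict.keys())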
Then I created a string_input_producer for the filenames with shuffling turned off, and a FIFOQueue for the labels:
lv = tf.constant(labels)
label_fifo = tf.FIFOQueue(len(filenames), tf.int32, shapes=[[]])
file_fifo = tf.train.string_input_producer(filenames, shuffle=False, capacity=len(filenames))
label_enqueue = label_fifo.enqueue_many([lv])
Then, to read the image I could use a WholeFileReader, and to get the label I could dequeue the FIFO:
reader = tf.WholeFileReader()
# read() returns (filename, file contents); decode the contents as a JPEG
key, value = reader.read(file_fifo)
image = tf.image.decode_jpeg(value, channels=3)
image.set_shape([128, 128, 3])
# result is a simple container object holding the output tensors
result.uint8image = image
result.label = label_fifo.dequeue()
And generated the batches as follows:
min_fraction_of_examples_in_queue = 0.4
min_queue_examples = int(num_examples_per_epoch *
                         min_fraction_of_examples_in_queue)
num_preprocess_threads = 16
images, label_batch = tf.train.shuffle_batch(
    [result.uint8image, result.label],
    batch_size=FLAGS.batch_size,
    num_threads=num_preprocess_threads,
    capacity=min_queue_examples + 3 * FLAGS.batch_size,
    min_after_dequeue=min_queue_examples)
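One detail the snippets above leave implicit: label_fifo has no queue runner, so label_enqueue has to be run explicitly in the session (once up front, or once per epoch). A sketch under assumptions, not part of the original code; num_steps is an assumed training-loop length:
# Sketch of running the pipeline above. label_enqueue fills the label FIFO;
# the queue runners fill the file queue and the shuffle_batch queue.
with tf.Session() as sess:
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    sess.run(label_enqueue)
    for step in range(num_steps):
        image_values, label_values = sess.run([images, label_batch])
        # ... run the training step on image_values and label_values ...
    coord.request_stop()
    coord.join(threads)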
There is tf.py_func(), which you could use to implement a mapping from file path to label.
import os
import tensorflow as tf
from tensorflow.python.platform import gfile

# data_pattern, num_epochs, num_readers, height and width are assumed to be
# defined elsewhere.
files = gfile.Glob(data_pattern)
filename_queue = tf.train.string_input_producer(
    files, num_epochs=num_epochs, shuffle=True)  # list of files to read

def extract_label(s):
    # path-to-label logic for the cats & dogs dataset
    return 0 if os.path.basename(str(s)).startswith('cat') else 1

def read(filename_queue):
    reader = tf.WholeFileReader()
    key, value = reader.read(filename_queue)
    image = tf.image.decode_jpeg(value, channels=3)
    image = tf.cast(image, tf.float32)
    # resize_image_with_crop_or_pad takes (image, target_height, target_width)
    image = tf.image.resize_image_with_crop_or_pad(image, height, width)
    label = tf.cast(tf.py_func(extract_label, [key], tf.int64), tf.int32)
    label = tf.reshape(label, [])
    return image, label

training_data = [read(filename_queue) for _ in range(num_readers)]
...
tf.train.shuffle_batch_join(training_data, ...)
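A sketch of how the batching call might be filled in and consumed; the batch_size, capacity and min_after_dequeue values below are illustrative only, not from the original answer:
# Illustrative values; tune batch_size / capacity for your setup.
image_batch, label_batch = tf.train.shuffle_batch_join(
    training_data,
    batch_size=32,
    capacity=2000,
    min_after_dequeue=1000)

with tf.Session() as sess:
    # string_input_producer with num_epochs uses a local variable internally
    sess.run(tf.local_variables_initializer())
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    try:
        while not coord.should_stop():
            imgs, lbls = sess.run([image_batch, label_batch])
            # ... training step ...
    except tf.errors.OutOfRangeError:
        pass  # raised once num_epochs passes over the data are exhausted
    finally:
        coord.request_stop()
        coord.join(threads)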