I am trying to make use of queues for loading data from files in TensorFlow.
I would like to run the graph with validation data at the end of each epoch to get a better feel for how the training is going.
That is where I am running into problems. I can't seem to figure out how to make the switch between training data and validation data when using queues.
I have stripped down my code to a bare minimum toy example to make it easier to get help. Instead of including all the code that loads the image files, performs inference, and training, I have chopped it off at the point where the filenames are loaded into the queue.
import tensorflow as tf

# DATA
train_items = ["train_file_{}".format(i) for i in range(6)]
valid_items = ["valid_file_{}".format(i) for i in range(3)]

# SETTINGS
batch_size = 3
batches_per_epoch = 2
epochs = 2

# CREATE GRAPH
graph = tf.Graph()
with graph.as_default():
    file_list = tf.placeholder(dtype=tf.string, shape=None)

    # Create a queue of filename strings (hard-wired to `train_items` here;
    # ideally it would take the strings from `file_list` instead).
    q = tf.train.string_input_producer(train_items, shuffle=False, num_epochs=None)

    # Create a batch of items.
    x = q.dequeue_many(batch_size)

    # Inference, train op, and accuracy calculation after this point
    # ...

# RUN SESSION
with tf.Session(graph=graph) as sess:
    # Initialize variables
    sess.run(tf.global_variables_initializer())
    sess.run(tf.local_variables_initializer())

    # Start populating the queue.
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)

    try:
        for epoch in range(epochs):
            print("-" * 60)
            for step in range(batches_per_epoch):
                if coord.should_stop():
                    break
                train_batch = sess.run(x, feed_dict={file_list: train_items})
                print("TRAIN_BATCH: {}".format(train_batch))
            valid_batch = sess.run(x, feed_dict={file_list: valid_items})
            print("\nVALID_BATCH : {} \n".format(valid_batch))
    except Exception as e:
        coord.request_stop(e)
    finally:
        coord.request_stop()
        coord.join(threads)
If I set the num_epochs argument in tf.train.string_input_producer() to None, it gives me the following output, which shows that it is running two epochs as intended, but it is using data from the training set when running evaluation.
------------------------------------------------------------
TRAIN_BATCH: ['train_file_0' 'train_file_1' 'train_file_2']
TRAIN_BATCH: ['train_file_3' 'train_file_4' 'train_file_5']
VALID_BATCH : ['train_file_0' 'train_file_1' 'train_file_2']
------------------------------------------------------------
TRAIN_BATCH: ['train_file_3' 'train_file_4' 'train_file_5']
TRAIN_BATCH: ['train_file_0' 'train_file_1' 'train_file_2']
VALID_BATCH : ['train_file_3' 'train_file_4' 'train_file_5']
If I set the num_epochs argument in tf.train.string_input_producer() to 2, it gives me the following output, which shows that it does not even get through the full two epochs (and evaluation is still using training data).
------------------------------------------------------------
TRAIN_BATCH: ['train_file_0' 'train_file_1' 'train_file_2']
TRAIN_BATCH: ['train_file_3' 'train_file_4' 'train_file_5']
VALID_BATCH : ['train_file_0' 'train_file_1' 'train_file_2']
------------------------------------------------------------
TRAIN_BATCH: ['train_file_3' 'train_file_4' 'train_file_5']
If I set the num_epochs argument in tf.train.string_input_producer() to 1, in the hope that it will flush any additional training data out of the queue so it can make use of the validation data, I get the following output, which shows that it terminates as soon as it gets through one epoch of training data and never gets to load the validation data.
------------------------------------------------------------
TRAIN_BATCH: ['train_file_0' 'train_file_1' 'train_file_2']
TRAIN_BATCH: ['train_file_3' 'train_file_4' 'train_file_5']
I have also tried setting the capacity argument in tf.train.string_input_producer() to various small values, such as 3 and 1, but these had no effect on the results.
What other approach could I take to switch between training and validation data? Would I have to create separate queues? I am at a loss as to how to get that to work. Would I have to create additional coordinators and queue runners as well?
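One idea I have been toying with is to create one producer per data set and then pick between them at run time. I think tf.QueueBase.from_list() can select among queues with an index tensor, but I am not sure this is its intended use, so treat the following as an untested sketch rather than a working solution:

import tensorflow as tf

train_items = ["train_file_{}".format(i) for i in range(6)]
valid_items = ["valid_file_{}".format(i) for i in range(3)]
batch_size = 3

graph = tf.Graph()
with graph.as_default():
    # One string_input_producer (and hence one queue runner) per data set.
    train_q = tf.train.string_input_producer(train_items, shuffle=False)
    valid_q = tf.train.string_input_producer(valid_items, shuffle=False)

    # Index fed at run time: 0 = training queue, 1 = validation queue.
    select_q = tf.placeholder(tf.int32, shape=[])
    q = tf.QueueBase.from_list(select_q, [train_q, valid_q])

    x = q.dequeue_many(batch_size)

with tf.Session(graph=graph) as sess:
    sess.run(tf.global_variables_initializer())
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    try:
        print(sess.run(x, feed_dict={select_q: 0}))  # batch of training files
        print(sess.run(x, feed_dict={select_q: 1}))  # batch of validation files
    finally:
        coord.request_stop()
        coord.join(threads)

Even if something like this is viable, I still do not know whether both queue runners can share a single coordinator as they do above, or how it interacts with num_epochs.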
I am compiling a list of potential approaches that might solve this issue here. Most of these are just vague suggestions, with no actual code examples to show how to make use of them.

- Suggested here
- Suggested here
- Also suggested by sygi on this very Stack Overflow thread (link).
- Suggested here
- Suggested here and here
- Suggested by sygi in this very Stack Overflow thread (link). This might be the same as the make_template() method (rough sketch below, after this list).
- Suggested here, with sample code here. Code adapted to my problem here on this thread (link).
- Suggested here
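For the make_template() / shared-weights idea, my rough understanding is that you build two separate input pipelines and apply the same template to both, so that the variables are shared. A minimal sketch of what I imagine that looks like; the count_batches function is just a hypothetical stand-in for my real inference code:

import tensorflow as tf

def count_batches(x):
    # Stand-in for the real inference graph: a single shared variable that
    # both the training and the validation pipeline should reuse.
    counter = tf.get_variable("counter", shape=[], dtype=tf.int32,
                              initializer=tf.zeros_initializer())
    with tf.control_dependencies([tf.assign_add(counter, 1)]):
        return tf.identity(x)

shared_model = tf.make_template("model", count_batches)

train_q = tf.train.string_input_producer(
    ["train_file_{}".format(i) for i in range(6)], shuffle=False)
valid_q = tf.train.string_input_producer(
    ["valid_file_{}".format(i) for i in range(3)], shuffle=False)

# Both outputs are built from the same template, so they share `counter`
# (and would share the model weights in the real graph).
train_out = shared_model(train_q.dequeue_many(3))
valid_out = shared_model(valid_q.dequeue_many(3))

The plan would then be to run train_out inside the epoch loop and valid_out once at the end of each epoch, but I have not confirmed how this behaves with num_epochs and the coordinator, which is why it is still just an item on the list.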