
Tensorflow Queues - Switching between train and validation data

I am trying to make use of queues for loading data from files in Tensorflow.

I would like to run the graph with validation data at the end of each epoch to get a better feel for how the training is going.

That is where I am running into problems. I can't seem to figure out how to switch between training data and validation data when using queues.

I have stripped down my code to a bare minimum toy example to make it easier to get help. Instead of including all the code that loads the image files, performs inference, and training, I have chopped it off at the point where the filenames are loaded into the queue.

import tensorflow as tf

#  DATA
train_items = ["train_file_{}".format(i) for i in range(6)]
valid_items = ["valid_file_{}".format(i) for i in range(3)]

# SETTINGS
batch_size = 3
batches_per_epoch = 2
epochs = 2

# CREATE GRAPH
graph = tf.Graph()
with graph.as_default():
    file_list = tf.placeholder(dtype=tf.string, shape=None)
    
    # Create a queue consisting of the strings in `file_list`
    q = tf.train.string_input_producer(train_items, shuffle=False, num_epochs=None)
    
    # Create batch of items.
    x = q.dequeue_many(batch_size)
    
    # Inference, train op, and accuracy calculation after this point
    # ...


# RUN SESSION
with tf.Session(graph=graph) as sess:
    # Initialize variables
    sess.run(tf.global_variables_initializer())
    sess.run(tf.local_variables_initializer())
    
    # Start populating the queue.
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    
    try:
        for epoch in range(epochs):
            print("-"*60)
            for step in range(batches_per_epoch):
                if coord.should_stop():
                    break
                train_batch = sess.run(x, feed_dict={file_list: train_items})
                print("TRAIN_BATCH: {}".format(train_batch))
    
            valid_batch = sess.run(x, feed_dict={file_list: valid_items})
            print("\nVALID_BATCH : {} \n".format(valid_batch))
    
    except Exception as e:
        coord.request_stop(e)
    finally:
        coord.request_stop()
        coord.join(threads)

Variations and experiments

Trying different values for num_epochs

num_epochs=None

If I set the num_epochs argument in tf.train.string_input_producer() to None, it gives me the following output, which shows that it runs two epochs as intended but uses data from the training set when running evaluation.

------------------------------------------------------------
TRAIN_BATCH: ['train_file_0' 'train_file_1' 'train_file_2']
TRAIN_BATCH: ['train_file_3' 'train_file_4' 'train_file_5']

VALID_BATCH : ['train_file_0' 'train_file_1' 'train_file_2']

------------------------------------------------------------
TRAIN_BATCH: ['train_file_3' 'train_file_4' 'train_file_5']
TRAIN_BATCH: ['train_file_0' 'train_file_1' 'train_file_2']

VALID_BATCH : ['train_file_3' 'train_file_4' 'train_file_5']

num_epochs=2

If I set the num_epochs argument in tf.train.string_input_producer() to 2, it gives me the following output, which shows that the second epoch does not even run its full two batches (and evaluation is still using training data).

------------------------------------------------------------
TRAIN_BATCH: ['train_file_0' 'train_file_1' 'train_file_2']
TRAIN_BATCH: ['train_file_3' 'train_file_4' 'train_file_5']

VALID_BATCH : ['train_file_0' 'train_file_1' 'train_file_2']

------------------------------------------------------------
TRAIN_BATCH: ['train_file_3' 'train_file_4' 'train_file_5']

num_epochs=1

If I set the num_epochs argument in tf.train.string_input_producer() to 1, in the hope that it will flush any additional training data out of the queue so that it can make use of the validation data, I get the following output, which shows that it terminates as soon as it gets through one epoch of training data and never gets to load the evaluation data.

------------------------------------------------------------
TRAIN_BATCH: ['train_file_0' 'train_file_1' 'train_file_2']
TRAIN_BATCH: ['train_file_3' 'train_file_4' 'train_file_5']
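
The three outputs above are consistent with a single queue that holds num_epochs passes over train_items and is drained by every sess.run(x) call, training and validation alike. The plain-Python sketch below models only that bookkeeping; it is not TensorFlow code. The helper name `simulate` is hypothetical, and the sketch assumes a dequeue fails once fewer than batch_size items remain in an exhausted queue.

```python
from itertools import chain, cycle, islice, repeat


def simulate(items, num_epochs, batch_size=3, batches_per_epoch=2, epochs=2):
    """Model one shared filename queue drained by train and validation steps."""
    if num_epochs is None:
        source = cycle(items)  # the queue is refilled forever
    else:
        # a finite queue: `num_epochs` in-order passes over `items`
        source = chain.from_iterable(repeat(items, num_epochs))

    log = []
    for _ in range(epochs):
        # `batches_per_epoch` training steps, then one validation step,
        # all reading from the same source
        for label in ["TRAIN"] * batches_per_epoch + ["VALID"]:
            batch = list(islice(source, batch_size))
            if len(batch) < batch_size:
                return log  # queue exhausted: the loop stops here
            log.append((label, batch))
    return log


train_items = ["train_file_{}".format(i) for i in range(6)]
for n in (None, 2, 1):
    print(n, [label for label, _ in simulate(train_items, n)])
```

With the settings from the question this yields six, four, and two steps for num_epochs of None, 2, and 1 respectively, matching the three outputs above. Note also that q in the question is built from train_items, so feeding file_list does not change what the queue holds.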

Setting capacity argument to various values

I have also tried setting the capacity argument in tf.train.string_input_producer() to small values, such as 3 and 1, but these had no effect on the results.
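
One possible reading of this result is that capacity only bounds how many items are buffered at once, not which items are produced or in what order. The sketch below illustrates that with a bounded FIFO buffer from the Python standard library. It is a model under that assumption, not TensorFlow code, and `dequeue_order` is a hypothetical helper.

```python
import queue
import threading
from itertools import cycle


def dequeue_order(items, capacity, n):
    """Return the first `n` items read from a bounded FIFO fed by one producer."""
    q = queue.Queue(maxsize=capacity)
    stop = threading.Event()

    def producer():
        # Endlessly cycle over `items`, blocking whenever the buffer is full.
        for item in cycle(items):
            while not stop.is_set():
                try:
                    q.put(item, timeout=0.05)
                    break
                except queue.Full:
                    continue
            if stop.is_set():
                return

    thread = threading.Thread(target=producer, daemon=True)
    thread.start()
    out = [q.get() for _ in range(n)]
    stop.set()
    thread.join()
    return out


train_items = ["train_file_{}".format(i) for i in range(6)]
# The read order is the same whether the buffer holds 1 item or 3.
print(dequeue_order(train_items, 1, 9) == dequeue_order(train_items, 3, 9))
```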

What other approach should I take?

What other approach could I take to switch between training and validation data? Would I have to create separate queues? I am at a loss as to how to get that to work. Would I have to create additional coordinators and queue runners as well?

ronrest asked Dec 15 '16 11:12


1 Answer

I am compiling a list of potential approaches that might solve this issue here. Most of these are just vague suggestions, with no actual code examples to show how to make use of them.

Placeholder with default

Suggested here

Using tf.cond()

Suggested here

Also suggested by sygi on this very Stack Overflow thread (link).

Using tf.group() and tf.cond()

Suggested here

make_template() method

Suggested here and here

Shared weights method

Suggested by sygi in this very Stack Overflow thread (link). This might be the same as the make_template() method.

QueueBase() method

Suggested here, with sample code here. Code adapted to my problem is here on this thread (link).

Training bucket method

Suggested here
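
Of the approaches listed above, the QueueBase() one presumably relies on tf.QueueBase.from_list(index, queues), which in TensorFlow 1.x builds a queue that forwards to queues[index]. The idea can be pictured in plain Python as one stream per dataset plus an index that picks which stream a step reads from. The sketch below is only a model of that selection logic, not TensorFlow code. The class `SelectableSource` and the constants `TRAIN` and `VALID` are hypothetical names.

```python
from itertools import cycle, islice


class SelectableSource:
    """Plain-Python stand-in for a queue that forwards to one of several queues."""

    def __init__(self, datasets):
        # One endless, in-order stream per dataset; each keeps its own position.
        self._streams = [cycle(items) for items in datasets]

    def dequeue_many(self, index, n):
        # Only the selected stream advances; the others are left untouched.
        return list(islice(self._streams[index], n))


TRAIN, VALID = 0, 1
train_items = ["train_file_{}".format(i) for i in range(6)]
valid_items = ["valid_file_{}".format(i) for i in range(3)]
source = SelectableSource([train_items, valid_items])

for epoch in range(2):
    print("-" * 60)
    for step in range(2):
        print("TRAIN_BATCH: {}".format(source.dequeue_many(TRAIN, 3)))
    print("\nVALID_BATCH : {} \n".format(source.dequeue_many(VALID, 3)))
```

In this model the validation step reads from its own stream, so it never consumes training items and the training stream resumes where it left off.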

ronrest answered Oct 12 '22 01:10