
tensorflow: efficient feeding of eval/train data using queue runners

I'm trying to run a TensorFlow graph to train a model and periodically evaluate it using a separate evaluation dataset. Both the training and evaluation inputs are implemented using queue runners.

My current solution is to create both inputs in the same graph and use a tf.cond dependent on an is_training placeholder. My issue is highlighted by the following code:

import tensorflow as tf
from tensorflow.models.image.cifar10 import cifar10
from time import time


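def get_train_inputs(is_training):
    # is_training is unused here; it only keeps the signature uniform.
    # cifar10.inputs(False) reads the training set.
    return cifar10.inputs(False)


def get_eval_inputs(is_training):
    # cifar10.inputs(True) reads the evaluation set.
    return cifar10.inputs(True)


def get_mixed_inputs(is_training):
    # Both pipelines are built up front; tf.cond only picks which result to return.
    train_inputs = get_train_inputs(None)
    eval_inputs = get_eval_inputs(None)

    return tf.cond(is_training, lambda: train_inputs, lambda: eval_inputs)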


def time_inputs(inputs_fn, n_runs=10):
    graph = tf.Graph()
    with graph.as_default():
        is_training = tf.placeholder(dtype=tf.bool, shape=(),
                                     name='is_training')
        images, labels = inputs_fn(is_training)

    with tf.Session(graph=graph) as sess:
        coordinator = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(sess=sess, coord=coordinator)
        t = time()
        for i in range(n_runs):
            im, l = sess.run([images, labels], feed_dict={is_training: True})
        dt = time() - t
        coordinator.request_stop()
        coordinator.join(threads)

    return dt / n_runs

print('Train inputs: %.3f' % time_inputs(get_train_inputs))
print('Eval inputs: %.3f' % time_inputs(get_eval_inputs))
print('Mixed inputs: %.3f' % time_inputs(get_mixed_inputs))

I also had to comment out the image_summary call on line 133 of tensorflow/models/image/cifar10/cifar10_input.py.

This yielded the following results:

Train inputs: 0.055
Eval inputs: 0.050
Mixed inputs: 0.105

It would seem that in the mixed case both inputs are being read and parsed, even though only one is used. Is there a way of avoiding this redundant computation? Or is there a nicer way of switching between training and evaluation data that still leverages the queue-runner setup?
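
The roughly doubled time in the mixed case is consistent with how tf.cond is documented: only ops created inside the two branch functions are conditional, while tensors built beforehand and merely returned from a lambda are computed either way. The following is a plain-Python analogy of that difference, not TensorFlow code; CountingSource, select_prebuilt and select_lazy are hypothetical names made up for this illustration. It does not show whether queue-based inputs can safely be constructed inside the branch functions.

```python
class CountingSource:
    """Toy stand-in for an input queue that counts how many batches it served."""

    def __init__(self, name):
        self.name = name
        self.pulled = 0

    def dequeue(self):
        self.pulled += 1
        return (self.name, self.pulled)


def select_prebuilt(is_training, train_src, eval_src):
    # Analogous to building both inputs first and handing them to tf.cond
    # through lambdas: both sources are read before the choice is made.
    train_batch = train_src.dequeue()
    eval_batch = eval_src.dequeue()
    return train_batch if is_training else eval_batch


def select_lazy(is_training, train_src, eval_src):
    # Only the chosen branch reads from its source.
    return train_src.dequeue() if is_training else eval_src.dequeue()


train_a, eval_a = CountingSource('train'), CountingSource('eval')
for _ in range(10):
    select_prebuilt(True, train_a, eval_a)

train_b, eval_b = CountingSource('train'), CountingSource('eval')
for _ in range(10):
    select_lazy(True, train_b, eval_b)

print('prebuilt: train=%d eval=%d' % (train_a.pulled, eval_a.pulled))
print('lazy:     train=%d eval=%d' % (train_b.pulled, eval_b.pulled))
```

In the prebuilt variant the evaluation source is drained on every call even though its batch is discarded, which mirrors the extra read/parse work measured in the mixed case.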

DomJack asked Aug 28 '16 03:08


2 Answers

Have you read the last section of this link about multiple inputs? I think you can add an is_training argument to your input function to distinguish training data from eval data. Then you can reuse the shared variables to get the logits for the eval data and build an op for evaluation. Then, with that graph, run validation_accuracy = sess.run(eval_op) to get the eval accuracy.


Update:

Hi, from my understanding, if you want to train for n batches, evaluate, train, evaluate, and so on, you can keep these two ops in the same graph; there is no need to build a new one. Assuming you have already built all the needed functions, the code should look like this:

# The following two calls add the train and eval input queues to the graph.
train_inputs, train_labels = inputs(is_train=True)
eval_inputs, eval_labels = inputs(is_train=False)

# Build the model twice on the same variables: once per input pipeline.
with tf.variable_scope("inference") as scope:
    train_logits = inference(train_inputs)
    scope.reuse_variables()
    eval_logits = inference(eval_inputs)

# Use new names so the loss() and accuracy() helpers are not shadowed.
train_loss = loss(train_logits, train_labels)
eval_accuracy = accuracy(eval_logits, eval_labels)

# ...add train op here, start queue runners and train it...
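
The train/evaluate schedule described above can be sketched in plain Python as follows. This is only an illustration under assumptions: train_with_periodic_eval, run_train_step and run_eval are hypothetical names, and the two callables stand in for something like sess.run(train_op) and sess.run(eval_accuracy). Because a session run only executes the ops needed for the requested fetches, each call would be expected to pull from its own input queue only, which the counters below imitate.

```python
def train_with_periodic_eval(run_train_step, run_eval, n_steps, eval_every):
    """Run n_steps training steps, evaluating after every eval_every steps.

    Both callables are hypothetical stand-ins for session runs.
    Returns a list of (step, eval_result) pairs.
    """
    history = []
    for step in range(1, n_steps + 1):
        run_train_step()
        if step % eval_every == 0:
            history.append((step, run_eval()))
    return history


# Counters imitating how many batches each input queue has served.
pulled = {'train': 0, 'eval': 0}


def run_train_step():
    pulled['train'] += 1  # stands in for sess.run(train_op)


def run_eval():
    pulled['eval'] += 1   # stands in for sess.run(eval_accuracy)
    return 0.0            # placeholder value, not a real accuracy


history = train_with_periodic_eval(run_train_step, run_eval,
                                   n_steps=10, eval_every=5)
print(history, pulled)
```

With this schedule the evaluation input is touched only on evaluation steps, so the training steps do not pay for it.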

Jie.Zhou answered Sep 30 '22 13:09


After some experimentation, my current best solution is to have a main graph featuring training inputs and a separate graph with just evaluation data operations. I open a separate session to get evaluation data and feed this to the training graph when I want to evaluate. This is highly inelegant (and evaluation runs take longer than they ideally would, as the data has to come out of one session only to be fed to another), but assuming evaluation runs are rare compared to training runs, this seems preferable to the original version...

import tensorflow as tf
from tensorflow.models.image.cifar10 import cifar10
from time import time


class DataSupplier:
    def __init__(self, tensor_fn):
        graph = tf.Graph()
        with graph.as_default():
            with graph.device('/cpu:0'):
                self.tensor = tensor_fn()
        self.sess = tf.Session(graph=graph)
        self.coord = tf.train.Coordinator()
        self.threads = tf.train.start_queue_runners(sess=self.sess,
                                                    coord=self.coord)

    def get_tensor_val(self):
        return self.sess.run(self.tensor)

    def clean_up(self):
        self.coord.request_stop()
        self.coord.join(self.threads)


eval_batcher = DataSupplier(lambda: cifar10.inputs(True))

graph = tf.Graph()
with graph.as_default():
    images, labels = cifar10.inputs(False)

    out_images = tf.identity(images)
    out_labels = tf.identity(labels)

n_runs = 100

with tf.Session(graph=graph) as sess:
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess, coord)
    for i in range(n_runs):
        sess.run([out_images, out_labels])
    t = time()
    for i in range(n_runs):
        sess.run([out_images, out_labels])
    dt = (time() - t)/n_runs
    print('Train time: %.3f' % dt)
    t = time()
    for i in range(n_runs):
        eval_images, eval_labels = eval_batcher.get_tensor_val()
        sess.run([out_images, out_labels],
                 feed_dict={images: eval_images, labels: eval_labels})
    dt = (time() - t)/n_runs
    print('Eval time: %.3f' % dt)
    coord.request_stop()
    coord.join(threads)

eval_batcher.clean_up()

Results:

Train time: 0.050
Eval time: 0.064
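
As a rough check of the "evaluation runs are rare" argument, the timings can be combined into an average cost per step for a given fraction of evaluation steps. This is a back-of-the-envelope sketch, assuming the numbers from the two benchmarks are comparable even though they were measured with different run counts; average_step_time is a hypothetical helper.

```python
TRAIN_ONLY = 0.050  # "Train time" reported above, seconds per step
EVAL_FED = 0.064    # "Eval time" reported above, seconds per step
MIXED = 0.105       # "Mixed inputs" reported in the question, seconds per step


def average_step_time(eval_fraction):
    """Average seconds per step when eval_fraction of all steps are eval steps."""
    return (1 - eval_fraction) * TRAIN_ONLY + eval_fraction * EVAL_FED


for fraction in (0.01, 0.1, 0.5):
    print('eval fraction %.2f: %.4f s/step (mixed: %.3f s/step)'
          % (fraction, average_step_time(fraction), MIXED))
```

Under these numbers even an all-evaluation workload (0.064 s per step) stays below the 0.105 s per step of the tf.cond version, and the overhead of feeding across sessions shrinks as evaluation steps become rarer.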

Update: when using this approach in training problems with tf.contrib.layers and regularization, I find the regularization losses go to infinity if the DataSupplier graph is on the same device as the training graph. I cannot for the life of me explain why this is the case, but explicitly setting the device of the DataSupplier to the CPU (given the training graph is on my GPU) seems to work...

DomJack answered Sep 30 '22 15:09