What's the right way to load multiple large datasets into TensorFlow?
I have three large datasets (files), for training, validation, and test respectively. I can successfully load the training set through tf.train.string_input_producer and feed it into a tf.train.shuffle_batch object. Then I can iteratively fetch batches of data to optimize my model.
But I got stuck when trying to load my validation set the same way: the program keeps raising an "OutOfRangeError" even though I didn't set num_epochs in string_input_producer.
Can anyone shed some light on this? Besides that, I'm also wondering what the right approach is to do training/validation in TensorFlow. Actually, I haven't seen any examples (I searched a lot) that do both training and testing on a big dataset. That seems strange to me ...
Code snippet below.
def extract_validationset(filename, batch_size):
    with tf.device("/cpu:0"):
        queue = tf.train.string_input_producer([filename])
        reader = tf.TextLineReader()
        _, line = reader.read(queue)
        line = tf.decode_csv(...)
        label = line[0]
        feature = tf.pack(list(line[1:]))
        l, f = tf.train.batch([label, feature], batch_size=batch_size, num_threads=8)
        return l, f

def extract_trainset(train, batch_size):
    with tf.device("/cpu:0"):
        train_files = tf.train.string_input_producer([train])
        reader = tf.TextLineReader()
        _, train_line = reader.read(train_files)
        train_line = tf.decode_csv(...)
        l, f = tf.train.shuffle_batch(...,
                                      batch_size=batch_size, capacity=50000,
                                      min_after_dequeue=10000, num_threads=8)
        return l, f
....
label_batch, feature_batch = extract_trainset("train", batch_size)
label_eval, feature_eval = extract_validationset("test", batch_size)

with tf.Session() as sess:
    tf.initialize_all_variables().run()
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)
    # Loop through training steps.
    for step in xrange(int(num_epochs * train_size) // batch_size):
        feature, label = sess.run([feature_batch, label_batch])
        feed_dict = {train_data_node: feature, train_labels_node: label}
        _, l, predictions = sess.run([optimizer, loss, evaluation], feed_dict=feed_dict)
        # After every EVAL_FREQUENCY steps, evaluate on the whole test set.
        if step % EVAL_FREQUENCY == 0:
            true_count = 0
            for eval_step in xrange(steps_per_epoch):
                f, l = sess.run([feature_eval, label_eval])
                true_count += sess.run(evaluation, feed_dict={train_data_node: f, train_labels_node: l})
            print('Precision @ 1: %0.04f' % (true_count / num_examples))
<!---- ERROR ---->
tensorflow.python.framework.errors.OutOfRangeError: FIFOQueue '_5_batch/fifo_queue' is closed and has insufficient elements (requested 334, current size 0)
[[Node: batch = QueueDequeueMany[component_types=[DT_FLOAT, DT_FLOAT], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](batch/fifo_queue, batch/n)]]
Caused by op u'batch', defined at:
This may be late, but I had the same problem. In my case I was foolishly calling sess.run after I had already closed shop with coord.request_stop(); coord.join(threads).
Maybe you have something like coord.request_stop() being run in your "train" code, closing the queues before you try to load your validation data.
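To see why this produces exactly that error, here is an illustrative sketch using only plain Python threads and a stdlib queue (no TensorFlow). The ProducerQueue and ClosedQueueError names are made up for this example; they just model a queue-runner thread feeding a FIFO queue, where "requesting stop" kills the producer and drains the queue, so any later dequeue fails the same way a dequeue on a closed FIFOQueue raises OutOfRangeError:

```python
import queue
import threading

class ClosedQueueError(Exception):
    """Stand-in for TF's OutOfRangeError: dequeue on a closed, empty queue."""

class ProducerQueue:
    """Toy model of a TF input queue driven by a queue-runner thread."""
    def __init__(self):
        self._q = queue.Queue(maxsize=10)
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._produce, daemon=True)
        self._thread.start()

    def _produce(self):
        # The "queue runner": keeps the queue topped up until stop is requested.
        i = 0
        while not self._stop.is_set():
            try:
                self._q.put(i, timeout=0.01)
                i += 1
            except queue.Full:
                pass

    def dequeue(self):
        try:
            return self._q.get(timeout=0.5)
        except queue.Empty:
            raise ClosedQueueError(
                "queue is closed and has insufficient elements")

    def request_stop(self):
        # Analogous to coord.request_stop(); coord.join(threads):
        # stop the producer and cancel anything still pending.
        self._stop.set()
        self._thread.join()
        while not self._q.empty():
            self._q.get()

q = ProducerQueue()
first = q.dequeue()   # fine while the runner is alive
q.request_stop()      # "close shop"
try:
    q.dequeue()       # any dequeue after stopping fails
    failed = False
except ClosedQueueError:
    failed = True
```

The fix implied by the answer follows the same shape: make sure every sess.run on the validation batches happens before coord.request_stop()/coord.join(threads), not after.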