How to read data into TensorFlow batches from example queue?

Tags: python, numpy, tensorflow, classification

How do I get TensorFlow example queues into proper batches for training?

I've got some images and labels:

IMG_6642.JPG 1
IMG_6643.JPG 2

(feel free to suggest another label format; I think I may need another dense to sparse step...)

I've read through quite a few tutorials but don't quite have it all together yet. Here's what I have, with comments indicating the steps required from TensorFlow's Reading Data page.

The list of filenames (optional steps removed for the sake of simplicity)
Filename queue
A Reader for the file format
A decoder for a record read by the reader
Example queue

And after the example queue I need to get this queue into batches for training; that's where I'm stuck...

1. List of filenames

files = tf.train.match_filenames_once('*.JPG')

4. Filename queue

filename_queue = tf.train.string_input_producer(files, num_epochs=None, shuffle=True, seed=None, shared_name=None, name=None)

5. A reader

reader = tf.TextLineReader()
key, value = reader.read(filename_queue)

6. A decoder

record_defaults = [[""], [1]]
col1, col2 = tf.decode_csv(value, record_defaults=record_defaults)

(I don't think I need the step below because I already have my label in a tensor, but I include it anyway.)

features = tf.pack([col2])
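
To illustrate what the decoding step produces for label lines like the ones above, here is a plain-Python sketch rather than TensorFlow code. The helper name `parse_label_line` and its default arguments are hypothetical; the defaults mirror `record_defaults = [[""], [1]]`. It assumes space-separated fields, as in the sample lines. Note that `tf.decode_csv` splits on commas unless `field_delim` is set.

```python
def parse_label_line(line, default_name="", default_label=1):
    """Split one 'filename label' line into a (str, int) pair.

    Hypothetical helper: missing fields fall back to the defaults,
    mirroring record_defaults = [[""], [1]].
    """
    fields = line.split()
    name = fields[0] if len(fields) > 0 else default_name
    label = int(fields[1]) if len(fields) > 1 else default_label
    return name, label


lines = ["IMG_6642.JPG 1", "IMG_6643.JPG 2"]
pairs = [parse_label_line(line) for line in lines]

# Separate lists of filenames and integer labels, ready to be queued.
filenames = [name for name, _ in pairs]
labels = [label for _, label in pairs]
```

Keeping the label as a plain integer per image is usually enough at this stage; any one-hot or sparse conversion can happen after batching.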

The documentation page has an example that retrieves a single example at a time, rather than getting the images and labels into batches:

for i in range(1200):
  # Retrieve a single instance:
  example, label = sess.run([features, col5])

And then below it has a batching section:

def read_my_file_format(filename_queue):
  reader = tf.SomeReader()
  key, record_string = reader.read(filename_queue)
  example, label = tf.some_decoder(record_string)
  processed_example = some_processing(example)
  return processed_example, label

def input_pipeline(filenames, batch_size, num_epochs=None):
  filename_queue = tf.train.string_input_producer(
      filenames, num_epochs=num_epochs, shuffle=True)
  example, label = read_my_file_format(filename_queue)
  # min_after_dequeue defines how big a buffer we will randomly sample
  #   from -- bigger means better shuffling but slower start up and more
  #   memory used.
  # capacity must be larger than min_after_dequeue and the amount larger
  #   determines the maximum we will prefetch.  Recommendation:
  #   min_after_dequeue + (num_threads + a small safety margin) * batch_size
  min_after_dequeue = 10000
  capacity = min_after_dequeue + 3 * batch_size
  example_batch, label_batch = tf.train.shuffle_batch(
      [example, label], batch_size=batch_size, capacity=capacity,
      min_after_dequeue=min_after_dequeue)
  return example_batch, label_batch
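
The comments about `min_after_dequeue` and `capacity` can be made concrete with a small plain-Python model of a shuffling buffer. This is only a sketch of the idea, not TensorFlow's implementation: the function name `shuffle_batches` is hypothetical, it runs in a single thread, and it drops an incomplete final batch.

```python
import random


def shuffle_batches(examples, batch_size, min_after_dequeue, seed=None):
    """Yield shuffled batches drawn from a bounded buffer.

    Hypothetical single-threaded model of a shuffling example queue.
    """
    rng = random.Random(seed)
    # Same sizing rule as in the documentation snippet above.
    capacity = min_after_dequeue + 3 * batch_size
    buffer = []
    source = iter(examples)
    exhausted = False
    while True:
        # Prefetch: refill the buffer up to capacity while input remains.
        while not exhausted and len(buffer) < capacity:
            try:
                buffer.append(next(source))
            except StopIteration:
                exhausted = True
        # Once the input has run out, drain the buffer and stop when
        # fewer than batch_size examples are left.
        if len(buffer) < batch_size:
            return
        batch = []
        for _ in range(batch_size):
            # Pick a random element, swap it to the end and pop it.
            idx = rng.randrange(len(buffer))
            buffer[idx], buffer[-1] = buffer[-1], buffer[idx]
            batch.append(buffer.pop())
        yield batch


batches = list(shuffle_batches(range(100), batch_size=10,
                               min_after_dequeue=20, seed=0))
```

While input remains, each batch is sampled from a full buffer, so at least `min_after_dequeue` examples stay behind after every dequeue. A larger value mixes examples from further apart in the input, at the cost of memory and start-up time.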

My question is: how do I use the above example code with the code I have above? I need batches to work with, and most of the tutorials come with mnist batches already.

with tf.Session() as sess:
  sess.run(init)

  # Training cycle
  for epoch in range(training_epochs):
    total_batch = int(mnist.train.num_examples/batch_size)
    # Loop over all batches
    for i in range(total_batch):
        batch_xs, batch_ys = mnist.train.next_batch(batch_size)

asked May 09 '16 21:05

JohnAllen

1 Answer

To make this input pipeline work, you will need to add an asynchronous queueing mechanism that generates batches of examples. This is done by creating a tf.RandomShuffleQueue or a tf.FIFOQueue and inserting JPEG images that have been read, decoded and preprocessed.

You can use handy constructs that generate the queues and the corresponding threads for running them via tf.train.shuffle_batch_join or tf.train.batch_join. Here is a simplified example of what this would look like. Note that this code is untested:

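# Let's assume there is a queue called 'filename_queue' that maintains a
# list of all filenames, and that 'reader' returns one whole file per
# record (for JPEG files, a tf.WholeFileReader).
_, file_buffer = reader.read(filename_queue)

# Decode the JPEG image. The batching ops need fully defined shapes, so
# resize or crop the image to a fixed size before batching.
image = tf.image.decode_jpeg(file_buffer, channels=3)

# Generate batches of images of this size.
batch_size = 32

# Depends on the number of files and the training speed.
min_queue_examples = batch_size * 100

# shuffle_batch_join expects a list of tensor lists, one entry per
# reader thread; there is a single reader here.
images_batch = tf.train.shuffle_batch_join(
  [[image]],
  batch_size=batch_size,
  capacity=min_queue_examples + 3 * batch_size,
  min_after_dequeue=min_queue_examples)

# Run your network on this batch of images.
predictions = my_inference(images_batch)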

Depending on how you need to scale up your job, you might need to run multiple independent threads that read/decode/preprocess images and dump them in your example queue. A complete example of such a pipeline is provided in the Inception/ImageNet model. Take a look at batch_inputs:

https://github.com/tensorflow/models/blob/master/inception/inception/image_processing.py#L407
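
The idea of several independent threads feeding one example queue can be sketched with the Python standard library alone. This is an illustration of the pattern, not the Inception code: `preprocess`, `worker` and `run_pipeline` are hypothetical names, and `preprocess` is a stand-in for the real read/decode/preprocess work.

```python
import queue
import threading


def preprocess(filename):
    # Hypothetical stand-in for reading, decoding and preprocessing an image.
    return filename.lower()


def worker(filenames, example_queue):
    # Each worker handles its own slice of files and feeds the shared queue.
    for filename in filenames:
        example_queue.put(preprocess(filename))


def run_pipeline(filenames, num_threads, batch_size):
    # Bounded queue: producers block when it is full, which caps prefetching.
    example_queue = queue.Queue(maxsize=4 * batch_size)
    slices = [filenames[i::num_threads] for i in range(num_threads)]
    threads = [threading.Thread(target=worker, args=(s, example_queue))
               for s in slices]
    for t in threads:
        t.start()

    batches = []
    for _ in range(len(filenames) // batch_size):
        batches.append([example_queue.get() for _ in range(batch_size)])

    # Discard the incomplete final batch so blocked producers can finish.
    for _ in range(len(filenames) % batch_size):
        example_queue.get()
    for t in threads:
        t.join()
    return batches


names = ["IMG_%04d.JPG" % i for i in range(10)]
batches = run_pipeline(names, num_threads=3, batch_size=4)
```

Because the workers run concurrently, the order of examples inside each batch is not deterministic; only the batch sizes and the set of processed files are.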

Finally, if you are working with more than about a thousand JPEG images, keep in mind that it is extremely inefficient to read thousands of small files individually. This will slow down your training quite a bit.

A more robust and faster solution is to convert the dataset of images to a sharded TFRecord of Example protos. Here is a fully worked script for converting the ImageNet data set to such a format. And here is a set of instructions for running a generic version of this preprocessing script on an arbitrary directory containing JPEG images.
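
As a rough illustration of what "sharded" means here, the sketch below only partitions a list of filenames into contiguous shards and names the output files. The function `shard_filenames`, the `train` prefix and the naming pattern are assumptions made for this example; actually writing the records would additionally need something like `tf.python_io.TFRecordWriter`.

```python
def shard_filenames(filenames, num_shards, prefix="train"):
    """Map hypothetical shard file names to contiguous slices of filenames."""
    shards = {}
    total = len(filenames)
    for shard in range(num_shards):
        # Integer arithmetic spreads any remainder across the shards.
        start = shard * total // num_shards
        end = (shard + 1) * total // num_shards
        name = "%s-%05d-of-%05d" % (prefix, shard, num_shards)
        shards[name] = filenames[start:end]
    return shards


images = ["IMG_%04d.JPG" % i for i in range(10)]
shards = shard_filenames(images, num_shards=3)
```

Each shard would then become one larger file that can be read sequentially, instead of thousands of small files opened one by one.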

answered Sep 20 '22 06:09

user5869947
