
Load image files in a directory as dataset for training in Tensorflow

I am a newbie to TensorFlow, and I'm starting with the official MNIST example code to learn its logic. However, one thing that bothers me is that the MNIST example provides the original dataset as compressed files whose format is unclear to beginners. The same goes for Cifar10, which provides the dataset as a binary file. I think that in a practical deep learning task, our dataset may be lots of image files, such as *.jpg or *.png, in a directory, plus a text file recording the label of each file (as in the ImageNet dataset). Let me use MNIST as an example.

MNIST contains 50k training images of size 28 x 28. Now let's assume these images are in jpg format and stored in a directory ./dataset/. In ./dataset/ we have a text file label.txt storing the label of each image:

/path/to/dataset/
                 image00001.jpg
                 image00002.jpg
                 ... ... ... ...
                 image50000.jpg
                 label.txt

where label.txt is like this:

#label.txt:
image00001.jpg 1
image00002.jpg 0
image00003.jpg 4
image00004.jpg 9
... ... ... ... 
image50000.jpg 3

Now I would like to use TensorFlow to train a single-layer model with this dataset. Could anyone give a simple code snippet that does that?

asked Oct 09 '16 by C. Wang

1 Answer

There are basically two things you'd need. The first is ordinary Python code like so:

import numpy as np
from scipy import misc # feel free to use another image loader

def create_batches(batch_size):
  # list_of_images and labels are assumed to exist already, e.g. parsed
  # from label.txt (see the sketch after this snippet)
  images = []
  for img in list_of_images:
    images.append(misc.imread(img))
  # note: grayscale images load as (H, W); you may need to add a channel
  # axis so the batch matches the 4-D placeholder defined below
  images = np.asarray(images)

  # do something similar for the labels

  total = len(images)
  while True:
    for i in range(0, total, batch_size):
      yield images[i:i + batch_size], labels[i:i + batch_size]
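
Here list_of_images and labels are plain Python lists built beforehand. As a rough sketch of how they could be filled from the label.txt described in the question (assuming each line has the form "<filename> <integer label>" and the file sits next to the images in /path/to/dataset/):

import os

dataset_dir = '/path/to/dataset/'
list_of_images, labels = [], []
with open(os.path.join(dataset_dir, 'label.txt')) as f:
  for line in f:
    filename, label = line.split()
    list_of_images.append(os.path.join(dataset_dir, filename))
    labels.append(int(label))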

Now comes the TensorFlow part:

import tensorflow as tf

imgs = tf.placeholder(tf.float32, shape=[None, height, width, colors])
lbls = tf.placeholder(tf.int32, shape=[None, label_dimension])

# define the rest of the graph here: convolutions or linear layers,
# a cost function (loss), etc. (a minimal sketch follows after this snippet)

with tf.Session() as sess:
  sess.run(tf.global_variables_initializer())  # initialize model variables
  batch_generator = create_batches(batch_size)
  for i in range(number_of_epochs):
    images, labels = next(batch_generator)  # Python 3: next(), not .next()
    loss_value = sess.run([loss], feed_dict={imgs: images, lbls: labels})
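
For completeness, here is a minimal sketch of what the "rest of the graph" could look like for the single-layer (softmax) model the question asks about. The layer sizes, the assumption that lbls holds one-hot labels, and the optimizer choice are illustrations added on top of the original answer, not part of it:

# hypothetical single-layer softmax model on flattened 28 x 28 images;
# assumes lbls holds one-hot vectors of length label_dimension (10 for MNIST)
flat = tf.reshape(imgs, [-1, height * width * colors])
W = tf.Variable(tf.zeros([height * width * colors, label_dimension]))
b = tf.Variable(tf.zeros([label_dimension]))
logits = tf.matmul(flat, W) + b

loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=tf.cast(lbls, tf.float32),
                                            logits=logits))
train_op = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

With a train_op defined, the loop above would run it together with the loss, e.g. _, loss_value = sess.run([train_op, loss], feed_dict={imgs: images, lbls: labels}), so that each batch actually updates the weights.
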
answered Sep 19 '22 by Steven