Loading folders of images in TensorFlow

I'm new to TensorFlow, but I have already followed and run the tutorials they promote and many others from around the web. I built a small convolutional neural network on the MNIST images. Nothing special, but I would like to test it on my own images. Here is my problem: I created several folders, and the name of each folder is the class (label) that the images inside belong to.

The images have different shapes; I mean they have no fixed size.

How can I load them for use with TensorFlow?

I followed many tutorials and answers both here on StackOverflow and on other Q/A sites, but I still have not figured out how to do this.

The tf.data API (TensorFlow 1.4 onwards) is great for things like this. The pipeline will look something like the following:

  • create an initial tf.data.Dataset object that iterates over all examples;
  • (if training) shuffle/repeat the dataset;
  • map it through some function that makes all images the same size;
  • batch;
  • (optionally) prefetch, so that your program collects and preprocesses subsequent batches of data while the network is processing the current batch; and
  • get inputs.

There are a number of ways of creating your initial dataset (see here for a more in-depth answer).

TFRecords with Tensorflow Datasets

Supporting TensorFlow 1.12 onwards, TensorFlow Datasets provides a relatively straightforward API for creating TFRecord datasets, and also handles data downloading, sharding, statistics generation and other functionality automatically.

See e.g. this image classification dataset implementation. There's a lot of bookkeeping in there (download URLs, citations, etc.), but the technical part boils down to specifying features and writing a _generate_examples function:

features = tfds.features.FeaturesDict({
    "image": tfds.features.Image(shape=(_TILES_SIZE,) * 2 + (3,)),
    "label": tfds.features.ClassLabel(names=_CLASS_NAMES),
    "filename": tfds.features.Text(),
})


def _generate_examples(self, root_dir):
  root_dir = os.path.join(root_dir, _TILES_SUBDIR)
  for i, class_name in enumerate(_CLASS_NAMES):
    class_dir = os.path.join(root_dir, _class_subdir(i, class_name))
    fns = tf.io.gfile.listdir(class_dir)

    for fn in sorted(fns):
      image = _load_tif(os.path.join(class_dir, fn))
      yield {
          "image": image,
          "label": class_name,
          "filename": fn,
      }

You can also generate the tfrecords using lower level operations.
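As a rough illustration of the lower-level route, the sketch below writes encoded images and integer labels with tf.io.TFRecordWriter and reads them back with tf.data.TFRecordDataset. It assumes TensorFlow 2.x with eager execution; the feature keys "image" and "label", the helper names make_example and parse_example, and the synthetic all-black PNGs are illustrative choices, not something taken from the answer.

```python
import os
import tempfile

import tensorflow as tf


def make_example(encoded_image, label):
    """Pack encoded image bytes and an integer label into a tf.train.Example."""
    feature = {
        "image": tf.train.Feature(bytes_list=tf.train.BytesList(value=[encoded_image])),
        "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))


def parse_example(serialized):
    """Inverse of make_example: decode the PNG bytes and return (image, label)."""
    spec = {
        "image": tf.io.FixedLenFeature([], tf.string),
        "label": tf.io.FixedLenFeature([], tf.int64),
    }
    parsed = tf.io.parse_single_example(serialized, spec)
    image = tf.io.decode_png(parsed["image"], channels=3)
    return image, parsed["label"]


# Write two synthetic images of different sizes to a temporary record file.
record_path = os.path.join(tempfile.mkdtemp(), "images.tfrecord")
with tf.io.TFRecordWriter(record_path) as writer:
    for label, (h, w) in enumerate([(8, 12), (20, 5)]):
        png = tf.io.encode_png(tf.zeros((h, w, 3), tf.uint8)).numpy()
        writer.write(make_example(png, label).SerializeToString())

# Read them back; resizing and batching would follow as in any tf.data pipeline.
dataset = tf.data.TFRecordDataset(record_path).map(parse_example)
results = [(tuple(image.shape), int(label)) for image, label in dataset]
print(results)
```

Storing the encoded bytes rather than raw pixels keeps the records small and lets images of different sizes share one file; the resize step then belongs in the map function at read time.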

Load images via tf.data.Dataset.map and tf.py_func(tion)

Alternatively you can load the image files from filenames inside tf.data.Dataset.map as below.

image_paths, labels = load_base_data(...)
epoch_size = len(image_paths)
image_paths = tf.convert_to_tensor(image_paths, dtype=tf.string)
labels = tf.convert_to_tensor(labels)

dataset = tf.data.Dataset.from_tensor_slices((image_paths, labels))

if mode == 'train':
    dataset = dataset.repeat().shuffle(epoch_size)

def map_fn(path, label):
    # path/label represent values for a single example
    image = tf.image.decode_jpeg(tf.read_file(path))

    # some mapping to constant size - be careful with distorting aspect ratios
    image = tf.image.resize_images(image, out_shape)
    # color normalization - just an example
    image = tf.to_float(image) * (2. / 255) - 1
    return image, label

# num_parallel_calls > 1 induces intra-batch shuffling
dataset = dataset.map(map_fn, num_parallel_calls=8)
dataset = dataset.batch(batch_size)
# try one of the following
dataset = dataset.prefetch(1)
# dataset = dataset.apply(
#            tf.contrib.data.prefetch_to_device('/gpu:0'))

images, labels = dataset.make_one_shot_iterator().get_next()

I've never worked in a distributed environment, but I've never noticed a performance hit from using this approach over tfrecords. If you need more custom loading functions, also check out tf.py_func.
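The load_base_data(...) call above is left undefined in the answer. One possible version for the folder-per-class layout from the question is sketched below; the function body, the extension filter, and the choice to number classes by sorted folder name are all assumptions, not the answerer's code.

```python
import os
import tempfile


def load_base_data(root_dir, extensions=(".jpg", ".jpeg", ".png")):
    """Return (image_paths, labels) for a <root_dir>/<class_name>/<image> layout.

    Hypothetical helper: labels are integer ids assigned by sorted folder name.
    """
    class_names = sorted(
        d for d in os.listdir(root_dir)
        if os.path.isdir(os.path.join(root_dir, d)))
    image_paths, labels = [], []
    for class_id, class_name in enumerate(class_names):
        class_dir = os.path.join(root_dir, class_name)
        for fn in sorted(os.listdir(class_dir)):
            # Skip anything that does not look like an image file.
            if fn.lower().endswith(extensions):
                image_paths.append(os.path.join(class_dir, fn))
                labels.append(class_id)
    return image_paths, labels


# Demo on an empty-file directory tree (folder and file names are made up).
demo_root = tempfile.mkdtemp()
for cls, files in {"cats": ["b.png", "a.jpg"], "dogs": ["c.JPG", "notes.txt"]}.items():
    os.makedirs(os.path.join(demo_root, cls))
    for fn in files:
        open(os.path.join(demo_root, cls, fn), "w").close()

demo_paths, demo_labels = load_base_data(demo_root)
print([os.path.basename(p) for p in demo_paths], demo_labels)
```

Sorting the folder names keeps the name-to-id mapping stable between runs, which matters if a trained model is later evaluated with a freshly built dataset.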

More general information here, and notes on performance here
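The code above uses TensorFlow 1.x names (tf.read_file, tf.image.resize_images, tf.to_float, make_one_shot_iterator). For readers on TensorFlow 2.x, a roughly equivalent pipeline might look like the sketch below. It assumes TF 2.4 or later (for tf.data.AUTOTUNE) and PNG inputs; OUT_SHAPE, BATCH_SIZE, make_dataset, and the synthetic images are illustrative, and shuffling before repeating is a deliberate change from the snippet above.

```python
import os
import tempfile

import tensorflow as tf

OUT_SHAPE = (64, 64)  # assumed target (height, width)
BATCH_SIZE = 4        # assumed batch size


def make_dataset(image_paths, labels, training):
    dataset = tf.data.Dataset.from_tensor_slices((image_paths, labels))
    if training:
        # Shuffle before repeat so each epoch visits every example once.
        dataset = dataset.shuffle(len(image_paths)).repeat()

    def map_fn(path, label):
        # path/label are scalar tensors for a single example
        image = tf.io.decode_png(tf.io.read_file(path), channels=3)
        image = tf.image.resize(image, OUT_SHAPE)  # float32, may distort aspect ratio
        image = image * (2.0 / 255) - 1.0          # scale roughly to [-1, 1]
        return image, label

    dataset = dataset.map(map_fn, num_parallel_calls=tf.data.AUTOTUNE)
    return dataset.batch(BATCH_SIZE).prefetch(1)


# Demo on synthetic PNGs of different sizes written to a temporary folder.
root = tempfile.mkdtemp()
image_paths, labels = [], []
for label, (h, w) in enumerate([(30, 50), (80, 20), (64, 64), (10, 10)]):
    path = os.path.join(root, "img_%d.png" % label)
    pixels = tf.cast(tf.random.uniform((h, w, 3), maxval=256, dtype=tf.int32), tf.uint8)
    tf.io.write_file(path, tf.io.encode_png(pixels))
    image_paths.append(path)
    labels.append(label)

images, batch_labels = next(iter(make_dataset(image_paths, labels, training=False)))
print(images.shape, batch_labels.numpy())
```

In eager mode the dataset is iterated directly, so no iterator or session is needed; a Keras model can also take the dataset as the input to fit.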

Sample input pipeline script to load images and labels from a directory. You can do preprocessing (resizing images, etc.) after this.

import tensorflow as tf
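# "<image_root>" is a placeholder: the original path was lost. The layout assumed
# here is <image_root>/<integer label>/<name>.png.
filename_queue = tf.train.string_input_producer(
    tf.train.match_filenames_once("<image_root>/*/*.png"))
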
image_reader = tf.WholeFileReader()
key, image_file = image_reader.read(filename_queue)
S = tf.string_split([key],'/')
length = tf.cast(S.dense_shape[1],tf.int32)
# adjust constant value corresponding to your paths if you face issues. It should work for above format.
label = S.values[length-tf.constant(2,dtype=tf.int32)]
label = tf.string_to_number(label,out_type=tf.int32)
image = tf.image.decode_png(image_file)

# Start a new session to show example output.
with tf.Session() as sess:
    # Required to get the filename matching to run.
    sess.run([tf.global_variables_initializer(),
              tf.local_variables_initializer()])

    # Coordinate the loading of image files.
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)

    for i in range(6):
        # Get an image tensor and print its shape.
        key_val, label_val, image_tensor = sess.run([key, label, image])
        print(image_tensor.shape)

    # Finish off the filename queue coordinator.
    coord.request_stop()
    coord.join(threads)

File Directory



 (881, 2079, 3)
 (155, 2552, 3)
 (562, 1978, 3)
 (291, 2558, 3)
 (157, 2554, 3)
 (866, 936, 3)
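
The label in the script above comes from the name of the folder that contains each file: the key is split on '/', and index length - 2 picks the second-to-last component. A plain-Python sketch of the same indexing, using a made-up path and a hypothetical helper name, may make that easier to see:

```python
def label_from_path(path):
    """Return the parent folder name of `path` as an integer label."""
    parts = path.split("/")
    # The last component is the file name; the one before it is the class folder.
    return int(parts[len(parts) - 2])


example_label = label_from_path("./images/3/1.png")  # hypothetical path
print(example_label)
```

If the folder names are not integers, the string-to-number step has to be replaced by a lookup from folder name to class id.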

If you just need every image loaded at one fixed size, use tf.keras.preprocessing.image_dataset_from_directory:


docs: https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/image_dataset_from_directory
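
A minimal sketch of calling the function documented at the link above, assuming TensorFlow 2.3 or later. The class folders "cats" and "dogs", the image sizes, and the 32x32 target size are made up for the demo; the function infers labels from the sub-folder names and resizes every image to image_size, so inputs of different shapes are fine.

```python
import os
import tempfile

import tensorflow as tf

# Build a tiny folder-per-class tree of synthetic PNGs with different sizes.
root = tempfile.mkdtemp()
sizes = {"cats": [(30, 50), (80, 20)], "dogs": [(64, 64), (10, 40)]}
for class_name, shapes in sizes.items():
    os.makedirs(os.path.join(root, class_name))
    for i, (h, w) in enumerate(shapes):
        pixels = tf.zeros((h, w, 3), tf.uint8)
        tf.io.write_file(
            os.path.join(root, class_name, "%d.png" % i), tf.io.encode_png(pixels))

dataset = tf.keras.preprocessing.image_dataset_from_directory(
    root,
    labels="inferred",    # take labels from the sub-folder names
    label_mode="int",     # integer class ids
    image_size=(32, 32),  # every image is resized to this (height, width)
    batch_size=4,
    shuffle=False,        # keep folder order so the demo is deterministic
)
images, labels = next(iter(dataset))
print(dataset.class_names, images.shape, labels.numpy())
```

The result is an ordinary tf.data.Dataset, so map, prefetch and the other steps described earlier can still be applied to it.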

