Using tensorflow TFRecords for a dataset with different image sizes

Tags:

tensorflow

The TensorFlow tutorial provides an example of using TFRecords with the MNIST dataset. The MNIST dataset is converted to a TFRecords file like this:

import os
import tensorflow as tf

# Helper functions from the tutorial: wrap scalars into tf.train.Feature protos.
def _int64_feature(value):
  return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

def _bytes_feature(value):
  return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def convert_to(data_set, name):
  images = data_set.images
  labels = data_set.labels
  num_examples = data_set.num_examples

  if images.shape[0] != num_examples:
    raise ValueError('Images size %d does not match label size %d.' %
                     (images.shape[0], num_examples))
  rows = images.shape[1]
  cols = images.shape[2]
  depth = images.shape[3]

  filename = os.path.join(FLAGS.directory, name + '.tfrecords')
  print('Writing', filename)
  writer = tf.python_io.TFRecordWriter(filename)
  for index in range(num_examples):
    image_raw = images[index].tostring()
    example = tf.train.Example(features=tf.train.Features(feature={
        'height': _int64_feature(rows),
        'width': _int64_feature(cols),
        'depth': _int64_feature(depth),
        'label': _int64_feature(int(labels[index])),
        'image_raw': _bytes_feature(image_raw)}))
    writer.write(example.SerializeToString())
  writer.close()
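
For reference, the tutorial drives this function roughly as follows (a sketch; the flags setup is assumed here, but the important detail is dtype=tf.uint8 and reshape=False so that the raw pixel bytes are kept):

from tensorflow.examples.tutorials.mnist import input_data

# FLAGS.directory is the output directory taken from the script's command-line flags.
data_sets = input_data.read_data_sets(FLAGS.directory, dtype=tf.uint8, reshape=False)
convert_to(data_sets.train, 'train')
convert_to(data_sets.validation, 'validation')
convert_to(data_sets.test, 'test')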

And then it is read and decoded like this:

def read_and_decode(filename_queue):
  reader = tf.TFRecordReader()
  _, serialized_example = reader.read(filename_queue)
  features = tf.parse_single_example(
      serialized_example,
      # Defaults are not specified since both keys are required.
      features={
          'image_raw': tf.FixedLenFeature([], tf.string),
          'label': tf.FixedLenFeature([], tf.int64),
      })

  # Convert from a scalar string tensor (whose single string has
  # length mnist.IMAGE_PIXELS) to a uint8 tensor with shape
  # [mnist.IMAGE_PIXELS].
  image = tf.decode_raw(features['image_raw'], tf.uint8)
  image.set_shape([mnist.IMAGE_PIXELS])

  # OPTIONAL: Could reshape into a 28x28 image and apply distortions
  # here.  Since we are not applying any distortions in this
  # example, and the next step expects the image to be flattened
  # into a vector, we don't bother.

  # Convert from [0, 255] -> [-0.5, 0.5] floats.
  image = tf.cast(image, tf.float32) * (1. / 255) - 0.5

  # Convert label from a scalar uint8 tensor to an int32 scalar.
  label = tf.cast(features['label'], tf.int32)

  return image, label
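
A minimal sketch of how this function is wired up in the TF 1.x queue-based input pipeline (the filename and queue setup here are my assumptions, not part of the tutorial excerpt):

filename = os.path.join(FLAGS.directory, 'train.tfrecords')
filename_queue = tf.train.string_input_producer([filename])  # queue of input files
image, label = read_and_decode(filename_queue)                # one example at a time
# image and label would normally be passed on to tf.train.shuffle_batch().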

Question: is there a way to read images with different sizes from TFRecords? Because at this point

image.set_shape([mnist.IMAGE_PIXELS])

all tensor sizes need to be known, which means I can't do something like

width = tf.cast(features['width'], tf.int32)
height = tf.cast(features['height'], tf.int32) 
tf.reshape(image, [width, height, 3])

So how do I use TFRecords in this case? Also, I don't understand why the tutorial authors save the height and width in the TFRecords file if they never use them afterwards, and instead use a predefined constant when they read and decode the image.

asked Nov 09 '22 by sergekondrat


1 Answer

For training in this particular case there is no reason to keep the width and height. However, since the images are serialized into a single byte stream, a future you might wonder what shape that data originally had instead of just seeing 784 bytes; essentially, they're just creating self-contained examples.

As for differently sized images, keep in mind that at some point you need to map your feature tensors to weights, and since the number of weights is fixed for a given network, so are the dimensions of the feature tensors it expects. Another point to think about is data normalization: if you use differently shaped images, do they have the same mean and variance? You might choose to ignore that point, but if you don't, you have to come up with a solution for it as well.
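
If you do care about the normalization point, one option (my sketch, not part of the original answer) is per-image standardization, which removes per-example mean and variance differences regardless of the image shape:

# Assumes `image` is a rank-3 tensor, e.g. after decoding and reshaping.
image = tf.cast(image, tf.float32)
image = tf.image.per_image_standardization(image)  # zero mean, unit variance per image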

If you are just asking how to use images of a different, but still fixed, size, e.g. 100x100x3 instead of 28x28x1, you can of course use

image.set_shape([100, 100, 3])

so that the 30000 raw "elements" end up as a single rank-3 tensor (strictly speaking, since tf.decode_raw returns a flat tensor, you would first tf.reshape it to [100, 100, 3]; set_shape only declares static shape information). Or, if you are working with batches (of to-be-determined size), you might use

image_batch.set_shape([None, 100, 100, 3])

Note that this is not a list of tensors but a single rank-4 tensor, and because of that all images in the batch have to have the same dimensions; i.e. having a 100x100x3 image followed by a 28x28x1 image in the same batch is not possible.
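
To make that concrete, here is a sketch (my wording, not the original answer) of queue-based batching in TF 1.x; the 100x100x3 shape and the batch parameters are arbitrary choices:

# `image` and `label` come from read_and_decode(); batching requires a fully
# known, identical shape for every example in the batch.
image = tf.reshape(image, [100, 100, 3])

images_batch, labels_batch = tf.train.shuffle_batch(
    [image, label],
    batch_size=32,
    capacity=2000,
    min_after_dequeue=1000)
# images_batch is a single rank-4 tensor of shape [32, 100, 100, 3].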

Before batching, though, you are free to use whatever size and shape you want, and you can just as well load the shapes from the records - which the MNIST example does not do. You might, for example, apply any of the image processing operations in order to obtain augmented images of a fixed size for further processing, as sketched below.
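
For example, since convert_to above stores height, width and depth, you could also request them in tf.parse_single_example and use them to restore each image's original shape, then resize to a common size before batching. This is a sketch of that idea, not code from the answer; the 100x100 target size is arbitrary:

# Inside read_and_decode, after also requesting the shape features, e.g.
#   'height': tf.FixedLenFeature([], tf.int64), and likewise 'width' and 'depth'.
height = tf.cast(features['height'], tf.int32)
width = tf.cast(features['width'], tf.int32)
depth = tf.cast(features['depth'], tf.int32)

image = tf.decode_raw(features['image_raw'], tf.uint8)
image = tf.reshape(image, tf.stack([height, width, depth]))  # per-example shape from the record

# Bring every example to one fixed size so it can be batched and mapped to weights.
image = tf.image.resize_images(image, [100, 100])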

Note also that the serialized representations of the images may indeed have different lengths and shapes. You may for example decide to store JPEG or PNG bytes instead of raw pixel values; they would obviously have different sizes.
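
As an illustration (again a sketch, not part of the original answer), if the 'image_raw' feature held encoded JPEG bytes, the decoder would recover the spatial size by itself and you would only fix the size afterwards:

image = tf.image.decode_jpeg(features['image_raw'], channels=3)  # shape [None, None, 3]
image = tf.image.resize_images(image, [100, 100])                # still fix the size before batching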

Finally, there's tf.VarLenFeature() as well, but that creates SparseTensor representations, which is typically not something you would use for (non-binary) image data.

answered Nov 15 '22 by sunside