 

How to read images with different sizes in a TFRecord file

I have created a dataset and saved it into a TFRecord file. The thing is, the pictures have different sizes, so I want to save the size along with each image. I used the TFRecordWriter and defined the features like this:

example = tf.train.Example(features=tf.train.Features(feature={
  'rows': _int64_feature(image.shape[0]),
  'cols': _int64_feature(image.shape[1]),
  'image_raw': _bytes_feature(image_raw)}))
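
For reference, _int64_feature and _bytes_feature are the usual small wrapper helpers from the TensorFlow examples; I'm assuming definitions like:

def _int64_feature(value):
  # Wrap an integer (e.g. a row/column count) in a Feature proto.
  return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

def _bytes_feature(value):
  # Wrap raw bytes (e.g. the encoded image) in a Feature proto.
  return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))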

I expected to be able to read and decode the image using TFRecordReader, but I cannot get the values of rows and cols from the file because they are tensors. So how am I supposed to read the size dynamically and reshape the image accordingly? Thanks, guys.

asked Jan 27 '16 by Tong Shen



2 Answers

You can call tf.reshape with a dynamic shape parameter.

# 'features' comes from tf.parse_single_example on the serialized record.
image_rows = tf.cast(features['rows'], tf.int32)
image_cols = tf.cast(features['cols'], tf.int32)
image_data = tf.decode_raw(features['image_raw'], tf.uint8)
# tf.pack was renamed tf.stack in TensorFlow 1.0.
image = tf.reshape(image_data, tf.pack([image_rows, image_cols, 3]))
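
For context, here is a minimal sketch of the TF1 queue-based reading code this snippet assumes; the feature spec mirrors the writer in the question:

def read_and_decode(filename_queue):
  # Read one serialized example from the queue of TFRecord files.
  reader = tf.TFRecordReader()
  _, serialized_example = reader.read(filename_queue)
  features = tf.parse_single_example(
      serialized_example,
      features={
          'rows': tf.FixedLenFeature([], tf.int64),
          'cols': tf.FixedLenFeature([], tf.int64),
          'image_raw': tf.FixedLenFeature([], tf.string),
      })
  rows = tf.cast(features['rows'], tf.int32)
  cols = tf.cast(features['cols'], tf.int32)
  image = tf.decode_raw(features['image_raw'], tf.uint8)
  # The shape argument is itself a tensor, so the reshape adapts per record.
  return tf.reshape(image, tf.pack([rows, cols, 3]))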
answered Nov 02 '22 by bgshi

I suggest a workflow like:

TARGET_HEIGHT = 500
TARGET_WIDTH = 500

image = tf.image.decode_jpeg(image_buffer, channels=3)
image = tf.image.convert_image_dtype(image, dtype=tf.float32)

# Choose your bbox here.
bbox_begin = ...  # should be (h_start, w_start, 0); see below for computing it dynamically
bbox_size = tf.constant((TARGET_HEIGHT, TARGET_WIDTH, 3), dtype=tf.int32)

cropped_image = tf.slice(image, bbox_begin, bbox_size)

cropped_image now has a statically known shape, and can then be thrown into a shuffle batch, for example (the batch_size and capacity numbers here are just illustrative):
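
cropped_image.set_shape([TARGET_HEIGHT, TARGET_WIDTH, 3])  # make the static shape explicit for batching
# batch_size/capacity values below are placeholders; tune them for your pipeline.
images = tf.train.shuffle_batch([cropped_image], batch_size=32,
                                capacity=2000, min_after_dequeue=1000)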

You can dynamically access the size of the decoded image using tf.shape(image). You can do computations on the resulting sub-elements and then stitch them back together using something like bbox_begin = tf.pack([bbox_h_start, bbox_w_start, 0]). You just need to insert your own logic for determining the start points of the crop, and for what to do if the image starts out smaller than you want for your pipeline.
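
For example, a hypothetical center crop (the start-point logic and variable names here are mine, not from the answer):

shape = tf.shape(image)  # dynamic [height, width, channels], known only at run time
bbox_h_start = tf.maximum((shape[0] - TARGET_HEIGHT) // 2, 0)  # clamp at 0 for small images
bbox_w_start = tf.maximum((shape[1] - TARGET_WIDTH) // 2, 0)
bbox_begin = tf.pack([bbox_h_start, bbox_w_start, 0])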

If you want to upsize only when the image is smaller than your target dimensions, you'll need tf.control_flow_ops.cond (tf.cond) or an equivalent. But you can instead use min and max operations to set the size of your crop window so that it returns the full image iff it's smaller than the requested dimensions, and then unconditionally resize up to 500x500. When the image was already large enough, the crop is already 500x500, so the resize is effectively a no-op.
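
A rough sketch of that clamp-then-resize idea, continuing from the snippet above (the two-argument resize_images call assumes the TF1 API):

# Clamp the crop window so it never exceeds the actual image.
crop_size = tf.pack([tf.minimum(shape[0], TARGET_HEIGHT),
                     tf.minimum(shape[1], TARGET_WIDTH), 3])
cropped_image = tf.slice(image, bbox_begin, crop_size)
# Unconditional resize: effectively a no-op when the crop is already 500x500.
cropped_image = tf.image.resize_images(cropped_image, [TARGET_HEIGHT, TARGET_WIDTH])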

answered Nov 02 '22 by dga