TensorFlow - Read video frames from TFRecords file

TL;DR: my question is how to load compressed video frames from TFRecords.

I am setting up a data pipeline for training deep learning models on a large video dataset (Kinetics). For this I am using TensorFlow, more specifically the tf.data.Dataset and TFRecordDataset structures. As the dataset contains ~300k videos of 10 seconds each, there is a large amount of data to deal with. During training, I want to randomly sample 64 consecutive frames from a video, so fast random access is important. There are several possible data loading scenarios:

1. Sample from Video. Load the videos using ffmpeg or OpenCV and sample frames. Not ideal, as seeking in videos is tricky and decoding video streams is much slower than decoding JPG.
2. JPG Images. Preprocess the dataset by extracting all video frames as JPG files. This generates a huge number of files, and randomly accessing that many small files is unlikely to be fast.
3. Data Containers. Preprocess the dataset into TFRecord or HDF5 files. This requires more work to get the pipeline ready, but is most likely the fastest of these options.

I have decided to go for option (3) and use TFRecord files to store a preprocessed version of the dataset. However, this is also not as straightforward as it seems, for example:

Compression. Storing the video frames as uncompressed byte data in TFRecords will require a huge amount of disk space. Therefore, I extract all the video frames, apply JPG compression and store the compressed bytes as TFRecords.
Video Data. We are dealing with video, so each example in the TFRecords file will be quite large and contain many video frames (typically 250-300 for 10 seconds of video, depending on the frame rate).
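
As a rough back-of-the-envelope check of the disk-space concern, the uncompressed size can be estimated as below. The video and frame counts come from the numbers above; the 224x224 frame size is an assumption made purely for illustration, since the resized resolution is not stated here.

```python
NUM_VIDEOS = 300_000        # "~300k videos" from the description above
FRAMES_PER_VIDEO = 250      # lower end of the 250-300 range
# Assumed frame size (hypothetical; the resized resolution is not given).
HEIGHT, WIDTH, CHANNELS = 224, 224, 3

bytes_per_frame = HEIGHT * WIDTH * CHANNELS   # uint8: one byte per value
total_frames = NUM_VIDEOS * FRAMES_PER_VIDEO
total_bytes = total_frames * bytes_per_frame

print("frames: {:,}".format(total_frames))                   # frames: 75,000,000
print("uncompressed: {:.1f} TB".format(total_bytes / 1e12))  # uncompressed: 11.3 TB
```

Under these assumptions the raw pixels alone come to roughly 11 TB, which is why the frames are JPG-compressed before being stored.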

I have written the following code to preprocess the video dataset and write the video frames to TFRecord files (each ~5GB in size):

import cv2
import tensorflow as tf

def _int64_feature(value):
    """Wrapper for inserting int64 features into Example proto."""
    if not isinstance(value, list):
        value = [value]
    return tf.train.Feature(int64_list=tf.train.Int64List(value=value))

def _bytes_feature(value):
    """Wrapper for inserting bytes features into Example proto."""
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))


with tf.python_io.TFRecordWriter(output_file) as writer:

  # Read and resize all video frames, np.uint8 of size [N,H,W,3]
  frames = ... 

  features = {}
  features['num_frames']  = _int64_feature(frames.shape[0])
  features['height']      = _int64_feature(frames.shape[1])
  features['width']       = _int64_feature(frames.shape[2])
  features['channels']    = _int64_feature(frames.shape[3])
  features['class_label'] = _int64_feature(example['class_id'])
  features['class_text']  = _bytes_feature(tf.compat.as_bytes(example['class_label']))
  features['filename']    = _bytes_feature(tf.compat.as_bytes(example['video_id']))

  # Compress the frames using JPG and store them as bytes in:
  # 'frames/0000', 'frames/0001', ...
  for i in range(len(frames)):
      ret, buffer = cv2.imencode(".jpg", frames[i])
      features["frames/{:04d}".format(i)] = _bytes_feature(tf.compat.as_bytes(buffer.tobytes()))

  tfrecord_example = tf.train.Example(features=tf.train.Features(feature=features))
  writer.write(tfrecord_example.SerializeToString())

This works fine; the dataset is nicely written as TFRecord files with the frames as compressed JPG bytes. My question is how to read the TFRecord files during training, randomly sample 64 consecutive frames from a video, and decode the JPG images.

According to TensorFlow's documentation on tf.data, we need to do something like:

filenames = tf.placeholder(tf.string, shape=[None])
dataset = tf.data.TFRecordDataset(filenames)
dataset = dataset.map(...)  # Parse the record into tensors.
dataset = dataset.repeat()  # Repeat the input indefinitely.
dataset = dataset.batch(32)
iterator = dataset.make_initializable_iterator()
training_filenames = ["/var/data/file1.tfrecord", "/var/data/file2.tfrecord"]
sess.run(iterator.initializer, feed_dict={filenames: training_filenames})

There are many examples of how to do this with images, and that is quite straightforward. However, for video and random sampling of frames I am stuck. The tf.train.Features object stores the frames as frames/0000, frames/0001, etc. My first question is: how do I randomly sample a set of consecutive frames from this inside the dataset.map() function? One consideration is that each frame has a variable number of bytes due to JPG compression and needs to be decoded using tf.image.decode_jpeg.
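
Outside of a TensorFlow graph, the desired sampling amounts to the plain-Python sketch below. The helper name sample_frame_keys is hypothetical; the key format matches the writer code above. The hard part is expressing the same thing inside dataset.map(), where the number of frames is only available as a tensor.

```python
import random


def sample_frame_keys(num_frames, seq_len=64, rng=random):
    """Return the feature keys of seq_len consecutive frames (hypothetical helper)."""
    if num_frames < seq_len:
        raise ValueError("video has fewer frames than the requested window")
    # randint is inclusive on both ends, so the last valid start is reachable.
    start = rng.randint(0, num_frames - seq_len)
    return ["frames/{:04d}".format(i) for i in range(start, start + seq_len)]


# Example: a 250-frame video, seeded for reproducibility.
keys = sample_frame_keys(num_frames=250, rng=random.Random(0))
print(keys[0], "...", keys[-1])
```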

Any help on how best to set up reading video samples from TFRecord files would be appreciated!

asked Jan 04 '18 18:01

verified.human

1 Answer

Encoding each frame as a separate feature makes it difficult to select frames dynamically, because the signature of tf.parse_example() (and tf.parse_single_example()) requires that the set of parsed feature names be fixed at graph construction time. However, you could try encoding the frames as a single feature that contains a list of JPEG-encoded strings:

def _bytes_list_feature(values):
    """Wrapper for inserting bytes features into Example proto."""
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=values))

with tf.python_io.TFRecordWriter(output_file) as writer:

  # Read and resize all video frames, np.uint8 of size [N,H,W,3]
  frames = ... 

  features = {}
  features['num_frames']  = _int64_feature(frames.shape[0])
  features['height']      = _int64_feature(frames.shape[1])
  features['width']       = _int64_feature(frames.shape[2])
  features['channels']    = _int64_feature(frames.shape[3])
  features['class_label'] = _int64_feature(example['class_id'])
  features['class_text']  = _bytes_feature(tf.compat.as_bytes(example['class_label']))
  features['filename']    = _bytes_feature(tf.compat.as_bytes(example['video_id']))

  # Compress the frames using JPG and store them as a list of strings in 'frames'
  encoded_frames = [tf.compat.as_bytes(cv2.imencode(".jpg", frame)[1].tobytes())
                    for frame in frames]
  features['frames'] = _bytes_list_feature(encoded_frames)

  tfrecord_example = tf.train.Example(features=tf.train.Features(feature=features))
  writer.write(tfrecord_example.SerializeToString())

Once you have done this, it will be possible to slice the frames feature dynamically, using a modified version of your parsing code:

SEQ_NUM_FRAMES = 64  # number of consecutive frames to sample per video

def decode(serialized_example):
  # Prepare feature list; read encoded JPG images as bytes
  features = dict()
  features["class_label"] = tf.FixedLenFeature((), tf.int64)
  features["frames"] = tf.VarLenFeature(tf.string)
  features["num_frames"] = tf.FixedLenFeature((), tf.int64)

  # Parse into tensors
  parsed_features = tf.parse_single_example(serialized_example, features)

  # Randomly sample offset from the valid range [0, num_frames - SEQ_NUM_FRAMES].
  # maxval is exclusive, hence the "+ 1".
  random_offset = tf.random_uniform(
      shape=(), minval=0,
      maxval=parsed_features["num_frames"] - SEQ_NUM_FRAMES + 1, dtype=tf.int64)

  offsets = tf.range(random_offset, random_offset + SEQ_NUM_FRAMES)

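  # Decode the encoded JPG images. dtype must be given because the output
  # (uint8 images) differs from the input dtype (int64 offsets).
  images = tf.map_fn(lambda i: tf.image.decode_jpeg(parsed_features["frames"].values[i]),
                     offsets, dtype=tf.uint8)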

  label  = tf.cast(parsed_features["class_label"], tf.int64)

  return images, label

(Note that I haven't been able to run your code, so there may be some small errors, but hopefully it is enough to get you started.)
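
For readers on TensorFlow 2.x, where tf.python_io, tf.parse_single_example and tf.random_uniform are no longer available under those names, the same idea might look roughly like the sketch below. It is an illustration under stated assumptions, not a tested drop-in for the pipeline above: it assumes TensorFlow 2.3 or later (for fn_output_signature), uses tf.io.encode_jpeg in place of cv2.imencode so that it needs no OpenCV, feeds one serialized example through Dataset.from_tensors instead of reading a TFRecord file, and uses random 32x32 frames and a window of 4 frames purely for the demo. The helper names make_example and decode are illustrative.

```python
import numpy as np
import tensorflow as tf

SEQ_NUM_FRAMES = 4  # small window for the demo; the question uses 64


def make_example(frames, class_id):
    """Serialize uint8 frames [N, H, W, 3] with all JPEGs in one bytes list."""
    encoded = [tf.io.encode_jpeg(frame).numpy() for frame in frames]
    features = {
        "num_frames": tf.train.Feature(
            int64_list=tf.train.Int64List(value=[len(encoded)])),
        "class_label": tf.train.Feature(
            int64_list=tf.train.Int64List(value=[class_id])),
        "frames": tf.train.Feature(
            bytes_list=tf.train.BytesList(value=encoded)),
    }
    example = tf.train.Example(features=tf.train.Features(feature=features))
    return example.SerializeToString()


def decode(serialized_example):
    """Parse one example and decode a random window of consecutive frames."""
    spec = {
        "num_frames": tf.io.FixedLenFeature((), tf.int64),
        "class_label": tf.io.FixedLenFeature((), tf.int64),
        "frames": tf.io.VarLenFeature(tf.string),
    }
    parsed = tf.io.parse_single_example(serialized_example, spec)

    # maxval is exclusive, so add 1 to make the last valid offset reachable.
    offset = tf.random.uniform(
        shape=(), minval=0,
        maxval=parsed["num_frames"] - SEQ_NUM_FRAMES + 1, dtype=tf.int64)

    # Pick the encoded JPEG strings of the window, then decode only those.
    window = tf.gather(parsed["frames"].values,
                       tf.range(offset, offset + SEQ_NUM_FRAMES))
    images = tf.map_fn(lambda jpeg: tf.io.decode_jpeg(jpeg, channels=3),
                       window, fn_output_signature=tf.uint8)
    return images, parsed["class_label"]


# Demo with synthetic data: 10 random frames of size 32x32.
frames = np.random.randint(0, 256, size=(10, 32, 32, 3), dtype=np.uint8)
serialized = make_example(frames, class_id=7)

dataset = tf.data.Dataset.from_tensors(serialized).map(decode)
images, label = next(iter(dataset))
print(images.shape, images.dtype, int(label))
```

In a real pipeline the serialized strings would come from tf.data.TFRecordDataset(filenames) rather than from Dataset.from_tensors. Only the sampled window is decoded, so the cost of JPEG decoding scales with SEQ_NUM_FRAMES and not with the length of the video.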

answered Oct 27 '22 10:10

mrry

Tags: python, tensorflow, deep-learning, tensorflow-datasets, tfrecord
