Can anyone tell me where metadata is stored in common video file formats? And is it located towards the start of the file, or scattered throughout?
I'm working with a remote object store containing a lot of video files and I want to extract metadata, in particular video duration and video dimensions from those files, without streaming the entire file contents to the local machine.
I'm hoping that this metadata is stored in the first X bytes of each file, so that I can fetch just a byte range from the beginning instead of the whole file and pass this partial file data to ffprobe.
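If the object store is reachable over HTTP (S3-compatible stores, for example, accept the standard Range header on GET), fetching "the first X bytes" is a single ranged request. Here is a minimal sketch using only the Python standard library. The URL is a placeholder, and `make_range_request` is a hypothetical helper, not part of any store's SDK:

```python
import urllib.request


def make_range_request(url: str, start: int, length: int) -> urllib.request.Request:
    """Build a GET request for bytes [start, start + length) of a remote object."""
    if start < 0 or length <= 0:
        raise ValueError("start must be >= 0 and length must be > 0")
    end = start + length - 1  # HTTP Range end offsets are inclusive
    req = urllib.request.Request(url)
    req.add_header("Range", f"bytes={start}-{end}")
    return req


# Placeholder URL: substitute your own object's address.
req = make_range_request("https://example.com/videos/clip.mp4", 0, 1024 * 1024)
print(req.get_header("Range"))  # bytes=0-1048575

# To actually download the prefix (network access required):
# with urllib.request.urlopen(req) as resp:
#     assert resp.status == 206  # 206 Partial Content means the range was honoured
#     head = resp.read()
```

HTTP range ends are inclusive, which is why 1 MiB is requested as bytes=0-1048575. A server that ignores Range will answer 200 with the whole file instead of 206, so it is worth checking the status before trusting the result.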
For testing purposes I created a 22MB MP4 file, and used the following command to supply only the first 1MB of data to ffprobe:
head -c1024K '2013-07-04 12.20.07.mp4' | ffprobe -
It prints:
avprobe version 0.8.6-4:0.8.6-0ubuntu0.12.04.1, Copyright (c) 2007-2013 the Libav developers
built on Apr 2 2013 17:02:36 with gcc 4.6.3
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x1a6b7a0] stream 0, offset 0x10beab: partial file
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'pipe:':
Metadata:
major_brand : isom
minor_version : 0
compatible_brands: isom3gp4
creation_time : 1947-07-04 11:20:07
Duration: 00:00:09.84, start: 0.000000, bitrate: N/A
Stream #0.0(eng): Video: h264 (High), yuv420p, 1920x1080, 20028 kb/s, PAR 65536:65536 DAR 16:9, 29.99 fps, 30 tbr, 90k tbn, 180k tbc
Metadata:
creation_time : 1947-07-04 11:20:07
Stream #0.1(eng): Audio: aac, 48000 Hz, stereo, s16, 189 kb/s
Metadata:
creation_time : 1947-07-04 11:20:07
So I see the first 1MB was enough to extract video duration 9.84 seconds and video dimensions 1920x1080, even though ffprobe printed the warning about detecting a partial file. If I supply less than 1MB, it fails completely.
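Since ffprobe has already parsed the container, the two fields I need can be scraped from its report. Below is a rough sketch that assumes the text layout shown above. The regular expressions are fitted to this output, not to any documented format. Newer ffprobe builds can also emit machine-readable output via `-show_entries` and `-of json`, which would be a sturdier thing to parse:

```python
import re

# Two lines copied from the avprobe output above.
SAMPLE = """\
Duration: 00:00:09.84, start: 0.000000, bitrate: N/A
Stream #0.0(eng): Video: h264 (High), yuv420p, 1920x1080, 20028 kb/s, PAR 65536:65536 DAR 16:9, 29.99 fps, 30 tbr, 90k tbn, 180k tbc
"""


def parse_probe_report(text: str) -> dict:
    """Extract duration (seconds) and frame size from ffprobe/avprobe's text report."""
    info = {}
    m = re.search(r"Duration: (\d+):(\d+):(\d+(?:\.\d+)?)", text)
    if m:  # no match when the prober reports "N/A" or omits the line
        hours, minutes, seconds = m.groups()
        info["duration"] = int(hours) * 3600 + int(minutes) * 60 + float(seconds)
    m = re.search(r"Video: .*?, (\d+)x(\d+)", text)
    if m:  # first "WxH" field on the video stream line
        info["width"], info["height"] = int(m.group(1)), int(m.group(2))
    return info


print(parse_probe_report(SAMPLE))  # {'duration': 9.84, 'width': 1920, 'height': 1080}
```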
Would this approach work for other common video file formats to reliably extract metadata, or do any common formats scatter metadata throughout the file?
I'm aware of the concept of container formats and that various codecs may be used to represent the audio/video data inside those containers. I'm not familiar with the details, though. So I guess the question may apply to common combinations of containers + codecs? Thanks in advance.
Okay, to answer my own question after a lot of digging through the specs for MP4, 3GP and AVI:
Metadata is at the start of AVI files, according to the AVI file format specification.
Video duration is not stored verbatim in AVI files, but is calculated (in microseconds) as dwMicroSecPerFrame x dwTotalFrames.
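Reading between the lines of the spec, it seems that several metadata fields can be read directly from fixed offsets within AVI files without parsing at all. This is because the spec lays out the start of the file as the RIFF header, then the 'hdrl' list, then the main AVI header ('avih'). But the spec does not state these offsets explicitly, so relying on this rule of thumb could be risky.
Offset 32: dwMicroSecPerFrame, offset 48: dwTotalFrames, offset 64: dwWidth, offset 68: dwHeight (all little-endian 32-bit values).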
So for AVI, it is possible to extract this metadata with only the first X bytes of the file.
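To make the offsets concrete, here is a minimal sketch that reads them with Python's `struct` module. It assumes the layout described above (RIFF header, then the 'hdrl' list, then the 'avih' chunk). If the signature bytes don't match, it raises an error rather than guessing. The header in the demo is synthetic, not taken from a real file:

```python
import struct

# Assumed layout of the start of an AVI file (offsets in bytes):
#   0 'RIFF', 4 size, 8 'AVI '     -> RIFF header
#  12 'LIST', 16 size, 20 'hdrl'   -> header list
#  24 'avih', 28 size              -> main AVI header chunk
#  32 AVIMAINHEADER fields begin (little-endian DWORDs)
MIN_HEAD = 72  # enough to reach dwHeight at offset 68


def read_avi_main_header(head: bytes) -> dict:
    """Read duration and frame size from the first bytes of an AVI file."""
    if len(head) < MIN_HEAD:
        raise ValueError(f"need at least {MIN_HEAD} bytes, got {len(head)}")
    signature = (head[0:4], head[8:12], head[12:16], head[20:24], head[24:28])
    if signature != (b"RIFF", b"AVI ", b"LIST", b"hdrl", b"avih"):
        raise ValueError("unexpected header layout; fall back to a real parser")
    (us_per_frame,) = struct.unpack_from("<I", head, 32)  # dwMicroSecPerFrame
    (total_frames,) = struct.unpack_from("<I", head, 48)  # dwTotalFrames
    width, height = struct.unpack_from("<II", head, 64)   # dwWidth, dwHeight
    return {
        "duration": us_per_frame * total_frames / 1_000_000,  # microseconds -> seconds
        "width": width,
        "height": height,
    }


# Synthetic header: 40000 us/frame (25 fps), 250 frames, 640x480.
demo = bytearray(MIN_HEAD)
demo[0:4], demo[8:12] = b"RIFF", b"AVI "
demo[12:16], demo[20:24], demo[24:28] = b"LIST", b"hdrl", b"avih"
struct.pack_into("<I", demo, 28, 56)  # avih chunk size
struct.pack_into("<I", demo, 32, 40000)
struct.pack_into("<I", demo, 48, 250)
struct.pack_into("<II", demo, 64, 640, 480)
print(read_avi_main_header(bytes(demo)))  # {'duration': 10.0, 'width': 640, 'height': 480}
```

One caveat, as far as I know: in OpenDML (AVI 2.0) files, which are used for files larger than about 1 GB, dwTotalFrames in 'avih' may count only the frames in the first RIFF chunk. In that case the duration computed this way can come out too short.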
MP4 and 3GP are both based on the ISO base media file format, ISO/IEC 14496-12 (MPEG-4 Part 12).
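For the two values I was after: as I read ISO/IEC 14496-12, the movie duration is stored in the 'mvhd' box (a direct child of 'moov') as a duration/timescale pair. Each track's 'tkhd' box (inside 'trak') carries width and height as 16.16 fixed-point numbers. The sketch below decodes both. It assumes you already have the raw box bytes, and the demo boxes are synthetic:

```python
import struct


def mvhd_duration_seconds(box: bytes) -> float:
    """Movie duration from an 'mvhd' box (bytes starting at its 8-byte header)."""
    if box[4:8] != b"mvhd":
        raise ValueError("not an mvhd box")
    if box[8] == 1:  # version 1: 64-bit creation/modification times and duration
        timescale, duration = struct.unpack_from(">IQ", box, 28)
    else:  # version 0: 32-bit fields
        timescale, duration = struct.unpack_from(">II", box, 20)
    return duration / timescale  # duration is counted in timescale units per second


def tkhd_dimensions(box: bytes) -> tuple:
    """Track width/height from a 'tkhd' box; both are 16.16 fixed point."""
    if box[4:8] != b"tkhd":
        raise ValueError("not a tkhd box")
    # Skip version/flags, times, track_ID, duration, reserved, layer,
    # alternate_group, volume and the 3x3 matrix.
    offset = 96 if box[8] == 1 else 84
    width, height = struct.unpack_from(">II", box, offset)
    return width / 65536, height / 65536


# Synthetic version-0 boxes: timescale 1000, duration 9840 -> 9.84 s; 1920x1080 track.
mvhd = bytearray(108)
struct.pack_into(">I4s", mvhd, 0, 108, b"mvhd")
struct.pack_into(">II", mvhd, 20, 1000, 9840)
tkhd = bytearray(92)
struct.pack_into(">I4s", tkhd, 0, 92, b"tkhd")
struct.pack_into(">II", tkhd, 84, 1920 << 16, 1080 << 16)
print(mvhd_duration_seconds(bytes(mvhd)), tkhd_dimensions(bytes(tkhd)))  # 9.84 (1920.0, 1080.0)
```

The tkhd width and height give the presentation size, which may differ from the coded frame size (for example with non-square pixels). Audio tracks typically carry 0x0.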
This format allows the metadata (the 'moov' box) to be stored anywhere in the file. In practice, though, it is either at the start or at the end, because the raw captured audio/video data (the 'mdat' box) is saved contiguously in the middle. (An exception, however, would be "fragmented" MP4 files, which are rare.)
Only files with the metadata stored at the start can be played via progressive download, but whether the metadata is placed there is up to the capture device or encoder.
AFAICT this means that to extract metadata from these files, only the first X bytes of the file would be required. From that information it could be determined whether the last X bytes would also be required. But the bytes in the middle would not be required.
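Putting that together, here is a minimal sketch of the "first bytes, maybe last bytes, never the middle" approach. It walks the top-level boxes by reading only their 8- or 16-byte headers, jumps over 'mdat' using its size, and fetches 'moov' wherever it turns out to be. `read_range` is a hypothetical stand-in for a ranged GET against the object store; here it just slices an in-memory synthetic file. In practice you'd reuse the initial X-byte prefix rather than issue one tiny request per header:

```python
import struct
from typing import Callable


def fetch_moov(read_range: Callable[[int, int], bytes], file_size: int) -> bytes:
    """Return the top-level 'moov' box, reading only top-level box headers plus
    the moov box itself. read_range(start, end) must return bytes [start, end)."""
    offset = 0
    while offset + 8 <= file_size:
        header = read_range(offset, min(offset + 16, file_size))
        size, box_type = struct.unpack_from(">I4s", header, 0)
        header_len = 8
        if size == 1:  # 64-bit 'largesize' follows the type
            if len(header) < 16:
                raise ValueError(f"truncated box header at offset {offset}")
            (size,) = struct.unpack_from(">Q", header, 8)
            header_len = 16
        elif size == 0:  # box extends to the end of the file
            size = file_size - offset
        if size < header_len:
            raise ValueError(f"corrupt box size at offset {offset}")
        if box_type == b"moov":
            return read_range(offset, offset + size)
        offset += size  # skip this box (typically 'mdat') without reading its payload
    raise ValueError("no top-level 'moov' box found")


def box(box_type: bytes, payload: bytes) -> bytes:
    """Build a simple 32-bit-size box (demo helper)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


# Synthetic "metadata at the end" file: ftyp, then ~1 MB of media data, then moov.
data = box(b"ftyp", b"isom\x00\x00\x00\x00") + box(b"mdat", bytes(1_000_000)) + box(b"moov", bytes(32))
reads = []


def read_range(start: int, end: int) -> bytes:
    reads.append((start, end))  # stand-in for a ranged GET against the object store
    return data[start:end]


moov = fetch_moov(read_range, len(data))
print(len(moov), sum(end - start for start, end in reads))  # 40 88
```

On the synthetic file it reads 88 bytes in total to get the 40-byte 'moov' box. Of the 1 MB 'mdat' payload, it touches only the 8 bytes that happen to share a header read.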