 

Which mp4 timescale should I use for creating/modifying chapter trak atoms?

Tags: mp4, audio, aac

Background: I'm working on a Swift library for modifying audiobook chapters and metadata in a much simpler and more user-friendly way than AVFoundation. I'm not editing the actual media at all; this is simply for tagging and chaptering audiobook files. However, because I am creating chapter tracks, I need a thorough understanding of how the duration/timescale relationship works.

I thought I had it nailed down, but then I tested my library on a file encoded from a different source, and it turns out that source handles something differently enough that I can't even pass the file through without breaking it.

I've been over MP4v2 as completely as I can, considering I don't know C++. I understand enough that I was able to model my atom parsing on a combination of MP4v2 and the Apple QuickTime documentation, but how it determines timescale is eluding me.

I think my problem largely exists in the mdhd atom, or possibly the tkhd atom. I understand that the timescale used in the mdhd atom can differ from the timescale used in the mvhd atom, and also that the timescale used in one track's mdhd atom may not be the same as with another track.
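
To make the mvhd/mdhd distinction concrete, here is a minimal Python sketch (not Swift, and not taken from MP4v2) that reads the timescale and duration out of an mvhd or mdhd payload. It assumes the version 0 field layout from the QuickTime documentation plus the 64-bit version 1 layout from the ISO base media file format. The helper name `read_timescale_and_duration` and the sample payloads are made up for illustration.

```python
import struct
from typing import Tuple


def read_timescale_and_duration(payload: bytes) -> Tuple[int, int]:
    """Return (timescale, duration) from an mvhd or mdhd payload.

    `payload` is the atom body, i.e. everything after the 8-byte
    size/type header. Hypothetical helper, not an MP4v2 function.
    """
    version = payload[0]
    if version == 0:
        # version(1) + flags(3) + creation(4) + modification(4) = 12 bytes
        return struct.unpack_from(">II", payload, 12)
    if version == 1:
        # version(1) + flags(3) + creation(8) + modification(8) = 20 bytes
        return struct.unpack_from(">IQ", payload, 20)
    raise ValueError(f"unsupported version {version}")


# Made-up sample payloads: a movie header at 600 units per second and a
# sound track media header at 44100 units per second, both 600 seconds long.
mvhd_v0 = bytes(4) + struct.pack(">II", 0, 0) + struct.pack(">II", 600, 360_000)
mdhd_v0 = bytes(4) + struct.pack(">II", 0, 0) + struct.pack(">II", 44_100, 26_460_000)

for name, payload in (("mvhd", mvhd_v0), ("mdhd", mdhd_v0)):
    timescale, duration = read_timescale_and_duration(payload)
    print(name, timescale, duration, duration / timescale)
```

Both atoms describe the same number of seconds, but the raw duration integers differ because each is counted in its own timescale.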

Where I'm getting lost is that the QuickTime documentation isn't all that explicit about whether a given field uses the mvhd timescale or the mdhd timescale.

For example:

This is the documentation for the elst (Edit List Table) atom, which is supposed to be the source of the duration used in the tkhd atom if elst exists (if it does not, the duration is the sum of the stts atom durations).

Track duration: A 32-bit integer that specifies the duration of this edit segment in units of the movie’s time scale.

Media time: A 32-bit integer containing the starting time within the media of this edit segment (in media timescale units). If this field is set to –1, it is an empty edit. The last edit in a track should never be an empty edit. Any difference between the movie’s duration and the track’s duration is expressed as an implicit empty edit

Does that mean the duration is calculated using the mvhd timescale, while the media time is calculated using mdhd timescale? Or is the documentation using the words "movie" and "media" interchangeably, and if so, which one should I be using when trying to calculate duration when I create a tkhd atom for use in a chapter track?
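
Read literally, the two field descriptions quoted above use different units: track duration in the movie (mvhd) timescale and media time in the media (mdhd) timescale. The Python sketch below works through that reading with made-up numbers. The function names are hypothetical, and the idea that a tkhd duration for a new chapter track is the media duration rescaled into mvhd units is an assumption that follows from that reading, not something the quoted text states outright.

```python
from fractions import Fraction
from typing import Optional, Tuple


def media_to_movie_units(media_units: int, media_timescale: int,
                         movie_timescale: int) -> int:
    """Rescale a duration from mdhd units to mvhd units, rounding to nearest."""
    return round(Fraction(media_units * movie_timescale, media_timescale))


def edit_in_seconds(track_duration: int, media_time: int,
                    movie_timescale: int, media_timescale: int
                    ) -> Tuple[Fraction, Optional[Fraction]]:
    """Interpret one elst entry as (segment length, media start) in seconds.

    track_duration is divided by the movie timescale, media_time by the
    media timescale. A media_time of -1 marks an empty edit, so there is
    no media start to report.
    """
    duration = Fraction(track_duration, movie_timescale)
    start = None if media_time == -1 else Fraction(media_time, media_timescale)
    return duration, start


# Made-up example: mvhd timescale 600, chapter track mdhd timescale 44100.
media_duration = 26_460_000  # 600 seconds in mdhd units
print(media_to_movie_units(media_duration, 44_100, 600))
print(edit_in_seconds(360_000, 44_100, 600, 44_100))
```

Exact fractions are used instead of floats so that repeated conversions between the two timescales do not accumulate rounding error.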

The documentation for the stts atom says:

the length of the media in the track [is] (not mapped to the overall time scale, and not considering any edit list)

Which doesn't seem right, because without a timescale, the duration is just a meaningless integer. I can't just assume milliseconds, because I've discovered at least one test file where the chapter had to be calculated using the mdhd timescale of 44100.
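
The effect of the timescale on the raw stts integers can be sketched as follows. This Python example assumes that stts sample durations are counted in the track's own mdhd timescale, which is how the ISO base media file format describes them. The chapter times and the helper names are made up for illustration.

```python
from itertools import groupby
from typing import List, Sequence, Tuple


def chapter_sample_durations(starts_seconds: Sequence[float],
                             total_seconds: float,
                             timescale: int) -> List[int]:
    """Return one duration per chapter, in units of `timescale`.

    Each chapter lasts until the next chapter starts; the last one lasts
    until the end of the media.
    """
    ticks = [round(s * timescale) for s in starts_seconds]
    ticks.append(round(total_seconds * timescale))
    return [end - start for start, end in zip(ticks, ticks[1:])]


def stts_entries(durations: Sequence[int]) -> List[Tuple[int, int]]:
    """Run-length encode durations into (sample_count, sample_duration) pairs."""
    return [(len(list(run)), delta) for delta, run in groupby(durations)]


starts = [0, 90.5, 300]  # made-up chapter start times in seconds
for timescale in (1000, 44_100):
    durations = chapter_sample_durations(starts, 600, timescale)
    print(timescale, stts_entries(durations), sum(durations))
```

The same three chapters produce different integers under each timescale, and in both cases the durations sum to the total media length expressed in that timescale.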

Should the timescale in the chapter track mdhd atom always reflect the timescale of the sound track mdhd? Should stts be calculated using the mdhd timescale? Is there some detail I'm missing entirely here?

Asked by NCrusher, Nov 21 '25 06:11


1 Answer

All of the atom and QuickTime documentation is specific to the container: it specifies how codec data is packaged, not the codec itself.

The best thing to do is to read up on the audio codec you are using.

Another approach is to reverse engineer working media files, understand how they work, and derive your own formulas heuristically.

I have the same issue. There is next to no information on the subject, and the documentation is of little help: it does not tell you how the container and the codec are related, since that is out of scope for documentation specific to the container.

In essence, if you want to know how the duration of the codec is related to the container, you might as well ask whoever created the codec or produced the media files. In all likelihood they won't tell you, since how they use a given codec and package it into the container is proprietary information.

I had a corrupted video and was able to recover it 100% by creating a completely new moov atom from scratch using a hex editor and heuristics. It's actually easy. You need two media files from the same source that play properly and have different lengths, such as one video that is 1 minute long and another that is 5 minutes long. You then use a parser tool to view the details of each file's atoms and work out formulas for how the values were derived.

You then use those formulas to create the new atoms that require a duration for the specific codec.
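
One way to picture the comparison described above is this short Python sketch. It takes pairs of (known playback length in seconds, raw duration value read from an atom) and checks whether they all imply the same units-per-second ratio. The function name and the numbers are hypothetical, and real files may need a more involved formula than a single ratio.

```python
from fractions import Fraction
from typing import Iterable, Optional, Tuple


def implied_timescale(observations: Iterable[Tuple[int, int]]) -> Optional[Fraction]:
    """Return the shared units-per-second ratio, or None if the files disagree.

    Each observation is (playback_seconds, raw_duration_field).
    """
    ratios = {Fraction(raw, seconds) for seconds, raw in observations}
    return ratios.pop() if len(ratios) == 1 else None


# Made-up readings from a 1-minute file and a 5-minute file.
print(implied_timescale([(60, 2_646_000), (300, 13_230_000)]))
```

If the two files agree, the ratio is a candidate timescale for that field; if the function returns None, the field is probably not a simple duration in a fixed timescale.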

Answered by Max Dax, Nov 24 '25 07:11


