I'm working on an MP4 container parser, but I'm going crazy trying to recognize the audio codecs of the streams. I used both QtAtomViewer and AtomicParsley, but when I find the atom:
trak->mdia->minf->stbl->stsd
I always get "mp4a", even if the MP4 file has an MP3 stream.
Should I look for an ".mp3" fourcc?
I attach two different MP4 structures.
MP4 container with AAC audio stream:
Atom trak @ 716882 of size: 2960, ends @ 719842
    Atom tkhd @ 716890 of size: 92, ends @ 716982
    Atom mdia @ 716982 of size: 2860, ends @ 719842
        Atom mdhd @ 716990 of size: 32, ends @ 717022
        Atom hdlr @ 717022 of size: 33, ends @ 717055
        Atom minf @ 717055 of size: 2787, ends @ 719842
            Atom dinf @ 717063 of size: 36, ends @ 717099
                Atom dref @ 717071 of size: 28, ends @ 717099
            Atom stbl @ 717099 of size: 2727, ends @ 719826
                Atom stts @ 717107 of size: 24, ends @ 717131
                Atom stsz @ 717131 of size: 1268, ends @ 718399
                Atom stsc @ 718399 of size: 40, ends @ 718439
                Atom stco @ 718439 of size: 32, ends @ 718471
                Atom stss @ 718471 of size: 1264, ends @ 719735
                Atom stsd @ 719735 of size: 91, ends @ 719826
                    Atom mp4a @ 719751 of size: 75, ends @ 719826
                        Atom esds @ 719787 of size: 39, ends @ 719826
            Atom smhd @ 719826 of size: 16, ends @ 719842
MP4 container with MP3 audio stream:
Atom trak @ 1663835 of size: 4844, ends @ 1668679
    Atom tkhd @ 1663843 of size: 92, ends @ 1663935
    Atom mdia @ 1663935 of size: 4744, ends @ 1668679
        Atom mdhd @ 1663943 of size: 32, ends @ 1663975
        Atom hdlr @ 1663975 of size: 45, ends @ 1664020
        Atom minf @ 1664020 of size: 4659, ends @ 1668679
            Atom smhd @ 1664028 of size: 16, ends @ 1664044
            Atom dinf @ 1664044 of size: 36, ends @ 1664080
                Atom dref @ 1664052 of size: 28, ends @ 1664080
            Atom stbl @ 1664080 of size: 4599, ends @ 1668679
                Atom stsd @ 1664088 of size: 87, ends @ 1664175
                    Atom mp4a @ 1664104 of size: 71, ends @ 1664175
                        Atom esds @ 1664140 of size: 35, ends @ 1664175
                Atom stts @ 1664175 of size: 24, ends @ 1664199
                Atom stsc @ 1664199 of size: 28, ends @ 1664227
                Atom stsz @ 1664227 of size: 2228, ends @ 1666455
                Atom stco @ 1666455 of size: 2224, ends @ 1668679
Thanks FE
UPDATE:
I found a way to solve the problem: looking at the AtomicParsley code, I see that it's possible to get the codec information for the stream atom (mp4a) by reading the 11th byte of the esds (Elementary Stream Descriptor) atom.
Now I'm working this way:
if the value of the 11th byte is 0x40 I assume the stream is AAC; if I read 0x69 I assume the stream is MP3.
I don't like these "empirical" solutions, so I'm looking for a more correct way, but I found only Understanding_AAC, which is not complete.
Does anyone know where I can get a more detailed specification of MP4 containers?
In the 'esds' atom there are a few fields relevant to determining the codec. The first is the objectTypeIndication, the first byte of the DecoderConfigDescriptor (tag 0x04) payload inside the esds atom (that's the byte your solution reads at a fixed offset). This field is supposed to indicate the codec used, but a few values are shared by multiple codecs. MP4RA has a full list of codec values. Here are a few that are relevant in this case:
0x6B and 0x69 denote MPEG-1 and MPEG-2 audio respectively, layers 1, 2, and 3. 0x67 denotes MPEG-2 AAC LC but is generally unused in favor of 0x40 (0x66 and 0x68 are also MPEG-2 AAC profiles, which are seen even less frequently). 0x40 denotes MPEG-4 Audio. MPEG-4 Audio is generally thought of as AAC, but there is a whole framework of audio codecs that can go in MPEG-4 Audio, including AAC, BSAC, ALS, CELP, and something called MP3On4. MP3On4 is an MP3 variant with some new header information for multichannel.
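As a rough illustration of the lookup described above, here is a minimal Python sketch. The table holds only the values mentioned in this answer (MP4RA has the full list), and the names `OBJECT_TYPE_INDICATION` and `describe_oti` are made up for this example.

```python
# objectTypeIndication values mentioned in this answer; not a complete list.
OBJECT_TYPE_INDICATION = {
    0x40: "MPEG-4 Audio (usually AAC; check the AudioSpecificConfig)",
    0x66: "MPEG-2 AAC Main",
    0x67: "MPEG-2 AAC LC",
    0x68: "MPEG-2 AAC SSR",
    0x69: "MPEG-2 Audio (layers 1, 2, 3)",
    0x6B: "MPEG-1 Audio (layers 1, 2, 3)",
}


def describe_oti(oti: int) -> str:
    """Map an objectTypeIndication byte to a readable name (hypothetical helper)."""
    return OBJECT_TYPE_INDICATION.get(oti, f"unknown (0x{oti:02X})")


if __name__ == "__main__":
    for value in (0x40, 0x69, 0x6B):
        print(f"0x{value:02X}: {describe_oti(value)}")
```

Note that 0x69 and 0x6B only say "MPEG audio, some layer"; telling layer 3 apart from layers 1 and 2 would need the frame headers of the stream itself.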
We can figure out which audio format is actually in the MPEG-4 Audio stream by looking at the AudioSpecificConfig. This is the global header for the decoder; it is carried in the DecoderSpecificInfo descriptor (tag 0x05), which comes after the 13 fixed bytes of decoder configuration that start with the objectTypeIndication. At the beginning of the AudioSpecificConfig there is a 5-bit AudioObjectType. A full list can be found on the multimedia wiki (linked in your post, under the 'MPEG-4 Audio' article: http://wiki.multimedia.cx/index.php?title=MPEG-4_Audio), but here are the useful values:
If you aren't worried about the 'MP3On4' MP3 variant or the other unusual MPEG-4 Audio codecs, then just use the objectTypeIndication.
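For anyone who does want the AudioObjectType, here is a small Python sketch of reading it from the first bytes of the AudioSpecificConfig. It assumes the escape rule from ISO/IEC 14496-3 (a 5-bit value of 31 means a 6-bit extension follows and the type is 32 plus that extension). The name table is a partial list from my reading of that spec, not something taken from this answer, and the function name is hypothetical.

```python
# Partial AudioObjectType names (assumed from ISO/IEC 14496-3, not exhaustive).
AUDIO_OBJECT_TYPES = {
    1: "AAC Main",
    2: "AAC LC",
    3: "AAC SSR",
    4: "AAC LTP",
    5: "SBR (HE-AAC)",
    8: "CELP",
    22: "ER BSAC",
    29: "PS (HE-AAC v2)",
    32: "Layer-1",
    33: "Layer-2",
    34: "Layer-3",
    36: "ALS",
}


def audio_object_type(asc: bytes) -> int:
    """Return the AudioObjectType at the start of an AudioSpecificConfig."""
    if not asc:
        raise ValueError("empty AudioSpecificConfig")
    aot = asc[0] >> 3                      # top 5 bits of the first byte
    if aot == 31:                          # escape value: 6 more bits follow
        if len(asc) < 2:
            raise ValueError("truncated AudioSpecificConfig")
        ext = ((asc[0] & 0x07) << 3) | (asc[1] >> 5)
        aot = 32 + ext
    return aot


if __name__ == "__main__":
    aot = audio_object_type(bytes.fromhex("1208"))
    print(aot, AUDIO_OBJECT_TYPES.get(aot, "unknown"))
```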
In the MPEG specifications these details are spread across 14496-1, -12, -14, and -3. Of these only 14496-12 is freely available: http://standards.iso.org/ittf/PubliclyAvailableStandards/index.html
The format of the esds atom [1] is defined as:
Size: 32-bit
Type: 32-bit, 'esds'
Version: 8-bit, zero.
Flags: 24-bit field, zero.
Elementary Stream Descriptor
The Elementary Stream Descriptor is defined in the relevant MPEG4 documents [2].
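The fixed part of that layout can be read in a few lines of Python. This is only a sketch under the layout listed above (big-endian size, four-character type, one version byte, three flag bytes); `parse_esds_header` is a name invented for this example, and it does not handle 64-bit box sizes.

```python
import struct


def parse_esds_header(box: bytes):
    """Split an esds box into (size, version, flags, descriptor_bytes)."""
    if len(box) < 12:
        raise ValueError("esds box is shorter than its 12-byte header")
    # Size (32-bit), type (4 chars), then version and flags packed in 32 bits.
    size, box_type, version_flags = struct.unpack(">I4sI", box[:12])
    if box_type != b"esds":
        raise ValueError(f"not an esds box: {box_type!r}")
    version = version_flags >> 24          # top 8 bits
    flags = version_flags & 0xFFFFFF       # low 24 bits
    return size, version, flags, box[12:size]
```

The returned `descriptor_bytes` is the tagged Elementary Stream Descriptor, which needs its own parser.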
Looking at a typical esds atom from an MP4A file:
00000033 65736473 00000000 03808080
22000100 04808080 14401500 00000001
FC170001 FC170580 80800212 08068080
800102
Interpreted as:
00000033 65736473 = ISO atom "esds" of length 0x33
00000000 = Version/Flags field (0), meaning a tagged Elementary Stream Descriptor follows
03808080 = TAG(3) = ES_Descriptor ([2]); the 80 80 80 bytes are padding of the variable-length size field
    22 = length of this descriptor (which includes the nested tags that follow)
    0001 = ES_ID = 1
    00 = flags etc. = 0
    04808080 = TAG(4) = DecoderConfigDescriptor ([2]) embedded in the above ES_Descriptor
        14 = length of this descriptor
        40 = MPEG-4 Audio (see table for valid types here)
        15 = stream type (6 bits) = 5 (audio), flags (2 bits) = 1
        000000 = 24-bit buffer size
        0001FC17 = max bitrate (130,071 bps)
        0001FC17 = avg bitrate
        05808080 = TAG(5) = DecoderSpecificInfo holding the ASC ([2],[3]), embedded in the above DecoderConfigDescriptor
            02 = length
            1208 = ASC (AOT=2 AAC-LC, freq=4 => 44100 Hz, chan=1 => single channel, frameLengthFlag=0 => 1024 samples)
    06808080 = TAG(6) = SLConfigDescriptor
        01 = length
        02 = data
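The walk-through above can be turned into a small parser. The following Python sketch assumes the descriptor layout just shown: a one-byte tag, then a size stored 7 bits per byte with the high bit as a continuation flag, and the optional ES_Descriptor fields signalled by its flags byte as I understand them from ISO/IEC 14496-1. The function names and dictionary keys are invented for this example, and it does no bounds checking on malformed input.

```python
def read_descriptor(buf: bytes, pos: int):
    """Read one descriptor header; return (tag, payload_start, payload_end)."""
    tag = buf[pos]
    pos += 1
    length = 0
    for _ in range(4):                 # the size field spans at most 4 bytes
        byte = buf[pos]
        pos += 1
        length = (length << 7) | (byte & 0x7F)
        if not byte & 0x80:            # high bit clear: last size byte
            break
    return tag, pos, pos + length


def parse_esds_payload(payload: bytes) -> dict:
    """Parse the bytes that follow the esds version/flags field."""
    tag, pos, _ = read_descriptor(payload, 0)
    if tag != 0x03:
        raise ValueError("expected ES_Descriptor (tag 0x03)")
    es_id = int.from_bytes(payload[pos:pos + 2], "big")
    flags = payload[pos + 2]
    pos += 3
    if flags & 0x80:                   # streamDependenceFlag: dependsOn_ES_ID
        pos += 2
    if flags & 0x40:                   # URL_Flag: length byte plus URL string
        pos += 1 + payload[pos]
    if flags & 0x20:                   # OCRstreamFlag: OCR_ES_Id
        pos += 2

    tag, pos, config_end = read_descriptor(payload, pos)
    if tag != 0x04:
        raise ValueError("expected DecoderConfigDescriptor (tag 0x04)")
    info = {
        "es_id": es_id,
        "object_type_indication": payload[pos],
        "stream_type": payload[pos + 1] >> 2,      # upper 6 bits
        "buffer_size": int.from_bytes(payload[pos + 2:pos + 5], "big"),
        "max_bitrate": int.from_bytes(payload[pos + 5:pos + 9], "big"),
        "avg_bitrate": int.from_bytes(payload[pos + 9:pos + 13], "big"),
        "audio_specific_config": b"",
    }
    pos += 13                          # fixed part of the decoder config
    if pos < config_end:
        tag, start, end = read_descriptor(payload, pos)
        if tag == 0x05:                # DecoderSpecificInfo (the ASC for audio)
            info["audio_specific_config"] = payload[start:end]
    return info
```

Fed the 39 bytes that follow the version/flags field in the dump above, this returns objectTypeIndication 0x40, stream type 5, both bitrates 130071, and the two ASC bytes 12 08. Because the size field is read byte by byte, the same code also handles files that write a single size byte instead of the 80 80 80 padded form, which is where a fixed "11th byte" offset breaks.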
Refs: