I want to understand how video and audio decoding works, especially the timing synchronization (how to get 30fps video, how to couple that with audio, etc.). I don't want to know ALL the details, just the essence of it. I want to be able to write a high-level simplification of an actual video/audio decoder.
Could you give me some pointers? Actual C/C++ source code of an MPEG2 video/audio decoder would be the fastest way to understand those things, I think.
The MPEG-2 Video Decoder is a Media Foundation transform that decodes MPEG-1 and MPEG-2 video. The decoder supports MPEG-2 Simple and Main profile video (H.262, ISO/IEC 13818-2) and MPEG-1 video (ISO/IEC 11172-2).
MPEG is a family of codec standards. There are several versions of it, called MPEG-1, MPEG-2, MPEG-4, ... When you play an MPEG video from a DVD, for instance, the MPEG stream is actually composed of several streams (called Elementary Streams, ES): there is one stream for video, one for audio, another for subtitles, and so on.
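To make the idea of several elementary streams inside one multiplexed stream concrete, here is a minimal sketch in Python. The packet layout `(stream_id, payload)` and the names `demultiplex` and `multiplexed` are assumptions made up for illustration; real MPEG containers carry the stream identity in binary packet headers.

```python
from collections import defaultdict

# Hypothetical packet layout for illustration: (stream_id, payload_bytes).
# A real demultiplexer would parse these fields out of packet headers.
def demultiplex(packets):
    """Route interleaved packets into one list per elementary stream."""
    streams = defaultdict(list)
    for stream_id, payload in packets:
        streams[stream_id].append(payload)
    return dict(streams)

# Interleaved input, as it might arrive from a disc or a broadcast.
multiplexed = [
    ("video", b"v0"), ("audio", b"a0"), ("video", b"v1"),
    ("subtitle", b"s0"), ("audio", b"a1"),
]
elementary = demultiplex(multiplexed)
```

Each list in `elementary` would then be handed to its own decoder (video, audio, subtitles), which is why the streams need timestamps to be lined up again at playback time.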
MPEG-2 is widely used as the format of digital television signals that are broadcast by terrestrial (over-the-air), cable, and direct broadcast satellite TV systems. It also specifies the format of movies and other programs that are distributed on DVD and similar discs.
In addition, the decoder uses the system timing as a reference for presentation and decoding timestamps (PTS and DTS), which are used to synchronize the audio and video components of a single program transport stream.
For audio/video synchronization, basically, every video and audio frame should be time-stamped. The timestamp is typically known as the PTS (Presentation Time Stamp). Once a video or audio frame has been decoded by the decoder, the audio/video renderer should schedule the frame to be presented at the time given by its PTS so that audio and video stay synchronized.
I think you can refer to the "Timing Model" chapter of the MPEG2 Tutorial for details.
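The PTS-based scheduling described above could be sketched roughly as follows. This is only an illustration under stated assumptions: MPEG timestamps count ticks of a 90 kHz clock, but the function names, the half-frame tolerance, and the choice of the audio position as the master clock are my own simplifications, not taken from any particular decoder.

```python
PTS_CLOCK_HZ = 90_000  # MPEG presentation timestamps tick at 90 kHz


def pts_to_seconds(pts):
    """Convert a PTS value in 90 kHz ticks to seconds."""
    return pts / PTS_CLOCK_HZ


def schedule_video_frame(frame_pts, audio_clock_pts, tolerance_ticks=1500):
    """Decide what the renderer should do with a decoded video frame.

    audio_clock_pts is the PTS of the audio currently being played.
    Using audio as the master clock is an assumption of this sketch;
    a player could also follow a system clock.
    The default tolerance is half a frame at 30 fps (3000 ticks per frame).
    """
    delta = frame_pts - audio_clock_pts
    if delta > tolerance_ticks:
        # Frame is early: wait this many seconds, then show it.
        return ("wait", pts_to_seconds(delta))
    if delta < -tolerance_ticks:
        # Frame is too late to be useful: skip it to catch up.
        return ("drop", 0.0)
    # Frame is close enough to the audio position: show it now.
    return ("show", 0.0)


on_time = schedule_video_frame(3000, 3000)
early = schedule_video_frame(6000, 3000)
late = schedule_video_frame(0, 6000)
```

Here `on_time` is shown immediately, `early` waits one frame period (3000 ticks, i.e. 1/30 s), and `late` is dropped. A real renderer would run a loop like this for every decoded frame.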
Reading source code from a codec that works seems the right way to go. I suggest the following:
http://www.mpeg.org/MPEG/video/mssg-free-mpeg-software.html
Given that it's mentioned on the mpeg.org website, I'd say you'll find what you need there.
In the past I've had some time to work on decoding MPEG videos (no audio though), and the principles are quite simple. There are some pure images included (I-frames), some intermediary images that are described relative to the closest main ones (P-frames), and the rest are described using the closest main/intermediary images (B-frames).
One time slot, one image. But recent codecs are much more complicated, I guess!
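The "one time slot, one image" idea can be sketched like this. The frame tuples `(type, display_index)` and the function name are hypothetical; the sketch also assumes the usual situation where frames arrive in decode order (a reference frame before the in-between frames that depend on it) and must be put back into display order.

```python
FPS = 30  # one image per 1/30 s time slot

# Frames as (frame_type, display_index), listed in decode order:
# the "P" frame arrives before the "B" frames that are described from it.
decode_order = [("I", 0), ("P", 3), ("B", 1), ("B", 2)]


def presentation_schedule(frames, fps=FPS):
    """Return (display_time_seconds, frame_type) pairs in display order."""
    ordered = sorted(frames, key=lambda frame: frame[1])
    return [(index / fps, frame_type) for frame_type, index in ordered]


schedule = presentation_schedule(decode_order)
```

After the call, `schedule` lists the frames as I, B, B, P, each assigned to its own 1/30 s slot starting at time 0.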
EDIT: synchronization
I am no expert in synchronizing audio and video, but the issue seems to be dealt with using a sync layer (see there for a definition).
You can browse the source code of ffmpeg (available through svn), or its API documentation.