I have been trying to understand how video encoding works in modern encoders, in particular H.264. Documentation very often mentions that residual frames are created from the differences between the current P-frame and the last I-frame (assuming the following frames are not used in the prediction). I understand that a YUV color space is used (maybe YV12), and that one image is "subtracted" from the other to form the residual. What I don't understand is how exactly this subtraction works. I don't think it is the absolute value of the difference, because that would be ambiguous. What is the per-pixel formula to obtain this difference?
A codec based on the H.264 standard compresses a digital video file (or stream) so that it requires roughly half the storage space (or network bandwidth) of MPEG-2 while maintaining the same video quality.
H.264, or MPEG-4 AVC (Advanced Video Coding), is a video coding format for recording, compressing, and distributing video, including full HD content. It was developed and is maintained by the ITU-T Video Coding Experts Group (VCEG) together with the ISO/IEC JTC1 Moving Picture Experts Group (MPEG).
In simple terms, encoding is the process of compressing raw video content and converting it to a digital file or format, which in turn makes the content compatible with different devices and platforms. The main goal of encoding is to compress the content so that it takes up less space.
H.264 High Profile is the primary profile for broadcast and disc storage, particularly for HDTV and Blu-ray Disc formats, and it is more efficient than the Baseline and Main profiles. Compression ratios of about 2000:1 are sometimes quoted for it, though the achievable ratio depends heavily on the content and the target quality.
Subtraction is just one small step in video encoding; the core principle behind most modern video encoding is motion estimation, followed by motion compensation. Basically, the process of motion estimation generates vectors that show offsets between macroblocks in successive frames. However, the prediction built from these vectors is almost never exact, so some error always remains.
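To make the motion estimation step concrete, here is a minimal sketch of exhaustive block matching using the sum of absolute differences (SAD) as the cost. It is an illustration only, not how any particular H.264 encoder does it: real encoders use faster search patterns, sub-pixel vectors, and variable block sizes. The function names (`sad`, `estimate_motion`), the 4x4 block size, the search range, and the toy frames are all assumptions made for the example.

```python
def sad(ref, cur, rx, ry, cx, cy, n):
    """Sum of absolute differences between the n x n block of `cur` at
    (cx, cy) and the n x n block of `ref` at (rx, ry)."""
    return sum(
        abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
        for j in range(n)
        for i in range(n)
    )


def estimate_motion(ref, cur, cx, cy, n=4, search=2):
    """Exhaustive search: try every offset within +/- `search` pixels and
    keep the one whose reference block best matches the current block."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = None, None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = cx + dx, cy + dy
            # Skip candidates that would read outside the reference frame.
            if rx < 0 or ry < 0 or rx + n > width or ry + n > height:
                continue
            cost = sad(ref, cur, rx, ry, cx, cy, n)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost


# Toy 8x8 luma frames (hypothetical data): the current frame is the
# reference frame with its content shifted right by one pixel.
ref = [[10 * y + x for x in range(8)] for y in range(8)]
cur = [[ref[y][max(x - 1, 0)] for x in range(8)] for y in range(8)]

mv, cost = estimate_motion(ref, cur, cx=2, cy=2)
print(mv, cost)
```

For this toy input the best vector is (-1, 0) with a cost of 0, meaning the block at (2, 2) in the current frame is found one pixel to the left in the reference frame. Note that the absolute value only appears in the matching cost, not in the residual that gets encoded.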
So the encoder outputs both the vector offsets and the "residual", which is what's left over. The residual is not simply the difference between two frames; it's the difference between the current frame and the reference frame after motion compensation has been applied. See the "Motion compensated difference" image in the Wikipedia article on motion compensation for a clear illustration of this; note that the motion-compensated difference is drastically smaller than the "dumb" residual.
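As for the per-pixel formula: the residual is a plain signed difference, residual(x, y) = current(x, y) - predicted(x, y), computed separately for each plane (Y, U, V). For 8-bit samples it therefore ranges from -255 to 255, and no absolute value is taken, so nothing is ambiguous. The sketch below illustrates this under simplifying assumptions: a single luma plane, whole-pixel motion vectors, and a motion vector that is already known. The helper names and the toy frames are hypothetical. In a real H.264 encoder the residual is additionally transformed and quantized, so the decoder's reconstruction is only approximate; here it is kept lossless to show the arithmetic.

```python
def predict_block(ref, cx, cy, mv, n):
    """Motion-compensated prediction: copy the n x n block of `ref`
    that the motion vector `mv` = (dx, dy) points at."""
    dx, dy = mv
    return [[ref[cy + dy + j][cx + dx + i] for i in range(n)] for j in range(n)]


def residual_block(cur, pred, cx, cy, n):
    """Signed per-pixel difference: current sample minus predicted sample."""
    return [[cur[cy + j][cx + i] - pred[j][i] for i in range(n)] for j in range(n)]


def reconstruct_block(pred, res):
    """Decoder side: prediction plus residual, clipped to the 8-bit range."""
    return [
        [min(255, max(0, p + r)) for p, r in zip(prow, rrow)]
        for prow, rrow in zip(pred, res)
    ]


def energy(block):
    """Sum of absolute residual values, a rough measure of what must be coded."""
    return sum(abs(v) for row in block for v in row)


# Toy 8x8 luma frames (hypothetical data): content shifted right by one
# pixel, plus a small +/-1 change that motion alone cannot explain.
ref = [[20 * x + 3 * y + 40 for x in range(8)] for y in range(8)]
cur = [
    [ref[y][max(x - 1, 0)] + (1 if (x + y) % 2 else -1) for x in range(8)]
    for y in range(8)
]

n, cx, cy = 4, 2, 2
pred = predict_block(ref, cx, cy, (-1, 0), n)  # motion vector assumed known
mc_res = residual_block(cur, pred, cx, cy, n)
dumb_res = residual_block(cur, predict_block(ref, cx, cy, (0, 0), n), cx, cy, n)
cur_block = [row[cx:cx + n] for row in cur[cy:cy + n]]

print(mc_res[0])                          # contains both negative and positive values
print(energy(mc_res), energy(dumb_res))   # motion-compensated vs. plain frame difference
print(reconstruct_block(pred, mc_res) == cur_block)
```

With these toy frames the motion-compensated residual has a total magnitude of 16, against 320 for the plain frame difference, and adding the residual back to the prediction recovers the current block exactly.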
Here's a decent PDF that goes over some of the basics.
A few other notes: