
How does H.264 or video encoders in general compute the residual image of two frames?

I have been trying to understand how video encoding works in modern encoders, in particular H.264. Documentation very often mentions that residual frames are created from the differences between the current P-frame and the last I-frame (assuming the following frames are not used in the prediction). I understand that a YUV color space is used (maybe YV12), and that one image is "subtracted" from the other to form the residual. What I don't understand is how exactly this subtraction works. I don't think it is the absolute value of the difference, because that would be ambiguous. What is the per-pixel formula to obtain this difference?

asked Jul 06 '11 01:07 by cloudraven



1 Answer

Subtraction is just one small step in video encoding; the core principle behind most modern video encoding is motion estimation, followed by motion compensation. Basically, motion estimation generates vectors that describe the offsets of macroblocks between successive frames. However, the prediction built from these vectors almost never matches the current frame exactly; there is always a bit of error left over.

So the encoder outputs both the motion vectors and the "residual", which is whatever the prediction fails to capture. The residual is not simply the difference between two frames; it is the signed per-pixel difference between the current frame and the prediction built from the reference frame after motion compensation is applied. Because the sign is kept, there is no ambiguity of the kind an absolute value would introduce. See the "Motion compensated difference" image in the Wikipedia article on motion compensation for a clear illustration of this; note that the motion-compensated difference is drastically smaller than the "dumb" residual.
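To make the per-pixel formula concrete, here is a minimal sketch of the idea for a single block on a single plane. The function names, the tiny 4x4 frames, and the integer motion vector are all illustrative assumptions; a real H.264 encoder uses sub-pixel vectors and then transforms and quantizes the residual, so its reconstruction is only approximate.

```python
def motion_compensated_residual(cur, ref, x0, y0, size, mv):
    """Signed residual for one block: current samples minus predicted samples.

    cur, ref: 2-D lists of 8-bit samples for one plane (e.g. luma).
    (x0, y0): top-left corner of the block in the current frame.
    mv: (dx, dy) integer motion vector pointing into the reference frame
        (hypothetical simplification; real codecs use sub-pixel vectors).
    """
    dx, dy = mv
    return [
        [cur[y0 + j][x0 + i] - ref[y0 + dy + j][x0 + dx + i] for i in range(size)]
        for j in range(size)
    ]


def reconstruct(ref, residual, x0, y0, mv):
    """Decoder side: prediction plus residual, clipped to the 8-bit range."""
    dx, dy = mv
    return [
        [min(255, max(0, ref[y0 + dy + j][x0 + dx + i] + r)) for i, r in enumerate(row)]
        for j, row in enumerate(residual)
    ]


# A bright 2x2 patch sits at (0, 0) in the reference frame...
ref = [
    [100, 110, 16, 16],
    [120, 130, 16, 16],
    [16, 16, 16, 16],
    [16, 16, 16, 16],
]
# ...and has moved to (1, 1), slightly changed, in the current frame.
cur = [
    [16, 16, 16, 16],
    [16, 102, 109, 16],
    [16, 120, 133, 16],
    [16, 16, 16, 16],
]

dumb = motion_compensated_residual(cur, ref, 1, 1, 2, (0, 0))
compensated = motion_compensated_residual(cur, ref, 1, 1, 2, (-1, -1))
rebuilt = reconstruct(ref, compensated, 1, 1, (-1, -1))

print(dumb)         # [[-28, 93], [104, 117]]
print(compensated)  # [[2, -1], [0, 3]]
print(rebuilt)      # [[102, 109], [120, 133]]
```

The residual values are signed, so the decoder can add them back to the prediction and recover the block. With the right vector the residual is close to zero, which is what makes it cheap to encode.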

Here's a decent PDF that goes over some of the basics.

A few other notes:

  • Yes, a YUV color space is used in practice, and most encoders typically work in YV12 or some other chroma-subsampled format
  • Subtraction has to happen on the Y, U and V planes separately (think of them as three separate channels, all of which need to be encoded; then it becomes pretty clear how subtraction has to happen). Motion estimation may or may not happen on all of the Y, U and V planes; sometimes encoders only do it on the Y (luma) values to save a bit of CPU at the expense of quality.
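As a rough illustration of the per-plane point, the sketch below subtracts a prediction from the current frame one plane at a time. The dictionary layout and the names `plane_residual` and `frame_residual` are assumptions made for this example, and the prediction is assumed to be already motion-compensated. Note that the chroma planes are smaller than the luma plane, as they would be in a subsampled format such as YV12.

```python
def plane_residual(cur_plane, pred_plane):
    """Signed sample-by-sample difference for one plane."""
    return [
        [c - p for c, p in zip(cur_row, pred_row)]
        for cur_row, pred_row in zip(cur_plane, pred_plane)
    ]


def frame_residual(cur, pred):
    """Apply the same subtraction to the Y, U and V planes independently."""
    return {name: plane_residual(cur[name], pred[name]) for name in ("Y", "U", "V")}


# Hypothetical 2x2 frame: full-resolution luma, one chroma sample per 2x2 block.
cur = {"Y": [[50, 60], [70, 80]], "U": [[128]], "V": [[130]]}
pred = {"Y": [[52, 60], [69, 80]], "U": [[126]], "V": [[130]]}

res = frame_residual(cur, pred)
print(res)  # {'Y': [[-2, 0], [1, 0]], 'U': [[2]], 'V': [[0]]}
```

Each plane produces its own residual with its own dimensions, and each is then encoded as a separate channel.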
answered Oct 01 '22 12:10 by kidjan