How does H.264 or video encoders in general compute the residual image of two frames?

Tags:

I have been trying to understand how video encoding works for modern encoders, in particular H264. It is very often mentioned in documentation that residual frames are created from the differences between the current p-frame and the last i-frame (assuming the following frames are not used in the prediction). I understand that a YUV color space is used (maybe YV12), and that one image is "substracted" from the other and then the residual is formed. What I don't understand is how exactly this substraction works. I don't think it is an absolute value of the difference because that would be ambiguous. What is the per pixel formula to obtain this difference?

304

asked Jul 06 '11 01:07

cloudraven

1 Answers

Subtraction is just one small step in video encoding; the core principle behind most modern video encoding is motion estimation, followed by motion compensation. Basically, the process of motion estimation generates vectors that show offsets between macroblocks in successive frames. However, there's always a bit of error in these vectors.

So what happens is the encoder will output both the vector offsets, and the "residual" is what's left. The residual is not simply the difference between two frames; it's the difference between the two frames after motion estimation is taken into account. See the "Motion compensated difference" image in the wikipedia article on compensation for a clear illustration of this--note that the motion compensated difference is drastically smaller than the "dumb" residual.

Here's a decent PDF that goes over some of the basics.

A few other notes:

Yes, YUV is always used, and typically most encoders work in YV12 or some other chroma subsampled format
Subtraction will have to happen on the Y, U and V frames separately (think of them as three separate channels, all of which need to be encoded--then it becomes pretty clear how subtraction has to happen). Motion estimation may or may not happen on Y, U and V planes; sometimes encoders only do it on the Y (the luminance) values to save a bit of CPU at the expense of quality.

141

answered Oct 01 '22 12:10

kidjan

Related questions
                            
                                How can I tell if Heroku is sending gzipped responses?
                            
                                Chained GZipStream/DeflateStream and CryptoStream (AES) breaks when reading
                            
                                Compress small strings, with what to create external dictionary?
                            
                                Can zip files be sparse/non-contiguous?
                            
                                What data-structure/algorithm will allow me to send a list of key/value dictionaries using the least amount of bits?
                            
                                Variable length integer encoding
                            
                                mod_deflate or mod_gzip, which should be used?
                            
                                What is the fastest bzip2 decompressor?
                            
                                SharpZipLib : Compressing a single file to a single compressed file
                            
                                Compressing multiple JavaScript files with YUIcompressor?
                            
                                GZIP decompression C# OutOfMemory
                            
                                javascript - zip a Blob with zip.js
                            
                                SevenZipSharp - how to compress multiple directories into a single file using c#?
                            
                                Uncompress .Z file in unix using java
                            
                                Enabling gzip compression on Azure Websites
                            
                                How to mimic a "use_crc" flag on (hacked) 1998 uncompress() in 2013 zlib API interface?
                            
                                How to enable gzip compression in Simple OData Client?
                            
                                Zlib compress in python
                            
                                Disable GZIP compression for IE6 clients
                            
                                Decode byte array to bitmap that has been compressed in Java

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

How does H.264 or video encoders in general compute the residual image of two frames?

Tags:

compression

h.264

video-encoding

cloudraven

People also ask

1 Answers

kidjan

Recent Activity

Donate For Us