Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MPEG1 motion estimation / compensation

Tags:

mpeg

I seen the following explanantion for motion estimation / compensation for MPEG 1 and was just wondering is it correct:

Why dont we just code the raw difference between the current block and the reference block? Because the numbers for the residual are usually going to be a lot smaller. For example, say an object accelerates across the image. The x position in 11 frames was the following numbers. 12 16 20 25 31 38 48 59 72 84 96 The raw differences would be x 4 4 5 6 7 10 11 13 12 12 So the predicted values would be x x 20 24 30 37 45 58 70 85 96 So the residuals are x x 0 1 1 1 3 1 2 -1 0

Is the prediction for frame[i+1] = (frame[i] - frame[i-1]) + frame[i] i.e add the motion vector of previous two reference frames to the most recent reference frame? Then we encode the prediction residual, which is actual captured shot of frame[i+1] - prediction frame[i+1] and send this to the decoder?

like image 321
user915071 Avatar asked Jan 21 '12 14:01

user915071


1 Answers

MPEG1 decoding (motion compensation) works like this:

The predictions and motion vectors turn a reference frame into the next (current) frame. Here's how you would calculate each pixel of the new frame:

For each macroblock, you have a set of predicted values (differences from reference frame). The motion vector is a value relative to the reference frame.

// Each luma and chroma block are 8x8 pixels
    for(y=0; y<8; y++)
    {
       for (x=0; x<8; x++)
       {
          NewPixel(x,y) = Prediction(x,y) + RefPixel(x+motion_vector_x, y+motion_vector_y)
       }
    }

With MPEG1 you have I, P and B frames. I frames are completely intra coded (e.g. similar to JPEG), with no references to other frames. P frames are coded with predictions from the previous frame (either I or P). B frames are coded with predictions from both directions (previous and next frame). The B frame processing makes the video player a little more complicated because it may reference the next frame, therefore each frame has a sequence number and B frames will cause the sequence to be non-linear. In other words, your video decoder needs to hold on to potentially 3 frames while decoding a stream (previous, current and next).

like image 97
BitBank Avatar answered Sep 28 '22 00:09

BitBank