Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Modifying motion vectors in ffmpeg H.264 decoder

For research purposes, I am trying to modify H.264 motion vectors (MVs) for each P- and B-frame prior to motion compensation during the decoding process. I am using FFmpeg for this purpose. An example of a modification is replacing each MV with its original spatial neighbors and then using the resultant MVs for motion compensation, rather than the original ones. Please direct me appropriately.

So far, I have been able to do a simple modification of MVs in the file /libavcodec/h264_cavlc.c. In the function, ff_h264_decode_mb_cavlc(), modifying the mx and my variables, for instance, by increasing their values modifies the MVs used during decoding.

For example, as shown below, the mx and my values are increased by 50, thus lengthening the MVs used in the decoder.

mx += get_se_golomb(&s->gb)+50;
my += get_se_golomb(&s->gb)+50;

However, in this regard, I don't know how to access the neighbors of mx and my for my spatial mean analysis that I mentioned in the first paragraph. I believe that the key to doing so lies in manipulating the array, mv_cache.

Another experiment that I performed was in the file, libavcodec/error_resilience.c. Based on the guess_mv() function, I created a new function, mean_mv() that is executed in ff_er_frame_end() within the first if-statement. That first if-statement exits the function ff_er_frame_end() if one of the conditions is a zero error-count (s->error_count == 0). However, I decided to insert my mean_mv() function at this point so that is always executed when there is a zero error-count. This experiment somewhat yielded the results I wanted as I could start seeing artifacts in the top portions of the video but they were restricted just to the upper-right corner. I'm guessing that my inserted function is not being completed so as to meet playback deadlines or something.

Below is the modified if-statement. The only addition is my function, mean_mv(s).

if(!s->error_recognition || s->error_count==0 || s->avctx->lowres ||
       s->avctx->hwaccel ||
       s->avctx->codec->capabilities&CODEC_CAP_HWACCEL_VDPAU ||
       s->picture_structure != PICT_FRAME || // we dont support ER of field pictures yet, though it should not crash if enabled
       s->error_count==3*s->mb_width*(s->avctx->skip_top + s->avctx->skip_bottom)) {
        //av_log(s->avctx, AV_LOG_DEBUG, "ff_er_frame_end in er.c\n"); //KG
        if(s->pict_type==AV_PICTURE_TYPE_P)
            mean_mv(s);
        return;

And here's the mean_mv() function I created based on guess_mv().

static void mean_mv(MpegEncContext *s){
    //uint8_t fixed[s->mb_stride * s->mb_height];
    //const int mb_stride = s->mb_stride;
    const int mb_width = s->mb_width;
    const int mb_height= s->mb_height;
    int mb_x, mb_y, mot_step, mot_stride;

    //av_log(s->avctx, AV_LOG_DEBUG, "mean_mv\n"); //KG

    set_mv_strides(s, &mot_step, &mot_stride);

    for(mb_y=0; mb_y<s->mb_height; mb_y++){
        for(mb_x=0; mb_x<s->mb_width; mb_x++){
            const int mb_xy= mb_x + mb_y*s->mb_stride;
            const int mot_index= (mb_x + mb_y*mot_stride) * mot_step;
            int mv_predictor[4][2]={{0}};
            int ref[4]={0};
            int pred_count=0;
            int m, n;

            if(IS_INTRA(s->current_picture.f.mb_type[mb_xy])) continue;
            //if(!(s->error_status_table[mb_xy]&MV_ERROR)){
            //if (1){
            if(mb_x>0){
                mv_predictor[pred_count][0]= s->current_picture.f.motion_val[0][mot_index - mot_step][0];
                mv_predictor[pred_count][1]= s->current_picture.f.motion_val[0][mot_index - mot_step][1];
                ref         [pred_count]   = s->current_picture.f.ref_index[0][4*(mb_xy-1)];
                pred_count++;
            }

            if(mb_x+1<mb_width){
                mv_predictor[pred_count][0]= s->current_picture.f.motion_val[0][mot_index + mot_step][0];
                mv_predictor[pred_count][1]= s->current_picture.f.motion_val[0][mot_index + mot_step][1];
                ref         [pred_count]   = s->current_picture.f.ref_index[0][4*(mb_xy+1)];
                pred_count++;
            }

            if(mb_y>0){
                mv_predictor[pred_count][0]= s->current_picture.f.motion_val[0][mot_index - mot_stride*mot_step][0];
                mv_predictor[pred_count][1]= s->current_picture.f.motion_val[0][mot_index - mot_stride*mot_step][1];
                ref         [pred_count]   = s->current_picture.f.ref_index[0][4*(mb_xy-s->mb_stride)];
                pred_count++;
            }

            if(mb_y+1<mb_height){
                mv_predictor[pred_count][0]= s->current_picture.f.motion_val[0][mot_index + mot_stride*mot_step][0];
                mv_predictor[pred_count][1]= s->current_picture.f.motion_val[0][mot_index + mot_stride*mot_step][1];
                ref         [pred_count]   = s->current_picture.f.ref_index[0][4*(mb_xy+s->mb_stride)];
                pred_count++;
            }

            if(pred_count==0) continue;

            if(pred_count>=1){
                int sum_x=0, sum_y=0, sum_r=0;
                int k;

                for(k=0; k<pred_count; k++){
                    sum_x+= mv_predictor[k][0]; // Sum all the MVx from MVs avail. for EC
                    sum_y+= mv_predictor[k][1]; // Sum all the MVy from MVs avail. for EC
                    sum_r+= ref[k];
                    // if(k && ref[k] != ref[k-1])
                    // goto skip_mean_and_median;
                }

                mv_predictor[pred_count][0] = sum_x/k;
                mv_predictor[pred_count][1] = sum_y/k;
                ref         [pred_count]    = sum_r/k;
            }

            s->mv[0][0][0] = mv_predictor[pred_count][0];
            s->mv[0][0][1] = mv_predictor[pred_count][1];

            for(m=0; m<mot_step; m++){
                for(n=0; n<mot_step; n++){
                    s->current_picture.f.motion_val[0][mot_index + m + n * mot_stride][0] = s->mv[0][0][0];
                    s->current_picture.f.motion_val[0][mot_index + m + n * mot_stride][1] = s->mv[0][0][1];
                }
            }

            decode_mb(s, ref[pred_count]);

            //}
        }
    }
}

I would really appreciate some assistance on how to go about this properly.

like image 403
qontranami Avatar asked Feb 14 '12 00:02

qontranami


1 Answers

It's been a long time i have been out of touch with FFMPEG's code internally.

However, given my experience with inside FFMPEG horrors (you would know what i mean), i would rather give you a simple pragmatic advice.

Suggestion #1
Best possibility is that when motion vector of each of the blocks are identified - you can create your own additional array inside FFMPEG encoder context (a.k.a s) which will store all of them. When your algorithm runs it will pick up the values from there.

Suggestion #2
Another thing i read (i am not sure if i read it right)

the mx and my values are increased by 50

I think 50 is a very large motion vector. And usually, the F-value range of motion vector encoding would be prior restrictive. If you alter things by +/- 8 (or even +/- 16) might just be ok- but +50 could be so high that end result may not encode things properly.

I didn't quite understood your objective about mean_mv() and what failure you expect from there. Please re-phrase a bit.

like image 139
Dipan Mehta Avatar answered Oct 13 '22 15:10

Dipan Mehta