Understanding FFmpeg Video Encoding

Got this from the encoding example in ffmpeg. I can somewhat follow the author's example for audio encoding, but I find myself befuddled looking at the C code (I've added block-number comments to help me reference what I'm talking about)...

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include "libavcodec/avcodec.h"

static void video_encode_example(const char *filename)
{
    AVCodec *codec;
    AVCodecContext *c = NULL;
    int i, out_size, size, x, y, outbuf_size;
    FILE *f;
    AVFrame *picture;
    uint8_t *outbuf, *picture_buf;                  //BLOCK ONE

    printf("Video encoding\n");

    /* find the mpeg1 video encoder */
    codec = avcodec_find_encoder(CODEC_ID_MPEG1VIDEO);
    if (!codec) {
        fprintf(stderr, "codec not found\n");
        exit(1);                                    //BLOCK TWO
    }

    c = avcodec_alloc_context();
    picture = avcodec_alloc_frame();

    /* put sample parameters */
    c->bit_rate = 400000;
    /* resolution must be a multiple of two */
    c->width = 352;
    c->height = 288;
    /* frames per second */
    c->time_base = (AVRational){1, 25};
    c->gop_size = 10; /* emit one intra frame every ten frames */
    c->max_b_frames = 1;
    c->pix_fmt = PIX_FMT_YUV420P;                   //BLOCK THREE

    /* open it */
    if (avcodec_open(c, codec) < 0) {
        fprintf(stderr, "could not open codec\n");
        exit(1);
    }

    f = fopen(filename, "wb");
    if (!f) {
        fprintf(stderr, "could not open %s\n", filename);
        exit(1);
    }                                               //BLOCK FOUR

    /* alloc image and output buffer */
    outbuf_size = 100000;
    outbuf = malloc(outbuf_size);
    size = c->width * c->height;
    picture_buf = malloc((size * 3) / 2); /* size for YUV 420 */

    picture->data[0] = picture_buf;
    picture->data[1] = picture->data[0] + size;
    picture->data[2] = picture->data[1] + size / 4;
    picture->linesize[0] = c->width;
    picture->linesize[1] = c->width / 2;
    picture->linesize[2] = c->width / 2;            //BLOCK FIVE

    /* encode 1 second of video */
    for (i = 0; i < 25; i++) {
        fflush(stdout);
        /* prepare a dummy image */
        /* Y */
        for (y = 0; y < c->height; y++) {
            for (x = 0; x < c->width; x++) {
                picture->data[0][y * picture->linesize[0] + x] = x + y + i * 3;
            }
        }                                           //BLOCK SIX

        /* Cb and Cr */
        for (y = 0; y < c->height / 2; y++) {
            for (x = 0; x < c->width / 2; x++) {
                picture->data[1][y * picture->linesize[1] + x] = 128 + y + i * 2;
                picture->data[2][y * picture->linesize[2] + x] = 64 + x + i * 5;
            }
        }                                           //BLOCK SEVEN

        /* encode the image */
        out_size = avcodec_encode_video(c, outbuf, outbuf_size, picture);
        printf("encoding frame %3d (size=%5d)\n", i, out_size);
        fwrite(outbuf, 1, out_size, f);
    }                                               //BLOCK EIGHT

    /* get the delayed frames */
    for (; out_size; i++) {
        fflush(stdout);
        out_size = avcodec_encode_video(c, outbuf, outbuf_size, NULL);
        printf("write frame %3d (size=%5d)\n", i, out_size);
        fwrite(outbuf, 1, out_size, f);
    }                                               //BLOCK NINE

    /* add sequence end code to have a real mpeg file */
    outbuf[0] = 0x00;
    outbuf[1] = 0x00;
    outbuf[2] = 0x01;
    outbuf[3] = 0xb7;
    fwrite(outbuf, 1, 4, f);
    fclose(f);

    free(picture_buf);
    free(outbuf);
    avcodec_close(c);
    av_free(c);
    av_free(picture);
}                                                   //BLOCK TEN

Here's what I can get from the author's code, block by block...

BLOCK ONE: Initializing variables and pointers. I couldn't find the AVFrame struct in the ffmpeg source code yet, so I don't know what it's referencing.

BLOCK TWO: Looks up the MPEG-1 video encoder; if it's not found, exit.

BLOCK THREE: Sets sample video parameters. The only thing I don't really get is GOP size. I read about intra frames and I still don't get what they are.

BLOCK FOUR: Open the file for writing...

BLOCK FIVE: Here's where they really start losing me. Part of it is probably because I don't know exactly what AVFrame is, but why do they only allocate 3/2 of the image size?

BLOCK SIX & SEVEN: I don't understand what they're trying to accomplish with this math.

BLOCK EIGHT: It looks like the avcodec function does all the work here; I'm not concerned with that for the time being...

BLOCK NINE: Since it's outside the 25-frame for loop, I assume it gets the leftover frames?

BLOCK TEN: Close, free mem, etc...

I know this is a large block of code to be confused by; any input would be helpful. I got put in over my head at work. Thanks in advance, SO.

Asked May 30 '12 by SetSlapShot

2 Answers

As HonkyTonk already replied, the comments spell it out: prepare a dummy image. I'm guessing you might be confused about exactly how the dummy image is being generated, especially if you are unfamiliar with the YUV/YCbCr colorspace. Read the Wikipedia treatment for the basics.

Many video codecs operate in the YUV colorspace. This is often confusing to programmers who are only used to dealing in RGB. The executive summary is that, for this variation (YUV 4:2:0 planar), each pixel in the image gets a Y sample (note that the Y loop iterates over every (x,y) pair), while 2x2 pixel quads each share a U/Cb sample and a V/Cr sample (notice in block seven that the iteration is over width/2 and height/2).
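To make that sampling pattern concrete, here is a minimal sketch (my own illustration, not part of the example) of how the three samples for the pixel at (x, y) are addressed, given the plane pointers and linesizes set up in BLOCK FIVE:

/* Reading the samples for pixel (x, y) in a planar YUV 4:2:0 frame.
 * data[0] holds one Y sample per pixel; data[1] (Cb) and data[2] (Cr)
 * hold one sample per 2x2 pixel quad, hence the /2 on both coordinates. */
uint8_t Y  = picture->data[0][ y      * picture->linesize[0] +  x     ];
uint8_t Cb = picture->data[1][(y / 2) * picture->linesize[1] + (x / 2)];
uint8_t Cr = picture->data[2][(y / 2) * picture->linesize[2] + (x / 2)];

Four neighboring Y samples at (x, y), (x+1, y), (x, y+1), and (x+1, y+1) all map to the same Cb/Cr pair, which is exactly why blocks six and seven iterate over different ranges.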

It looks like the pattern generated is some kind of gradient. If you want to produce a known image, set Y, Cb, and Cr all to 0 and the dummy image will be all green. Set Cb and Cr to 128 and Y to 255 to get a white frame; slide Y down to 0 to see black; set Y to any value in between, holding Cb and Cr at 128, to see shades of gray.
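If you want to try that, here is a hedged sketch: replace the gradient loops in blocks six and seven with solid fills (this assumes the same picture layout as the example, and needs <string.h> for memset):

/* Solid white frame: Y = 255 everywhere, Cb = Cr = 128 (neutral chroma).
 * Drop Y toward 0 for darker grays; Y = 0 gives black. */
memset(picture->data[0], 255, c->height * picture->linesize[0]);        /* Y  */
memset(picture->data[1], 128, (c->height / 2) * picture->linesize[1]);  /* Cb */
memset(picture->data[2], 128, (c->height / 2) * picture->linesize[2]);  /* Cr */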

Answered by Multimedia Mike

I'll share my understanding [quite a late reply!].

YUV420p:

YUV 420P, or YCbCr, is an alternative to the RGB representation, and it contains 3 planes, namely the Y (luma) component and the U (Cb) & V (Cr) chroma components. [And Y - Cb - Cr - Cg = constant; we don't need to store the Cg component, as it can usually be computed.] Where RGB888 requires 3 bytes per pixel, YUV420 requires only 1.5 bytes per pixel [@Find(How the 12 bits are used for what component in what ratio)]. Here P stands for planar, meaning the planes are laid out one after another: V follows U, U follows Y, and a YUV frame is simply a byte array! Another variant is I, which stands for interleaved, meaning the U/V plane data are interleaved in between the Y plane data in a specific manner [@Find(What manner)].
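To put numbers on the 1.5-bytes-per-pixel figure (and on the (size * 3) / 2 allocation in the questioner's BLOCK FIVE), a small sketch of my own:

/* Plane sizes for one planar YUV 4:2:0 frame of even width w and height h.
 * The frame is a single contiguous byte array: Y plane, then Cb, then Cr. */
int w = 352, h = 288;                 /* the example's resolution   */
int y_size      = w * h;              /* 1 Y byte per pixel         */
int chroma_size = (w / 2) * (h / 2);  /* 1 byte per 2x2 pixel quad  */
int frame_size  = y_size + 2 * chroma_size;
/* frame_size == w * h * 3 / 2: 12 bits per pixel on average,
 * i.e. 8 bits of Y plus 2 bits each of Cb and Cr. */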

Answered by nmxprime