I am trying to understand how video codecs work in general, H.264 being one of them.
I have gone through some articles on the web about how H.264 works, and I think I have gained some understanding of it.
While reading, I learned about the different frame types (I-frame, P-frame and B-frame) that are used when streaming a video encoded with H.264.
What I could not understand is this: a raw video obviously already contains its data as frames. The literature on the web says that an I-frame is the original frame as it is, whereas a P-frame is predicted from the previous I-frame or P-frame. How can a P-frame be predicted from another P-frame when that frame doesn't exist yet?
What also confuses me is how a P-frame is predicted in the first place.
Kindly help me understand this, or refer me to literature that explains it.
Regarding your last comment about B-frames: the video encoder has a buffer that holds a certain number of frames.
Let's consider an example where your first 4 frames are to be encoded with the following structure: IBBP.
The first frame is encoded as an I-frame (intra). Frames 2 and 3 are B-frames and cannot be encoded right away, because they are waiting for the "future" P-frame, so they are put in the buffer. When frame 4 arrives at the encoder, it is inter-encoded (unidirectional) with the first frame (the I-frame) as its reference. Now that the P-frame has been encoded, frames 2 and 3, which are bidirectional and thus need references in both the past and the future, can be encoded (inter, bidirectional).
So the encoding order is I P B B, which is not the same as the display order.
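To make the reordering concrete, here is a minimal Python sketch of the buffering described above. The function name `display_to_coding_order` and the one-letter-per-frame string are my own illustrative choices, not part of H.264 or of any encoder API, and the sketch assumes every B-frame simply waits for the next I- or P-frame in display order. Real encoders are more flexible about reference selection.

```python
def display_to_coding_order(frame_types):
    """Reorder frames from display order to coding order.

    frame_types: a string such as "IBBP", one letter per frame in
    display order (hypothetical input format, for illustration only).
    Returns a list of (display_index, frame_type) tuples in coding
    order, with display_index starting at 1.
    """
    coding_order = []
    pending_b = []  # B-frames buffered until their future reference is coded

    for index, frame_type in enumerate(frame_types, start=1):
        if frame_type == "B":
            # A B-frame needs a future reference, so it waits in the buffer.
            pending_b.append((index, frame_type))
        else:
            # An I- or P-frame can be coded as soon as it arrives.
            coding_order.append((index, frame_type))
            # The buffered B-frames now have both a past and a future reference.
            coding_order.extend(pending_b)
            pending_b.clear()

    # Simplification: trailing B-frames with no future reference are
    # emitted at the end instead of being converted to another type.
    coding_order.extend(pending_b)
    return coding_order


print(display_to_coding_order("IBBP"))
# [(1, 'I'), (4, 'P'), (2, 'B'), (3, 'B')]
```

The printed list matches the I P B B order from the example: frame 4 is coded second even though it is displayed last.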
As you can see, B-frames introduce delays in the encoding process, and are thus generally not used in low-delay applications such as videoconferencing.
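As a rough illustration of that delay, the sketch below counts how many frame intervals each frame has to wait before its future reference has even arrived at the encoder. The function name `reordering_delay` is hypothetical, and the model is deliberately simplified: it ignores processing time and assumes each B-frame waits for the next I- or P-frame in display order.

```python
def reordering_delay(frame_types):
    """Return the structural wait, in frame intervals, for each frame.

    frame_types: a string such as "IBBP", one letter per frame in
    display order (hypothetical input format, for illustration only).
    I- and P-frames wait 0 intervals; a B-frame waits until the next
    non-B frame has arrived. None means no future reference exists.
    """
    delays = []
    for index, frame_type in enumerate(frame_types):
        if frame_type != "B":
            delays.append(0)
            continue
        # Find the next I- or P-frame after this B-frame.
        future = next(
            (j for j in range(index + 1, len(frame_types))
             if frame_types[j] != "B"),
            None,
        )
        delays.append(None if future is None else future - index)
    return delays


print(reordering_delay("IBBP"))
# [0, 2, 1, 0]
```

In the IBBP example, frame 2 waits two frame intervals and frame 3 waits one, while a stream without B-frames (for example "IPPP") has no such wait at all.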