
How can HTML5 video's byte-range requests (pseudo-streaming) work?

If you play an HTML5 video hosted on a server that accepts range requests and seek ahead to a non-buffered part of the video, the network traffic shows the browser making a byte-range request. I'm assuming that the browser computes the byte offset by knowing the total video size ahead of time and assuming a constant bitrate (if you click half-way along the progress bar, it requests the byte at the half-way point). But if the video has a variable bitrate, it seems unlikely that the requested byte really corresponds to the time-point the user clicked on, and the byte would likely fall in the middle of a frame.

How does the browser know where the next frame begins once it has started fetching at some arbitrary byte?

bhh1988 asked Aug 12 '13 01:08



2 Answers

I assume your video is in an MP4 container. The MP4 file format contains a hierarchical structure of 'boxes' (also called atoms). One of these is the Time-to-Sample (stts) box, which stores the duration of every frame in a compact, run-length form, so a time can be mapped to a frame number. From there you can find the 'chunk' that contains the frame using the Sample-to-Chunk (stsc) box. Finally, the Chunk Offset (stco) box gives you the byte offset of that chunk in the file; the frame's position inside the chunk follows from the sample sizes in the Sample Size (stsz) box.
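The lookup chain described above (time to sample, sample to chunk, chunk to byte offset) can be sketched roughly as follows. This is a minimal illustration, not how any particular browser implements it: the tables are assumed to be already parsed into plain Python lists, the function names are hypothetical, and the Sample Size (stsz) table is an extra input needed to locate a sample inside its chunk.

```python
def sample_for_time(stts, timescale, seconds):
    """Map a time to a 0-based sample index.
    stts is a list of (sample_count, sample_delta) runs."""
    target = int(seconds * timescale)
    elapsed = 0
    index = 0
    for count, delta in stts:
        run_duration = count * delta
        if target < elapsed + run_duration:
            return index + (target - elapsed) // delta
        elapsed += run_duration
        index += count
    return index - 1  # past the end: clamp to the last sample


def chunk_for_sample(stsc, sample_index, chunk_count):
    """Map a sample index to (0-based chunk index, position within chunk).
    stsc is a list of (first_chunk, samples_per_chunk); first_chunk is 1-based."""
    first_sample = 0
    for i, (first_chunk, per_chunk) in enumerate(stsc):
        next_first = stsc[i + 1][0] if i + 1 < len(stsc) else chunk_count + 1
        run_samples = (next_first - first_chunk) * per_chunk
        if sample_index < first_sample + run_samples:
            offset = sample_index - first_sample
            return (first_chunk - 1) + offset // per_chunk, offset % per_chunk
        first_sample += run_samples
    raise IndexError("sample index beyond the last chunk")


def byte_offset_for_time(seconds, timescale, stts, stsc, stco, stsz):
    """Combine the tables: time -> sample -> chunk -> byte offset."""
    sample = sample_for_time(stts, timescale, seconds)
    chunk, within = chunk_for_sample(stsc, sample, len(stco))
    first_in_chunk = sample - within
    # Chunk start plus the sizes of the samples that precede ours in the chunk.
    return stco[chunk] + sum(stsz[first_in_chunk:sample])


# Made-up tables: 10 samples of 0.1 s each, stored in chunks of 4, 4 and 2.
stts = [(10, 1000)]
stsc = [(1, 4), (3, 2)]
stco = [1000, 5000, 9000]
stsz = [100] * 10
offset = byte_offset_for_time(0.5, 10000, stts, stsc, stco, stsz)
```

With these made-up tables, 0.5 s maps to sample 5, which is the second sample of the second chunk. A real player would usually also step back to the nearest preceding keyframe before issuing the request.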

The total duration of the movie is stored in the Movie Header (mvhd) box. When you move the scrub handle, a time is estimated from the duration of the movie and the position where you let go of the handle. The byte offset for that time is then calculated from the file header downloaded previously, and a range request is made.
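As a rough sketch of that last step, the scrub position can be turned into a time using the mvhd duration and timescale, and a byte offset (however it was obtained from the header tables) can be turned into an HTTP Range header. The function names are hypothetical and the numbers are made up for illustration.

```python
def scrub_to_seconds(fraction, duration_units, timescale):
    """Convert a scrub-bar position (0.0 to 1.0) into seconds.
    duration_units and timescale are the values stored in mvhd."""
    fraction = min(max(fraction, 0.0), 1.0)  # clamp to the bar
    return fraction * duration_units / timescale


def range_header(start_byte):
    """Build an open-ended byte-range request header starting at start_byte."""
    return {"Range": f"bytes={start_byte}-"}


# A 10-minute movie with a timescale of 1000 units per second,
# scrub handle released at the half-way point.
seconds = scrub_to_seconds(0.5, 600000, 1000)
headers = range_header(5100)  # 5100 stands in for the looked-up offset
```

The key point is that the byte offset comes from a table lookup on the target time, not from scaling the file size by the scrub fraction, which is why variable bitrate does not matter.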

Edit: If it is not MP4, other containers have similar mechanisms. The codec is irrelevant.

szatmary answered Oct 17 '22 17:10



Some video/media formats, such as the MPEG transport stream, are divided into fixed-size packets.

The MPEG transport stream was designed around 188-byte packets (a size originally chosen for compatibility with ATM cells, though that is now obsolete). So if you seek to a multiple of that 188-byte size, the player will read valid packets and recover sync when it finds the beginning of a frame.
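Even when the fetch starts at an arbitrary byte rather than a packet boundary, a reader can realign itself, because every transport stream packet begins with the sync byte 0x47. A minimal sketch of that resynchronisation, assuming the data is already in memory (the function name and the `confirm` parameter are hypothetical):

```python
TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47


def find_sync(data, confirm=3):
    """Return the offset of the first packet boundary in data, or -1.
    A candidate is accepted only if the sync byte repeats at 188-byte
    intervals for `confirm` consecutive packets, since 0x47 can also
    occur by chance inside a payload."""
    last_start = len(data) - TS_PACKET_SIZE * (confirm - 1)
    for start in range(max(last_start, 0)):
        if all(data[start + i * TS_PACKET_SIZE] == SYNC_BYTE
               for i in range(confirm)):
            return start
    return -1


# Made-up data: 50 bytes from the tail of a cut-off packet, then 4 whole packets.
packet = bytes([SYNC_BYTE]) + bytes(TS_PACKET_SIZE - 1)
stream = b"\x01" * 50 + packet * 4
boundary = find_sync(stream)
```

Checking several consecutive packets is a design choice to reduce false positives; the exact number is arbitrary here.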

An actual picture can be displayed once the browser/player reaches an I-frame (or keyframe), which can be decoded independently of any other frame. P- and B-frames are predicted from other frames, so if you seek to one of them you can't yet construct a picture.
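To illustrate, here is a toy sketch of a player skipping forward to the first decodable frame after landing somewhere in the stream. The frame sequence is made up and represented simply as a string of frame types; real players read the type from each picture header.

```python
def first_decodable(frame_types, start):
    """Return the index of the first I-frame at or after `start`,
    or -1 if none is found. frame_types is a string such as "IBBPBBP"."""
    for index in range(start, len(frame_types)):
        if frame_types[index] == "I":
            return index
    return -1


# Landing on a B-frame at index 1: frames 1 and 2 are skipped,
# and display begins at the I-frame at index 3.
frames = "PBBIPBBPBBI"
first = first_decodable(frames, 1)
```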

See:

  • http://en.wikipedia.org/wiki/MPEG_transport_stream

  • http://en.wikipedia.org/wiki/MPEG-1#Frame.2Fpicture.2Fblock_types

Thomas W answered Oct 17 '22 16:10
