Let's say I have an audio file being decoded with ffmpeg. The source format is something like AAC, where the audio is split into packets. When seeking to a particular time, it is clear that the time will usually not fall on a packet border but somewhere within the packet duration. Do I have to seek within the packet myself, or does av_seek_frame do it all by itself and set up decoding so that the next decoded frame starts at the position I've requested?
If I use the function av_seek_frame with the flag AVSEEK_FLAG_BACKWARD, I assume that the next packet returned by av_read_frame will be the packet containing the time position I am seeking to. Is that right?
If I decode this packet with avcodec_decode_audio4, will the returned frame contain the audio data from the beginning of the packet or from the time I've passed to av_seek_frame? In the former case, how can I find out the frame/packet timestamp so as to estimate the number of samples to skip in the decoded frame? The PTS after the seek is zero, and the DTS looks useless too.
Is it possible to seek with precision to a particular time using ffmpeg?
Time unit syntax: Note that you can use two different time unit formats: sexagesimal (HOURS:MM:SS.MILLISECONDS, as in 01:23:45.678), or in seconds.
The itsoffset option applies to all streams embedded within the input file. Therefore, adjusting timestamps for only a single stream requires specifying the same input file twice. adjusts timestamps of the input audio stream(s) only.
There is no frame-exact or audio-sample-exact seeking in ffmpeg; that's an application-level problem. The reason is quite simple: libavformat does the seeking, and it doesn't know what's inside the packets that individual demuxers return. It just has a blob of data with timestamp X and duration Y. It doesn't know anything about that data. To do anything meaningful with it you'd have to decode it, and decoding is the job of libavcodec, not libavformat.
So, to answer your questions: av_seek_frame seeks to packet boundaries, and AVSEEK_FLAG_BACKWARD means the packet will start at or before the given ts; for audio, that means the packet will most likely contain your timestamp. However, this is not always the case, because some demuxers seek based on an index, and not every packet may have an index entry. You may have to call av_read_frame() several times after the seek before you get to the packet that contains your specified timestamp.
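The "keep reading until you reach the right packet" step can be sketched without any ffmpeg dependency. This is only an illustration: the Pkt struct and find_containing_packet are hypothetical names, and the array stands in for the sequence of packets that repeated av_read_frame() calls would return after the seek. It assumes each packet carries a valid pts and duration in the same time base as the target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two AVPacket fields that matter here. */
typedef struct {
    int64_t pts;      /* start of the packet, in stream time-base ticks */
    int64_t duration; /* length of the packet, in the same ticks */
} Pkt;

/* Walk forward from the packet the seek landed on and return the index of
 * the first packet whose time range [pts, pts + duration) reaches past
 * target_ts. Returns -1 if no packet in the array covers the target. */
int find_containing_packet(const Pkt *pkts, size_t count, int64_t target_ts)
{
    assert(pkts != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        /* Packets that end at or before the target are discarded. */
        if (pkts[i].pts + pkts[i].duration > target_ts)
            return (int)i;
    }
    return -1;
}
```

In a real player the discarded packets may still need to be fed to the decoder, since some codecs need preceding data to produce correct output for the packet you care about.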
Other than you calling avcodec_flush_buffers(), libavcodec doesn't know anything about seeking, so the output of the next call to avcodec_decode_audio4 will start at the start of the input packet. For sample-exact seeking, applications have to chop off the leading samples themselves.
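Chopping off the leading samples comes down to simple arithmetic on timestamps. The helper below is a minimal sketch, not an ffmpeg API: samples_to_skip and its parameters are made-up names, and it assumes the decoded frame's pts and the target are expressed in the same stream time base (tb_num/tb_den seconds per tick). In real code the same conversion is usually done with av_rescale_q().

```c
#include <assert.h>
#include <stdint.h>

/* Number of leading samples to drop from a decoded frame so that output
 * starts at target_ts. frame_pts and target_ts are in time-base ticks,
 * where one tick lasts tb_num / tb_den seconds. The result is clamped to
 * nb_samples, the number of samples the frame actually holds. */
int64_t samples_to_skip(int64_t frame_pts, int64_t target_ts,
                        int tb_num, int tb_den,
                        int sample_rate, int nb_samples)
{
    assert(tb_num > 0 && tb_den > 0 && sample_rate > 0 && nb_samples >= 0);

    if (target_ts <= frame_pts)
        return 0; /* target is at or before the frame start: keep everything */

    /* ticks * (tb_num / tb_den) seconds * sample_rate samples per second */
    int64_t skip = (target_ts - frame_pts) * tb_num * sample_rate / tb_den;

    return skip > nb_samples ? nb_samples : skip;
}
```

For example, with a 1/44100 time base, a frame starting at tick 44032 and a target of tick 44100, the first 68 samples of the frame would be dropped. If the result equals nb_samples, the whole frame lies before the target and the next frame needs to be checked as well.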