I'm trying to figure out how FFmpeg stores data in an AVFrame after audio has been decoded. Basically, if I print the contents of the AVFrame->data[] array I get a series of unsigned 8-bit integers, which is the audio in raw form.
From what I can understand from the FFmpeg doxygen, the format of the data is described by the enum AVSampleFormat, and there are two main categories: interleaved and planar. In the interleaved formats, all the data is kept in the first row of the AVFrame->data array, with size AVFrame->linesize[0]; in the planar formats, each audio channel is kept in a separate row of the AVFrame->data array, and each of those arrays has size AVFrame->linesize[0].
Is there a guide/tutorial that explains what the numbers in the array mean for each of the formats?
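For context, here is roughly how I am inspecting the frame at the moment (a minimal sketch; the decoding loop is omitted and the function name is mine):

#include <libavutil/frame.h>
#include <libavutil/samplefmt.h>
#include <stdint.h>
#include <stdio.h>

static void inspect_frame(const AVFrame *frame)
{
    enum AVSampleFormat fmt = frame->format;

    /* Basic layout information for the decoded frame. */
    printf("format: %s, planar: %d, bytes/sample: %d, nb_samples: %d, linesize[0]: %d\n",
           av_get_sample_fmt_name(fmt),
           av_sample_fmt_is_planar(fmt),
           av_get_bytes_per_sample(fmt),
           frame->nb_samples,
           frame->linesize[0]);

    /* Raw bytes of the first plane -- this is what I am printing now. */
    const uint8_t *raw = frame->data[0];
    for (int i = 0; i < frame->linesize[0]; i++)
        printf("%u ", (unsigned)raw[i]);
    printf("\n");
}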
The values in each of the data arrays (planes) are the actual audio samples, interpreted according to the specified format. For example, if the format is AV_SAMPLE_FMT_S16P, the data arrays are really arrays of int16_t PCM samples. If you are dealing with a mono signal, only data[0] is valid; if it is stereo, data[0] and data[1] are valid, and so on.
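Here is a minimal sketch of how you could walk the samples once a frame has been decoded (assuming the format is AV_SAMPLE_FMT_S16 or AV_SAMPLE_FMT_S16P; the helper name is mine):

#include <libavutil/frame.h>
#include <libavutil/samplefmt.h>
#include <stdint.h>
#include <stdio.h>

static void dump_s16_samples(const AVFrame *frame)
{
    int channels = frame->ch_layout.nb_channels; /* use frame->channels on FFmpeg < 5.1 */

    if (frame->format == AV_SAMPLE_FMT_S16P) {
        /* Planar: one int16_t array per channel, in data[ch]. */
        for (int ch = 0; ch < channels; ch++) {
            const int16_t *samples = (const int16_t *)frame->data[ch];
            for (int i = 0; i < frame->nb_samples; i++)
                printf("ch %d, sample %d: %d\n", ch, i, samples[i]);
        }
    } else if (frame->format == AV_SAMPLE_FMT_S16) {
        /* Interleaved: data[0] holds L0 R0 L1 R1 ... for stereo. */
        const int16_t *samples = (const int16_t *)frame->data[0];
        for (int i = 0; i < frame->nb_samples; i++)
            for (int ch = 0; ch < channels; ch++)
                printf("ch %d, sample %d: %d\n", ch, i, samples[i * channels + ch]);
    }
}

The same pattern applies to the other sample formats; only the element type (uint8_t, int32_t, float, double) and whether you index data[0] or data[ch] change.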
I'm not sure there is a guide that explains every particular case, but the approach described above is quite simple and easy to understand. Just play with it a bit and things should become clear.