Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

FFMPEG's xstack command results in out of sync sound, is it possible to mix the audio in a single encoding?

I wrote a python script that generates a xstack complex filter command. The video inputs is a mixture of several formats described here:

I have 2 commands generated, one for the xstack filter, and one for the audio mixing.

Here is the stack command: (sorry the text doesn't wrap!)

'c:/ydl/ffmpeg.exe',
'-i', 'inputX.mp4'
'-i', 'inputX.mp4'
'-i', 'inputX.mp4'
'-i', 'inputX.mp4'
'-i', 'inputX.mp4'
'-i', 'inputX.mp4'
'-i', 'inputX.mp4'
'-i', 'inputX.mp4'
'-i', 'inputX.mp4'
'-i', 'inputX.mp4'
'-i', 'inputX.mp4'
'-i', 'inputX.mp4'
'-i', 'inputX.mp4'
'-i', 'inputX.mp4'
'-i', 'inputX.mp4'
'-filter_complex',
'[0]scale=480:270:force_original_aspect_ratio=decrease,pad=480:270:(ow-iw)/2:(oh-ih)/2, setsar=1[rsclbf0];[rsclbf0]fps=24[rscl0];[1]scale=480:270:force_original_aspect_ratio=decrease,pad=480:270:(ow-iw)/2:(oh-ih)/2, setsar=1[rsclbf1];[rsclbf1]fps=24[rscl1];[2]scale=480:270:force_original_aspect_ratio=decrease,pad=480:270:(ow-iw)/2:(oh-ih)/2, setsar=1[rsclbf2];[rsclbf2]fps=24[rscl2];[3]scale=480:270:force_original_aspect_ratio=decrease,pad=480:270:(ow-iw)/2:(oh-ih)/2, setsar=1[rsclbf3];[rsclbf3]fps=24[rscl3];[4]scale=480:270:force_original_aspect_ratio=decrease,pad=480:270:(ow-iw)/2:(oh-ih)/2, setsar=1[rsclbf4];[rsclbf4]fps=24[rscl4];[5]scale=480:270:force_original_aspect_ratio=decrease,pad=480:270:(ow-iw)/2:(oh-ih)/2, setsar=1[rsclbf5];[rsclbf5]fps=24[rscl5];[6]scale=480:270:force_original_aspect_ratio=decrease,pad=480:270:(ow-iw)/2:(oh-ih)/2, setsar=1[rsclbf6];[rsclbf6]fps=24[rscl6];[7]scale=480:270:force_original_aspect_ratio=decrease,pad=480:270:(ow-iw)/2:(oh-ih)/2, setsar=1[rsclbf7];[rsclbf7]fps=24[rscl7];[8]scale=480:270:force_original_aspect_ratio=decrease,pad=480:270:(ow-iw)/2:(oh-ih)/2, setsar=1[rsclbf8];[rsclbf8]fps=24[rscl8];[9]scale=480:270:force_original_aspect_ratio=decrease,pad=480:270:(ow-iw)/2:(oh-ih)/2, setsar=1[rsclbf9];[rsclbf9]fps=24[rscl9];[10]scale=480:270:force_original_aspect_ratio=decrease,pad=480:270:(ow-iw)/2:(oh-ih)/2, setsar=1[rsclbf10];[rsclbf10]fps=24[rscl10];[11]scale=480:270:force_original_aspect_ratio=decrease,pad=480:270:(ow-iw)/2:(oh-ih)/2, setsar=1[rsclbf11];[rsclbf11]fps=24[rscl11];[12]scale=480:270:force_original_aspect_ratio=decrease,pad=480:270:(ow-iw)/2:(oh-ih)/2, setsar=1[rsclbf12];[rsclbf12]fps=24[rscl12];[13]scale=480:270:force_original_aspect_ratio=decrease,pad=480:270:(ow-iw)/2:(oh-ih)/2, setsar=1[rsclbf13];[rsclbf13]fps=24[rscl13];[14]scale=480:270:force_original_aspect_ratio=decrease,pad=480:270:(ow-iw)/2:(oh-ih)/2, setsar=1[rsclbf14];[rsclbf14]fps=24[rscl14];[rscl0][rscl1][rscl2][rscl3][rscl4]concat=n=5[cct0];[rscl5][rscl6][rscl7]concat=n=3[cct1];[rscl8][rscl9][rscl10]concat=n=3[cct2];[rscl11][rscl12][rscl13][rscl14]concat=n=4[cct3];[cct0][cct1][cct2][cct3]xstack=inputs=4:layout=0_0|w0_0|0_h0|w0_h0',
'output.mp4',

Here is the mix_audio command:

'c:/ydl/ffmpeg.exe',
'-i', 'inputX.mp4'
'-i', 'inputX.mp4'
'-i', 'inputX.mp4'
'-i', 'inputX.mp4'
'-i', 'inputX.mp4'
'-i', 'inputX.mp4'
'-i', 'inputX.mp4'
'-i', 'inputX.mp4'
'-i', 'inputX.mp4'
'-i', 'inputX.mp4'
'-i', 'inputX.mp4'
'-i', 'inputX.mp4'
'-i', 'inputX.mp4'
'-i', 'inputX.mp4'
'-i', 'inputX.mp4'
'-i', 'inputX.mp4'
'-filter_complex',
'[0:a][1:a][2:a][3:a][4:a]concat=n=5:v=0:a=1[cct_a0];[5:a][6:a][7:a]concat=n=3:v=0:a=1[cct_a1];[8:a][9:a][10:a]concat=n=3:v=0:a=1[cct_a2];[11:a][12:a][13:a][14:a]concat=n=4:v=0:a=1[cct_a3];[cct_a0][cct_a1][cct_a2][cct_a3]amix=inputs=4[all_aud]',
'-map',
'15:v',
'-map',
'[all_aud]',
'-c:v',
'copy',
'output.mp4',

Of course those are sample commands, I actually use many more videos as input, this sample is shorter for the sake or readability.

Here are the videos I use, with relevant ffprobe data, in some HTML table:

ffprobe data

I'm getting this warning:

[swscaler @ 0000020bac5a19c0] Warning: data is not aligned! This can lead to a speed loss

I think this is unrelated to audio desyncing this unaligned data is about x264 resolutions being multiple of 16, but my filter takes this into account already.

There is a perceptible audio desyncing, which is the main problem I am having. FFMPEG doesn't seem to get other errors. Is it because I use 2 commands to mix the audio after? How could I proceed to to the xstack stage and the audio mixing in a single stage?

I'm a bit confused as how FFMPEG handles diverse framerates. I was told to reencode all the video inputs before performing the xstack stage, but I would create some disk overhead, so I'd rather do it in a single ffmpeg job it possible.

like image 338
jokoon Avatar asked Nov 18 '21 13:11

jokoon


People also ask

Does FFmpeg work with concat and encoder?

Using concat and encoding the video seems to work fine. The audio stays in sync as ffmpeg does the full conversion calculations and seems to get everything right. However, simply concatenating the videos without any transformation or encoding results in a slowly increasing sync issue.

Why does FFmpeg take so long to sync?

The audio stays in sync as ffmpeg does the full conversion calculations and seems to get everything right. However, simply concatenating the videos without any transformation or encoding results in a slowly increasing sync issue.

How does compression affect Avi sync?

The whole point of compression is to reduce the filesize and keep some quality PDR, it's not the B-frames that cause sync issues in AVI but B-pyramid.


1 Answers

I'm a bit confused as how FFMPEG handles diverse framerates

It doesn't, which would cause a misalignment in your case. The vast majority of filters (any which deal with multiple sources and make use of frames, essentially), including the Concatenate filter require that be the sources have the same framerate.

For the concat filter to work, the inputs have to be of the same frame dimensions (e.g., 1920⨉1080 pixels) and should have the same framerate.

(emphasis added)

The documentation also adds:

Therefore, you may at least have to add a ​scale or ​scale2ref filter before concatenating videos. A handful of other attributes have to match as well, like the stream aspect ratio. Refer to the documentation of the filter for more info.

You should convert your sources to the same framerate first.

like image 178
sytech Avatar answered Oct 20 '22 08:10

sytech