 

FFMPEG amix filter volume issue with inputs of different duration

Tags: ffmpeg, audio

I noticed that the ffmpeg amix filter doesn't produce good output in one specific situation. It works fine if the input files have equal duration: in that case the volume drops by a constant factor, which can be fixed with ",volume=2".

In my case I'm using files with different durations, and the resulting volume is uneven. The first mixed stream ends up with the lowest volume and the last one with the highest. You can see in the image that the volume increases linearly over time.


My command:

ffmpeg -i temp_0.mp4 -i user_2123_10.mp4  -i user_2123_3.mp4  -i user_2123_4.mp4  
-i user_2123_7.mp4  -i user_2123_5.mp4  -i user_2123_1.mp4  -i user_2123_8.mp4  
-i user_2123_0.mp4  -i user_2123_6.mp4  -i user_2123_9.mp4  -i user_2123_2.mp4  
-i user_2123_11.mp4 -filter_complex "[1:a]adelay=34741.0[aud1];
[2:a]adelay=18241.0[aud2];[3:a]adelay=20602.0[aud3];
[4:a]adelay=27852.0[aud4];[5:a]adelay=22941.0[aud5];
[6:a]adelay=13142.0[aud6];[7:a]adelay=29810.0[aud7];
[8:a]adelay=12.0[aud8];[9:a]adelay=25692.0[aud9];
[10:a]adelay=32143.002[aud10];[11:a]adelay=16101.0[aud11];
[12:a]adelay=40848.0[aud12];
[0:a][aud1][aud2][aud3][aud4][aud5][aud6][aud7]
[aud8][aud9][aud10][aud11]
[aud12]amix=inputs=13:duration=first:dropout_transition=0" 
-vcodec copy -y temp_1.mp4

That could be fixed by adding silence at the beginning and end of each clip so that they all have the same duration; the volume would then stay at the same level.

Please suggest how I can use amix to mix many inputs and keep a constant volume level.

Stan Reshetnyk asked Feb 19 '16 15:02


8 Answers

amix scales each input's volume by 1/n where n = no. of active inputs. This is evaluated for each audio frame. So when an input drops out, the volume of the remaining inputs is scaled by a smaller amount, hence their volumes increase.
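
To make the 1/n behaviour concrete, here is a rough Python sketch of the gain amix would apply over time. It assumes the model described above (gain = 1 / number of active inputs, with dropout_transition=0) and that an input delayed with adelay counts as active from time 0 until its delay plus its duration, since adelay pads the front with silence. The function name and the example end times are made up for illustration.

```python
def amix_gain_timeline(streams):
    """streams: list of (start_ms, end_ms) during which an input is active.

    Returns (from_ms, to_ms, active_count, gain) segments, where
    gain = 1 / active_count (the assumed amix scaling).
    """
    # Every start or end is a point where the active-input count can change.
    edges = sorted({t for s, e in streams for t in (s, e)})
    timeline = []
    for lo, hi in zip(edges, edges[1:]):
        active = sum(1 for s, e in streams if s <= lo and hi <= e)
        if active:
            timeline.append((lo, hi, active, 1.0 / active))
    return timeline


# Hypothetical example: three inputs that end at 10 s, 12 s and 14 s.
for segment in amix_gain_timeline([(0, 10000), (0, 12000), (0, 14000)]):
    print(segment)
```

Each time an input ends, the gain of the remaining inputs steps up (1/3, then 1/2, then 1), which matches the rising volume in the question.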

Changing the dropout_transition for all earlier inputs, as suggested in other answers, is one approach, but I think it will result in coarse volume modulations. A better method is to normalize the audio after the amix.

At present, you have two options: the loudnorm or the dynaudnorm filter. The latter is much faster.

The syntax is to add it after the amix, so:

[aud11][aud12]amix=inputs=13:duration=first:dropout_transition=0,dynaudnorm"

Read the documentation if you wish to tweak parameters such as maximum volume or RMS-mode normalization.

Gyan answered Sep 18 '22 08:09


Recent versions of FFmpeg include a normalize parameter for the amix filter, which you can use to turn off the constantly changing normalization. It is described in the amix section of the FFmpeg filters documentation.

Your amix filter string can be changed to:

[aud12]amix=inputs=13:normalize=0
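
If you generate the command from code, the whole filter graph can be built in one go. This is only a sketch: the function name is hypothetical, it assumes input 0 is the undelayed base track, and it uses the adelay=ms|ms form from other answers here so that both stereo channels are delayed. The normalize option only exists in newer ffmpeg builds, so check yours first.

```python
def build_mix_graph(delays_ms, amix_opts="normalize=0"):
    """Build a -filter_complex string: input 0 is mixed as-is,
    inputs 1..N are delayed by the given number of milliseconds."""
    parts = []
    labels = ["[0:a]"]
    for i, delay in enumerate(delays_ms, start=1):
        # Delay both channels of input i and give the result a label.
        parts.append(f"[{i}:a]adelay={delay}|{delay}[aud{i}]")
        labels.append(f"[aud{i}]")
    n_inputs = len(delays_ms) + 1
    parts.append(f"{''.join(labels)}amix=inputs={n_inputs}:{amix_opts}")
    return ";".join(parts)


print(build_mix_graph([34741, 18241]))
```

With normalization off the inputs are simply summed, so the mix can exceed full scale when many loud inputs overlap; you may still want a volume filter after it.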

emich answered Sep 22 '22 08:09


The solution seems to be a combination of a "pre-amp" (multiplication, as Maxim puts it) and setting dropout_transition >= max delay + max input length (or simply a very high number):

amix=inputs=13:dropout_transition=1000,volume=13

Notes:

  • amix has to resample to float anyway, so there is no downside to adding the volume filter (which by default resamples to float, too).
    And since we're using floats, there's no clipping and (almost) no loss of precision.
  • H/t to @Mulvya for the analysis, but their solution is frustratingly non-mathematical.
  • I was originally trying to do this with sox, which was too slow. Sox's remix filter has the -m switch, which disables the 1/n adjustment.
  • While faster, ffmpeg seems to use far more memory for the same task. YMMV: I didn't test this thoroughly, because I finally settled on a small Python script that uses pydub's overlay function and only keeps the final output file and one segment in memory (whereas ffmpeg and sox seem to keep all of the segments in memory).
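
For what it's worth, the two numbers in that filter string can be computed rather than guessed. A minimal sketch, assuming you already know each input's delay and duration in milliseconds (the function name and the example values are hypothetical); dropout_transition is given in seconds, so the total is rounded up to whole seconds:

```python
def constant_gain_amix(delays_ms, durations_ms):
    """Return an amix + volume chain following the rule above:
    dropout_transition >= max delay + max input length, volume = n."""
    n_inputs = len(delays_ms)
    longest_ms = max(delays_ms) + max(durations_ms)
    # Ceiling division: round up to whole seconds.
    transition_s = -(-longest_ms // 1000)
    return f"amix=inputs={n_inputs}:dropout_transition={transition_s},volume={n_inputs}"


# Hypothetical example: three inputs, the longest being 60 s.
print(constant_gain_amix([0, 34741, 40848], [60000, 5000, 7000]))
```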

kubi answered Sep 18 '22 08:09


The solution I've found is to specify the volume for each track in descending order and to use no normalization filter afterwards.

I use this example, where I mix the same audio file at different positions:

ffmpeg -vn -i test.mp3 -i test.mp3 -i test.mp3 -filter_complex "[0]adelay=0|0,volume=3[a];[1]adelay=2000|2000,volume=2[b];[2]adelay=4000|4000,volume=1[c];[a][b][c]amix=inputs=3:dropout_transition=0" -q:a 1 -acodec libmp3lame -y amix-volume.mp3

For more details, see this image. The first track is the normal mix, the second is the one with volumes specified, and the third is the original track. As we can see, the second track appears to have a normal volume.


ffmpeg -vn -i test.mp3 -i test.mp3 -i test.mp3 -filter_complex "[0]adelay=0|0[a];[1]adelay=2000|2000[b];[2]adelay=4000|4000[c];[a][b][c]amix=inputs=3:dropout_transition=0" -q:a 1 -acodec libmp3lame -y amix-no-volume.mp3

ffmpeg -vn -i test.mp3 -i test.mp3 -i test.mp3 -filter_complex "[0]adelay=0|0,volume=3[a];[1]adelay=2000|2000,volume=2[b];[2]adelay=4000|4000,volume=1[c];[a][b][c]amix=inputs=3:dropout_transition=0" -q:a 1 -acodec libmp3lame -y amix-volume.mp3

I can't really understand why amix changes the volume; in any case, I had been digging around for a good solution for a while.
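
As a rough sanity check of the descending volumes, here is a Python sketch that estimates the gain each track ends up with while it is audible. It assumes the 1/n model from Gyan's answer (dropout_transition=0) and that adelay pads the front with silence, so every input is active from time 0 until delay + duration. The function name is hypothetical, and the 10-second track length is a guess, since the length of test.mp3 isn't given.

```python
def effective_gains(tracks):
    """tracks: list of (delay_ms, duration_ms, volume).

    Returns {track_index: [(from_ms, to_ms, gain), ...]} for the time each
    track is audible, with gain = volume / number_of_active_inputs.
    """
    ends = [delay + dur for delay, dur, _ in tracks]
    edges = sorted({0, *ends, *(delay for delay, _, _ in tracks)})
    result = {i: [] for i in range(len(tracks))}
    for lo, hi in zip(edges, edges[1:]):
        # An input stays active until its (padded) stream ends.
        active = sum(1 for end in ends if end >= hi)
        for i, (delay, dur, volume) in enumerate(tracks):
            if delay <= lo and hi <= delay + dur:
                result[i].append((lo, hi, volume / active))
    return result


# Same layout as the command above, assuming a 10-second test.mp3.
gains = effective_gains([(0, 10000, 3), (2000, 10000, 2), (4000, 10000, 1)])
for index, segments in gains.items():
    print(index, segments)
```

Under these assumptions the first track sits at unity gain throughout, while the later tracks start lower and only reach unity once the earlier ones have ended, so the descending volumes look like an approximation rather than an exact fix.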

klodoma answered Sep 21 '22 08:09


I had the same problem but found a solution!

First, the problem: I had to mix a background music file with 3 different TTS voice pieces that start with different delays. By the end, the background sound was extremely loud.

I tried the suggested answer, but it did not work for me; the end volume was still much higher. So my thinking was: "All inputs must have the same length, so that the same number of audio streams is active in the mix at all times."

Using apad with whole_len set on all TTS inputs, in combination with the -shortest option, did the job for me.

Example call:

ffmpeg -y \
       -nostats \
       -hide_banner \
       -v quiet \
       -hwaccel auto \
       -f image2pipe \
       -i pipe:0 \
       -i bgAudio.aac \
       -i TTS1.mp3 \
       -i TTS2.mp3 \
       -i TTS3.mp3 \
       -filter_complex "[1:a]loudnorm=I=-16:TP=-1.5:LRA=11:linear=false[a0];[2:a]loudnorm=I=-16:TP=-1.5:LRA=11:linear=false:dual_mono=true,adelay=7680|7680,apad=whole_len=2346240[a1];[3:a]loudnorm=I=-16:TP=-1.5:LRA=11:linear=false:dual_mono=true,adelay=14640|14640,apad=whole_len=2346240[a2];[4:a]loudnorm=I=-16:TP=-1.5:LRA=11:linear=false:dual_mono=true,adelay=3240|3240,apad=whole_len=2346240[a3];[a0][a1][a2][a3]amix=inputs=4:dropout_transition=0,asplit=6[audio0][audio1][audio2][audio3][audio4][audio5];[0:v]format=yuv420p,split=6[1080p][720p][480p][360p][240p][144p]" \
       -map "[audio0]" -map "[1080p]" -s 1920x1080 -shortest out1080p.mp4 \
       -map "[audio1]" -map "[720p]" -s 1280x720 -shortest out720p.mp4 \
       -map "[audio2]" -map "[480p]" -s 858x480 -shortest out480p.mp4 \
       -map "[audio3]" -map "[360p]" -s 640x360 -shortest out360p.mp4 \
       -map "[audio4]" -map "[240p]" -s 426x240 -shortest out240p.mp4 \
       -map "[audio5]" -map "[144p]" -s 256x144 -shortest out144p.mp4
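
Note that whole_len is a number of samples, not a duration. A small Python sketch of how such a value can be derived; the 48000 Hz sample rate is an assumption (it makes the 2346240 in the call above come out to 48.88 seconds), and the function names are hypothetical:

```python
def apad_whole_len(target_duration_s, sample_rate_hz=48000):
    """Number of samples apad should pad the stream up to."""
    return round(target_duration_s * sample_rate_hz)


def padded_tts_chain(delay_ms, target_duration_s, sample_rate_hz=48000):
    """Delay a clip, then pad it with trailing silence to the full length."""
    whole_len = apad_whole_len(target_duration_s, sample_rate_hz)
    return f"adelay={delay_ms}|{delay_ms},apad=whole_len={whole_len}"


print(padded_tts_chain(7680, 48.88))
```

If your audio runs at a different sample rate, the same duration needs a different whole_len.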

Hope this helps someone!

Maik Laschober answered Sep 22 '22 08:09


Try using multiplication, i.e. follow amix with a volume filter set to the number of inputs:

"amix=inputs="+ chunks.length + ":duration=first:dropout_transition=3,volume=" + chunks.length

Maxim Firsoff answered Sep 21 '22 08:09


Sorry for not posting the ffmpeg output.

In the end we wrote a small utility in C++ for mixing audio. We first converted the mp4 files to raw (PCM) format. That worked just fine for us, although it requires additional disk space for the raw intermediate files.

The code looks like this:

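// Mix two signed 16-bit PCM samples without hard clipping.
short addSounds(short a, short b) {
    // Map each sample from [-32768, 32767] to [0, 1).
    double da = a;
    da /= 65536.0;
    da += 0.5;
    double db = b;
    db /= 65536.0;
    db += 0.5;
    double z = 0;
    if (da < 0.5 && db < 0.5) {
        // Both samples are below the midpoint (negative): use the product.
        z = 2 * da * db;
    }
    else {
        // Otherwise combine them so that the result stays within [0, 1).
        z = 2 * (da + db) - 2 * da * db - 1;
    }
    // Map the result back to the signed 16-bit range.
    z -= 0.5;
    z *= 65536.0;
    return (short)z;
}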
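
For anyone who wants to try the formula without compiling anything, here is a direct Python port, plus a small helper that mixes two lists of 16-bit samples. The helper and its name are my own addition for illustration (it assumes raw samples already loaded as Python ints and pads the shorter track with silence); it is not part of the original utility.

```python
def add_sounds(a, b):
    """Python port of addSounds: mix two signed 16-bit samples."""
    da = a / 65536.0 + 0.5
    db = b / 65536.0 + 0.5
    if da < 0.5 and db < 0.5:
        z = 2 * da * db
    else:
        z = 2 * (da + db) - 2 * da * db - 1
    # int() truncates toward zero, like the (short) cast in C++.
    return int((z - 0.5) * 65536.0)


def mix_tracks(x, y):
    """Mix two sample lists, padding the shorter one with silence (0)."""
    n = max(len(x), len(y))
    x = x + [0] * (n - len(x))
    y = y + [0] * (n - len(y))
    return [add_sounds(a, b) for a, b in zip(x, y)]


print(mix_tracks([100, 200, -300], [0]))
```

One property worth noting: mixing a sample with silence (0) returns the sample unchanged, so a track's level does not depend on how many silent inputs are mixed in alongside it.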

Stan Reshetnyk answered Sep 22 '22 08:09


Here is my code:

"amix="+inputs.size()+",volume="+(inputs.size()+1)/2+"[mixout]\""

I don't use dropout_transition=0 because it causes the problem you ran into.

However, I also found that the volume gets lower as the number of inputs increases, so I make the volume louder.

戴文锦 answered Sep 20 '22 08:09