
Using SoX to change the volume level of a range of time in an audio file

Tags: audio, sox

I’d like to change the volume level of a particular time range/slice in an audio file using SoX.

Right now, I’m having to:

  1. Trim the original file three times to get: the part before the audio effect change, the part during (where I’m changing the sound level), and the part after
  2. Perform the effect to change the sound level on the extracted “middle” chunk of audio, in its own file
  3. Splice everything back together, taking into account the fading/crossfading 5ms overlaps that SoX recommends

Is there a better way to do this that doesn’t involve writing a script to do the above?

Edward Ocampo-Gooding asked Nov 21 '13 16:11


1 Answer

For anyone who stumbles across this highly ranked thread, searching for a way to duck the middle of an audio file:

I've been playing with SoX for ages, and the method I built uses pipes to process each part without creating all those temporary files!

The result is a single-line solution. You will need to set the timings, though, so unless your fade timings are the same for all files, it may be useful to generate the line with an algorithm.

I was pleased to get piping working, as I know this has proved difficult for others; the command-line options can be hard to get right. However, I really didn't like the messy additional files as an alternative.

By using the mix functionality, positioning each part with pad, and giving each section its own trim & fade, we can also avoid 'splice' here. I really wasn't a fan.


A working single line example, tested in SoX 14.4.2 Windows:

It ducks the audio by 6 dB at 2 seconds and returns to 0 dB at 5 seconds (using linear fades of 0.4 seconds):

sox -m -t wav "|sox -V1 inputfile.wav -t wav - fade t 0 2.2 0.4" -t wav "|sox -V1 inputfile.wav -t wav - trim 1.8 fade t 0.4 3.4 0.4 gain -6 pad 1.8" -t wav "|sox -V1 inputfile.wav -t wav - trim 4.8 fade t 0.4 0 0 pad 4.8" outputfile.wav gain 9.542

Let's make that a little more readable here by breaking it down into sections:

Section 1 = full volume, Section 2 = ducked, Section 3 = full volume

sox -m
    -t wav "|sox -V1 inputfile.wav -t wav - fade t 0 2.2 0.4" 
    -t wav "|sox -V1 inputfile.wav -t wav - trim 1.8 fade t 0.4 3.4 0.4 gain -6 pad 1.8"
    -t wav "|sox -V1 inputfile.wav -t wav - trim 4.8 fade t 0.4 0 0 pad 4.8"
    outputfile.wav gain 9.542
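If your timings differ from file to file, the line can be generated rather than typed. Here is a minimal Python sketch of that idea. The function name duck_command and its parameters are my own invention (not part of SoX), and it assumes file names without spaces, since they are pasted into the inner piped commands unquoted. Timings are passed as strings so that Decimal keeps values like 1.8 and 3.4 exact.

```python
from decimal import Decimal


def duck_command(infile, outfile, duck_start, duck_end, fade="0.4", duck_db="-6"):
    """Build the three-part SoX mix as an argument list (hypothetical helper).

    duck_start / duck_end are the midpoints of the two crossfades, in seconds.
    """
    start, end, fade = Decimal(duck_start), Decimal(duck_end), Decimal(fade)
    half = fade / 2
    stop1 = start + half        # end of the first section's fade-out
    trim2 = start - half        # ducked section starts half a fade early
    stop2 = end + half - trim2  # fade-out end, relative to the trimmed start
    trim3 = end - half          # final section starts half a fade early

    def part(effects):
        # Each input is a piped sub-command, announced to the mixer as WAV.
        return ["-t", "wav", f"|sox -V1 {infile} -t wav - {effects}"]

    return (
        ["sox", "-m"]
        + part(f"fade t 0 {stop1} {fade}")
        + part(f"trim {trim2} fade t {fade} {stop2} {fade} gain {duck_db} pad {trim2}")
        + part(f"trim {trim3} fade t {fade} 0 0 pad {trim3}")
        + [outfile, "gain", "9.542"]  # makeup gain for a 3-input mix
    )


for arg in duck_command("inputfile.wav", "outputfile.wav", "2", "5"):
    print(arg)
```

With the values above this reproduces the three piped sections of the example. The list could then be handed to something like subprocess.run(cmd, check=True), which passes each "|sox ..." string to SoX as a single argument, though I have only checked the generated text, not that route.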

Now, to break it down very thoroughly:

'-m' .. says we're going to mix (this automatically reduces gain, see last parameter)

'-t wav' .. says the piped command that follows will return a WAV (it seems the WAV header is being lost in the pipeline)

Then.. the FIRST piped part (full volume before duck)

'-V1' .. says ignore warnings - there will be a warning about not knowing length of output file for this specific section as it's piping out, but there should be no other warning from this operation

then the input filename

'-t wav' .. forces the output type

'-' .. is the standard name for a piped output which will return to SoX command line

'fade t 0 2.2 0.4' .. fades out the full volume section. t = linear. 0 fade in. Then (as we want the crossfade's halfway point to be at 2 seconds) we fade out by 2.2 seconds, with a 0.4 second fade (the fadeout parameter is for when the fade ENDS!)

'-t wav' .. to advise type of next part - as above

Then.. the SECOND piped part (the ducked section)

'-V1' .. again, to ignore the output length warning - see above

then the same input filename

'-t wav' .. forces output type, as above

'-' .. for piped output, see above

'trim 1.8' .. because this middle section will hit the middle of the transition at 2 seconds, so (with a 0.4 second crossfade) the ducked audio file will start 0.2 seconds before that

'fade t 0.4 3.4 0.4' .. to fade in the ducked section & fade back out again. So a 0.4 fade in. Then (the most complicated part) as the next crossfade will end at 5.2 seconds we must take that figure minus trimmed amount for this section, so 5.2-1.8=3.4 (again this is because fadeout position deals with the end timing of the fadeout)

'gain -6' .. is the amount, in dB, by which we should duck

'pad 1.8' .. must match the trim figure above, so that amount of silence is inserted at the start to make it synch when sections are mixed
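Because the fade positions are relative to the trimmed section, it is easy to slip by a fraction of a second. A small sanity check, sketched in Python, maps a section's trim and fade values back onto the original file's timeline. The helper name absolute_fade_windows is hypothetical, and it assumes the pad value equals the trim value, as it does in this example.

```python
from decimal import Decimal


def absolute_fade_windows(trim, fade_in, stop, fade_out):
    """Return ((fade-in start, end), (fade-out start, end)) in original-file seconds."""
    trim, fade_in, stop, fade_out = (
        Decimal(v) for v in (trim, fade_in, stop, fade_out)
    )
    fade_in_window = (trim, trim + fade_in)
    # 'stop' is where the fade-out ENDS, measured from the trimmed start.
    fade_out_window = (trim + stop - fade_out, trim + stop)
    return fade_in_window, fade_out_window


# Ducked section from the example: trim 1.8, fade t 0.4 3.4 0.4
fade_in, fade_out = absolute_fade_windows("1.8", "0.4", "3.4", "0.4")
print(f"fade in  {fade_in[0]} to {fade_in[1]}")    # fade in  1.8 to 2.2
print(f"fade out {fade_out[0]} to {fade_out[1]}")  # fade out 4.8 to 5.2
```

Both windows are centred on the intended transition points of 2 and 5 seconds.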

'-t wav' .. to advise type of next part - as above

Then.. the THIRD piped part (return to full level)

'-V1' .. again - see above

then the same input filename

'-t wav' .. to force output type, as above

'-' .. for piped output, see above

'trim 4.8' .. this final section will start at 5 seconds, but (with a 0.4 second crossfade) the audio will start 0.2 seconds before that

'fade t 0.4 0 0' .. just fade in to this full volume section. No fade out

'pad 4.8' .. must match the trim figure above, as explained above

then the output filename

'gain 9.542' .. looks tricky, but when you use "-m" to mix 3 files, SoX reduces the volume of each to 1/3 (one third) to give headroom.

Rather than defeating that, we boost the output back up to 300%. The dB amount of 9.542 comes from the formula 20*log(3)/log(10), i.e. 20*log10(3).
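The same formula works for any number of mixed inputs. As a quick Python sketch (the function name mix_makeup_gain_db is just illustrative):

```python
import math


def mix_makeup_gain_db(n_inputs):
    """Gain in dB that undoes a 1/n volume reduction across n mixed inputs."""
    return 20 * math.log10(n_inputs)


print(round(mix_makeup_gain_db(3), 3))  # 9.542
```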


If you copy & paste the single line somewhere you can see it all at once, it's a lot less scary than the explanation!

Final thought - I was initially concerned about whether the crossfades needed to be logarithmic rather than linear, but in my case, from listening to the results, linear has definitely given the sound I expected.
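One way to see why linear fades can work here is to add up the weights of the three sections at a given moment. The Python sketch below rests on my own assumptions rather than anything SoX reports: the three parts are sample-aligned copies of the same file (so their amplitudes simply add), 'fade t' is a straight-line amplitude ramp, and the final gain exactly cancels the 1/3 mix reduction. The names ramp and combined_gain are hypothetical.

```python
def ramp(t, start, length):
    """Linear ramp from 0 to 1, beginning at `start` and lasting `length` seconds."""
    return min(1.0, max(0.0, (t - start) / length))


def combined_gain(t, duck_start=2.0, duck_end=5.0, fade=0.4, duck_db=-6.0):
    """Overall amplitude factor at time t once the three sections are summed."""
    duck = 10 ** (duck_db / 20)              # -6 dB as a linear factor
    down = ramp(t, duck_start - fade / 2, fade)  # 0 before the duck, 1 once ducked
    up = ramp(t, duck_end - fade / 2, fade)      # 0 while ducked, 1 after the return
    first = 1.0 - down                       # section 1 fading out
    middle = duck * down * (1.0 - up)        # section 2 fading in, then out
    last = up                                # section 3 fading in
    return first + middle + last


for t in (0.0, 1.9, 2.0, 2.1, 3.5, 5.0, 6.0):
    print(t, round(combined_gain(t), 3))
```

Under those assumptions the level slides in a straight line between full volume and the ducked level during each crossfade, with no dip or bump in the middle, which fits what I heard.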

You may like to try longer crossfades, or have the point of transition happening earlier or later but I hope that single line gives hope to anyone who thought many temporary files would be required!

Let me know if more clarification would help!

[Image: Audacity waveform]

dingles answered Sep 18 '22 03:09