I have a bunch of audio files and need to split each files based on silence and using SOX. However, I realize that some files have very noisy background and some don't thus I can't use a single set of parameter to iterate over all files doing the split. I try to figure out how to separate them by noisy background. Here is what I got from sox input1.flac -n stat
and sox input2.flac -n stat
Samples read: 18207744
Length (seconds): 568.992000
Scaled by: 2147483647.0
Maximum amplitude: 0.999969
Minimum amplitude: -1.000000
Midline amplitude: -0.000015
Mean norm: 0.031888
Mean amplitude: -0.000361
RMS amplitude: 0.053763
Maximum delta: 0.858917
Minimum delta: 0.000000
Mean delta: 0.018609
RMS delta: 0.039249
Rough frequency: 1859
Volume adjustment: 1.000
and
Samples read: 198976896
Length (seconds): 6218.028000
Scaled by: 2147483647.0
Maximum amplitude: 0.999969
Minimum amplitude: -1.000000
Midline amplitude: -0.000015
Mean norm: 0.156168
Mean amplitude: -0.000010
RMS amplitude: 0.211787
Maximum delta: 1.999969
Minimum delta: 0.000000
Mean delta: 0.091605
RMS delta: 0.123462
Rough frequency: 1484
Volume adjustment: 1.000
The former does not contain noisy background and the latter does. I suspect I can use the Sample Mean
of Max delta
because of the big gap.
Can anyone explain for me the meaning of those stats, or at least show me where I can get it myself (I tried looking up in official documentation but they don't explain). Many thanks.
Audio recorder SoX includes a very handy way of recording audio using the rec command. The simplest use is to type rec filename which will start recording from the default input until you stop it by pressing ctrl-c in the terminal window.
SoX is a cross-platform (Windows, Linux, MacOS X, etc.) command line utility that can convert various formats of computer audio files in to other formats. It can also apply various effects to these sound files, and, as an added bonus, SoX can play and record audio files on most platforms.
SoX reads and writes audio files in most popular formats and can optionally apply effects to them. It can combine multiple input sources, synthesise audio, and, on many systems, act as a general purpose audio player or a multi-track audio recorder.
A three-second peak-held value of headroom in dBs will be shown to the right of the meter if this is below 6dB. This option is enabled by default when using SoX to play or record audio. Show SoX's version number and exit. No messages are shown at all; use the exit status to determine if an error has occurred.
I don't know how I've managed to miss stat in the SoX docs all this time, it's right there.
Personally I'd rather use the stats
function, whose output I find much more practically useful.
As a measure to differentiate between the more or less noisy audio I'd try using the difference between the highest and lowest sound levels. The quietest parts will never be quieter than the background noise alone, so if there is little difference the audio is either noisy, or just loud all the time, like a compressed pop song. You could take the difference between the maximum and minimum RMS values, or between peak and minimum RMS. The RMS window length should be kept fairly short, say between 10 and 200ms, and if the audio has fade-in or fade-out sections, those should be trimmed away, though I didn't include that in the code.
audio="input1.flac"
width=0.01
# Mixes down multi-channel files to mono
stats=$(sox "$audio" -n channels 1 stats -w $width 2>&1 |\
grep "Pk lev dB\|RMS Pk dB\|RMS Tr dB" |\
sed 's/[^0-9.-]*//g')
peak=$(head -n 1 <<< "$stats")
rmsmax=$(head -n 2 <<< "$stats" | tail -n 1)
rmsmin=$(tail -n 1 <<< "$stats")
rmsdif=$(bc <<< "scale=3; $rmsmax - $rmsmin")
pkmindif=$(bc <<< "scale=3; $peak - $rmsmin")
echo "
max RMS: $rmsmax
min RMS: $rmsmin
diff RMS: $rmsdif
peak-min: $pkmindif
"
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With