Explanation of audio stat using sox

Tags: audio, sox

I have a bunch of audio files and need to split each file on silence using SoX. However, some files have a very noisy background and some don't, so I can't use a single set of parameters to iterate over all the files and do the split. I'm trying to figure out how to separate the files by background noise. Here is what I got from sox input1.flac -n stat and sox input2.flac -n stat:

Samples read:          18207744
Length (seconds):    568.992000
Scaled by:         2147483647.0
Maximum amplitude:     0.999969
Minimum amplitude:    -1.000000
Midline amplitude:    -0.000015
Mean    norm:          0.031888
Mean    amplitude:    -0.000361
RMS     amplitude:     0.053763
Maximum delta:         0.858917
Minimum delta:         0.000000
Mean    delta:         0.018609
RMS     delta:         0.039249
Rough   frequency:         1859
Volume adjustment:        1.000

and

Samples read:         198976896
Length (seconds):   6218.028000
Scaled by:         2147483647.0
Maximum amplitude:     0.999969
Minimum amplitude:    -1.000000
Midline amplitude:    -0.000015
Mean    norm:          0.156168
Mean    amplitude:    -0.000010
RMS     amplitude:     0.211787
Maximum delta:         1.999969
Minimum delta:         0.000000
Mean    delta:         0.091605
RMS     delta:         0.123462
Rough   frequency:         1484
Volume adjustment:        1.000

The former does not have a noisy background and the latter does. I suspect I can use the Sample Mean of Max delta because of the big gap. Can anyone explain the meaning of these stats, or at least show me where I can find it myself? (I tried looking in the official documentation, but it doesn't explain them.) Many thanks.

asked Apr 14 '17 16:04

Nguyễn Tài Long

1 Answer

I don't know how I've managed to miss stat in the SoX docs all this time; it's right there.

Length
- length of the audio file in seconds
Scaled by
- what the input is scaled by. By default 2^31-1, to go from 32-bit signed integer to [-1, 1]
Maximum amplitude
- maximum sample value
Minimum amplitude
- minimum sample value
Midline amplitude
- aka mid-range, midpoint between the max and minimum values.
Mean norm
- arithmetic mean of samples' absolute values
Mean amplitude
- arithmetic mean of samples' values
RMS amplitude
- root mean square, root of squared values' mean
Maximum delta
- maximum difference between two successive samples
Minimum delta
- minimum difference between two successive samples
Mean delta
- arithmetic mean of differences between successive samples
RMS delta
- root mean square of differences between successive samples
Rough frequency
- estimate of the input file's frequency, in hertz (unsure of the method used)
Volume adjustment
- value that should be sent to -v so peak absolute amplitude is 1
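
The definitions above can be sketched in a few lines of awk. This is an illustration under stated assumptions, not SoX's actual code: stat_sketch and rough_freq are hypothetical helper names, the deltas are taken as absolute differences and averaged over n-1 pairs, and the rough-frequency relation (RMS delta / RMS amplitude * sample rate / 2π) is an assumption. It does reproduce both values in the question if the files are 16 kHz stereo, which is my reading of Samples read / Length = 32000 per second.

```shell
#!/usr/bin/env bash
# Sketch only: recompute a few "stat"-style figures from samples on stdin,
# one sample per line, already scaled to [-1, 1].
stat_sketch() {
  awk '
    {
      x = $1
      a = (x < 0) ? -x : x
      sum += x; asum += a; sq += x * x
      if (NR > 1) {
        d = x - last; if (d < 0) d = -d   # assumed: absolute difference
        dsum += d; dsq += d * d
        if (d > dmax) dmax = d
      }
      last = x
    }
    END {
      if (NR < 2) exit 1                   # need at least one delta
      printf "Mean norm: %.6f\n", asum / NR
      printf "Mean amplitude: %.6f\n", sum / NR
      printf "RMS amplitude: %.6f\n", sqrt(sq / NR)
      printf "Maximum delta: %.6f\n", dmax
      printf "Mean delta: %.6f\n", dsum / (NR - 1)
      printf "RMS delta: %.6f\n", sqrt(dsq / (NR - 1))
    }'
}

# Assumed relation: $1 = RMS delta, $2 = RMS amplitude, $3 = sample rate (Hz).
rough_freq() {
  awk -v d="$1" -v a="$2" -v r="$3" 'BEGIN {
    pi = atan2(0, -1)
    printf "%d\n", (d / a) * r / (2 * pi)
  }'
}

printf '%s\n' 0.5 -0.5 0.25 -0.25 | stat_sketch
rough_freq 0.039249 0.053763 16000   # prints 1859 (input1.flac reports 1859)
rough_freq 0.123462 0.211787 16000   # prints 1484 (input2.flac reports 1484)
```

If that relation holds, Rough frequency carries no information beyond the ratio of RMS delta to RMS amplitude, so it is unlikely to help separate noisy files from clean ones on its own.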

Personally I'd rather use the stats function, whose output I find much more practically useful.
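
One practical difference is that stats reports levels in dB rather than as linear amplitudes. To relate the two, here is a small sketch that converts a linear amplitude from stat into dB relative to full scale, assuming the usual 20·log10(amplitude) convention. The helper name to_db is hypothetical, and the results will not match a stats run exactly if the file is mixed down to mono first.

```shell
#!/usr/bin/env bash
# Sketch: convert a linear amplitude (0 < a <= 1) to dB relative to full scale.
to_db() {
  awk -v a="$1" 'BEGIN { printf "%.2f\n", 20 * log(a) / log(10) }'
}

to_db 0.053763   # RMS amplitude of input1.flac, prints -25.39
to_db 0.211787   # RMS amplitude of input2.flac, prints -13.48
```

By this measure the noisy file is about 12 dB louder on average, but overall loudness alone does not say whether the level comes from noise or from the programme material.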

To tell the noisier files from the cleaner ones, I'd try the difference between the highest and lowest sound levels. The quietest parts will never be quieter than the background noise alone, so if the difference is small, the audio is either noisy or just loud all the time, like a compressed pop song. You could take the difference between the maximum and minimum RMS values, or between the peak and the minimum RMS. The RMS window length should be kept fairly short, say between 10 and 200 ms, and if the audio has fade-in or fade-out sections, those should be trimmed away first, though I didn't include that in the code.

audio="input1.flac"
width=0.01  # RMS window length in seconds (10 ms)

# Mixes down multi-channel files to mono.
# stats writes its report to stderr, hence the 2>&1.
stats=$(sox "$audio" -n channels 1 stats -w "$width" 2>&1 |\
  grep "Pk lev dB\|RMS Pk dB\|RMS Tr dB" |\
  sed 's/[^0-9.-]*//g')

peak=$(head -n 1 <<< "$stats")
rmsmax=$(head -n 2 <<< "$stats" | tail -n 1)
rmsmin=$(tail -n 1 <<< "$stats")

rmsdif=$(bc <<< "scale=3; $rmsmax - $rmsmin")
pkmindif=$(bc <<< "scale=3; $peak - $rmsmin")

echo "
  max RMS: $rmsmax
  min RMS: $rmsmin

  diff RMS: $rmsdif
  peak-min: $pkmindif
"
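
To apply the same measure across a whole directory, the script above could be wrapped roughly as follows. This is a sketch, not a tested recipe: the function names are hypothetical, the 30 dB threshold is a made-up starting point that would need tuning on files whose noise level is already known, and a file containing digital silence may report an RMS Tr dB of -inf, which this does not handle.

```shell
#!/usr/bin/env bash
# Hypothetical threshold in dB; a smaller range suggests a noisier file.
threshold=30

# Succeeds (exit 0) when the range in $1 is below the threshold in $2.
is_noisy() {
  awk -v d="$1" -v t="$2" 'BEGIN { exit !(d < t) }'
}

# Prints RMS Pk dB minus RMS Tr dB for file $1, window $2 (default 0.01 s).
rms_range() {
  sox "$1" -n channels 1 stats -w "${2:-0.01}" 2>&1 |
    awk '/RMS Pk dB/ { pk = $NF } /RMS Tr dB/ { tr = $NF }
         END { printf "%.2f\n", pk - tr }'
}

# Labels every file given as an argument, e.g.: classify_files *.flac
classify_files() {
  local f range
  for f in "$@"; do
    [ -e "$f" ] || continue
    range=$(rms_range "$f")
    if is_noisy "$range" "$threshold"; then
      echo "noisy $range $f"
    else
      echo "clean $range $f"
    fi
  done
}
```

Each group can then be split with its own set of silence parameters.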

answered Nov 09 '22 19:11

AkselA
