 

Explanation of audio stat using sox

Tags: audio, sox

I have a bunch of audio files and need to split each file on silence using SoX. However, some files have a very noisy background and some don't, so I can't use a single set of parameters to split all of them. I'm trying to figure out how to separate the files by how noisy their background is. Here is what I got from sox input1.flac -n stat and sox input2.flac -n stat:

Samples read:          18207744
Length (seconds):    568.992000
Scaled by:         2147483647.0
Maximum amplitude:     0.999969
Minimum amplitude:    -1.000000
Midline amplitude:    -0.000015
Mean    norm:          0.031888
Mean    amplitude:    -0.000361
RMS     amplitude:     0.053763
Maximum delta:         0.858917
Minimum delta:         0.000000
Mean    delta:         0.018609
RMS     delta:         0.039249
Rough   frequency:         1859
Volume adjustment:        1.000

and

Samples read:         198976896
Length (seconds):   6218.028000
Scaled by:         2147483647.0
Maximum amplitude:     0.999969
Minimum amplitude:    -1.000000
Midline amplitude:    -0.000015
Mean    norm:          0.156168
Mean    amplitude:    -0.000010
RMS     amplitude:     0.211787
Maximum delta:         1.999969
Minimum delta:         0.000000
Mean    delta:         0.091605
RMS     delta:         0.123462
Rough   frequency:         1484
Volume adjustment:        1.000

The former does not have a noisy background and the latter does. I suspect I can use the mean or maximum delta, because of the big gap between the two files. Can anyone explain to me the meaning of those stats, or at least show me where I can find it myself? (I tried looking in the official documentation, but it doesn't explain them.) Many thanks.

asked Apr 14 '17 16:04

Nguyễn Tài Long




1 Answer

I don't know how I've managed to miss stat in the SoX docs all this time; it's right there.

  • Length
    • length of the audio file in seconds
  • Scaled by
    • what the input is scaled by. By default 2^31-1, to go from 32-bit signed integer to [-1, 1]
  • Maximum amplitude
    • maximum sample value
  • Minimum amplitude
    • minimum sample value
  • Midline amplitude
    • aka mid-range, midpoint between the max and minimum values.
  • Mean norm
    • arithmetic mean of samples' absolute values
  • Mean amplitude
    • arithmetic mean of samples' values
  • RMS amplitude
    • root mean square, root of squared values' mean
  • Maximum delta
    • maximum absolute difference between two successive samples
  • Minimum delta
    • minimum absolute difference between two successive samples
  • Mean delta
    • arithmetic mean of absolute differences between successive samples
  • RMS delta
    • root mean square of differences between successive samples
  • Rough frequency
    • estimation of the input file's frequency, in hertz. unsure of method used
  • Volume adjustment
    • value that should be sent to -v so peak absolute amplitude is 1
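
The definitions above can be sketched in a few lines of awk. This is only an illustration of the formulas, not SoX's own code; sample_stats and rough_freq are hypothetical helper names. sample_stats expects one already-scaled sample per line on stdin. For the rough frequency, the figures in the question are consistent with RMS delta / RMS amplitude × sample rate / 2π, assuming a 16 kHz sample rate (inferred from the sample count and length of a stereo file, not stated in the question).

```shell
#!/usr/bin/env bash

# sample_stats: reads one sample per line (values already scaled to [-1, 1])
# and prints the amplitude and delta statistics as defined in the list above.
sample_stats() {
  awk '
    NR > 1 {
      d = $1 - prev; if (d < 0) d = -d      # absolute delta to previous sample
      dsum += d; dsum2 += d * d
      if (NR == 2 || d > dmax) dmax = d
    }
    {
      a = ($1 < 0) ? -$1 : $1               # absolute sample value
      sum += $1; asum += a; sum2 += $1 * $1
      prev = $1 + 0
    }
    END {
      if (NR < 2) exit 1                    # need at least one delta
      printf "Mean norm: %.6f\n",      asum / NR
      printf "Mean amplitude: %.6f\n", sum / NR
      printf "RMS amplitude: %.6f\n",  sqrt(sum2 / NR)
      printf "Maximum delta: %.6f\n",  dmax
      printf "Mean delta: %.6f\n",     dsum / (NR - 1)
      printf "RMS delta: %.6f\n",      sqrt(dsum2 / (NR - 1))
    }'
}

# rough_freq RMS_DELTA RMS_AMPLITUDE SAMPLE_RATE
# Assumed formula: (RMS delta / RMS amplitude) * rate / (2 * pi), truncated to Hz.
rough_freq() {
  awk -v d="$1" -v a="$2" -v r="$3" 'BEGIN {
    pi = atan2(0, -1)
    printf "%d\n", (d / a) * r / (2 * pi)
  }'
}
```

With the values from the question, rough_freq 0.039249 0.053763 16000 prints 1859 and rough_freq 0.123462 0.211787 16000 prints 1484, matching the Rough frequency lines of both files.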

Personally I'd rather use the stats effect, whose output I find much more useful in practice.

To tell noisy audio from clean audio, I'd try using the difference between the highest and lowest sound levels. The quietest parts will never be quieter than the background noise alone, so if there is little difference the audio is either noisy or just loud all the time, like a compressed pop song. You could take the difference between the maximum and minimum RMS values, or between the peak and the minimum RMS. The RMS window length should be kept fairly short, say between 10 and 200 ms, and if the audio has fade-in or fade-out sections, those should be trimmed away, though I didn't include that in the code.

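audio="input1.flac"
width=0.01   # RMS window length in seconds (10 ms)

# Mixes down multi-channel files to mono, so stats prints a single column.
# stats writes its report to stderr, hence 2>&1. Only the numeric values of
# Pk lev dB, RMS Pk dB and RMS Tr dB are kept, one per line, in that order.
stats=$(sox "$audio" -n channels 1 stats -w "$width" 2>&1 |
  grep "Pk lev dB\|RMS Pk dB\|RMS Tr dB" |
  sed 's/[^0-9.-]*//g')

peak=$(head -n 1 <<< "$stats")                 # Pk lev dB
rmsmax=$(head -n 2 <<< "$stats" | tail -n 1)   # RMS Pk dB
rmsmin=$(tail -n 1 <<< "$stats")               # RMS Tr dB

# Level differences in dB; small values suggest noisy or constantly loud audio
rmsdif=$(bc <<< "scale=3; $rmsmax - $rmsmin")
pkmindif=$(bc <<< "scale=3; $peak - $rmsmin")

echo "
  max RMS: $rmsmax
  min RMS: $rmsmin

  diff RMS: $rmsdif
  peak-min: $pkmindif
"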
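
To sort a whole directory into noisy and clean files, the same measurement could be wrapped in two small functions. This is a rough sketch: rms_range and classify_range are hypothetical names, and the 30 dB threshold in the usage comment is an arbitrary starting point that would need tuning on files whose noisiness is already known. Files containing digital silence (an RMS trough of -inf) are not handled.

```shell
#!/usr/bin/env bash

# rms_range FILE [WINDOW]: prints RMS Pk dB minus RMS Tr dB for FILE,
# measured on a mono mixdown with the given RMS window (default 0.01 s).
rms_range() {
  sox "$1" -n channels 1 stats -w "${2:-0.01}" 2>&1 |
    awk '/^RMS Pk dB/ { pk = $NF }
         /^RMS Tr dB/ { tr = $NF }
         END { printf "%.2f\n", pk - tr }'
}

# classify_range RANGE_DB THRESHOLD_DB: prints "noisy" when the range is
# below the threshold (little gap between loudest and quietest parts),
# otherwise "clean".
classify_range() {
  awk -v r="$1" -v t="$2" 'BEGIN {
    print ((r + 0 < t + 0) ? "noisy" : "clean")
  }'
}

# Usage (the threshold of 30 dB is an assumption, tune it for your files):
#   for f in *.flac; do
#     printf '%s\t%s\n' "$f" "$(classify_range "$(rms_range "$f")" 30)"
#   done
```

Each group can then be split on silence with its own set of parameters.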
answered Nov 09 '22 19:11

AkselA