I am trying to output the start timestamps of periods of silence in a given audio file (since there is background noise, by silence I mean audio below a threshold). Eventually, I want to split the audio file into smaller audio files at these timestamps. It is important that no part of the original file be discarded.
I tried
sox in.wav out.wav silence 1 0.5 1% 1 2.0 1% : newfile : restart
(courtesy http://digitalcardboard.com/blog/2009/08/25/the-sox-of-silence/)
Although it somewhat did the job, it also trimmed and discarded the periods of silence, which I do not want.
Is 'silence' the right option, or is there a simpler way to accomplish what I need to do?
Thanks.
Unfortunately not SoX, but ffmpeg has a silencedetect filter that does exactly what you're looking for:
ffmpeg -i in.wav -af silencedetect=noise=-50dB:d=1 -f null -
(detection threshold of -50 dB, minimum duration of 1 second; cribbed from the ffmpeg documentation)
...this would print a result like this:
Press [q] to stop, [?] for help
[silencedetect @ 0x7ff2ba5168a0] silence_start: 264.718
[silencedetect @ 0x7ff2ba5168a0] silence_end: 265.744 | silence_duration: 1.02612
size=N/A time=00:04:29.53 bitrate=N/A
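Since the goal is to split without discarding anything, the silence_start values can be turned into contiguous segments that cover the whole file. A minimal Python sketch, assuming the log lines look like the sample above; run_silencedetect, parse_silences and segments are hypothetical helper names, and cutting exactly at silence_start (rather than, say, the middle of each silence) is an arbitrary choice.

```python
import re
import subprocess
from typing import List, Optional, Tuple

# A decimal or scientific-notation number, e.g. "264.718" or "1e-05".
NUM = r"([-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?)"
START_RE = re.compile(r"silence_start:\s*" + NUM)
END_RE = re.compile(r"silence_end:\s*" + NUM)

Span = Tuple[float, Optional[float]]


def run_silencedetect(path: str, noise: str = "-50dB", d: float = 1) -> str:
    """Run the ffmpeg command from the answer; the filter log goes to stderr."""
    cmd = ["ffmpeg", "-i", path, "-af", f"silencedetect=noise={noise}:d={d}",
           "-f", "null", "-"]
    return subprocess.run(cmd, capture_output=True, text=True).stderr


def parse_silences(log: str) -> List[Span]:
    """(silence_start, silence_end) pairs; end is None if the file ends silent."""
    spans: List[Span] = []
    start: Optional[float] = None
    for line in log.splitlines():
        m = START_RE.search(line)
        if m:
            start = float(m.group(1))
        elif start is not None:
            m = END_RE.search(line)
            if m:
                spans.append((start, float(m.group(1))))
                start = None
    if start is not None:
        spans.append((start, None))
    return spans


def segments(silences: List[Span], total: Optional[float] = None) -> List[Span]:
    """Contiguous pieces cut at each silence_start, so no audio is dropped.

    The last piece ends at `total` (None meaning "end of file").
    """
    cuts = [s for s, _ in silences if s > 0]
    bounds: List[Optional[float]] = [0.0, *cuts, total]
    return list(zip(bounds[:-1], bounds[1:]))
```

With the sample output above, parse_silences yields [(264.718, 265.744)], and segments(..., total=269.53) gives [(0.0, 264.718), (264.718, 269.53)]. The total length can come from ffmpeg's final time= field (00:04:29.53 is 269.53 s) or from soxi -D.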
Necroposting:
You can run a separate script that iterates over all of the SoX output files (for f in *.wav) and runs soxi -D "$f" on each one to obtain the duration of the sound clip.
Then get the current system time in seconds with date "+%s" and subtract the duration to find the time at which the recording started.
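A minimal Python sketch of that loop. It swaps the standard-library wave module in for soxi -D (assumption: the files are uncompressed PCM WAV, which wave can read) and time.time() in for date "+%s"; wav_duration and recording_starts are hypothetical names. The now parameter can be passed explicitly to make the arithmetic easy to check.

```python
import glob
import time
import wave
from typing import Dict, Optional


def wav_duration(path: str) -> float:
    """Duration in seconds, the value `soxi -D` reports (PCM WAV only)."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()


def recording_starts(pattern: str = "*.wav",
                     now: Optional[float] = None) -> Dict[str, float]:
    """Map each matching file to (now - duration), in Unix seconds."""
    if now is None:
        now = time.time()  # like `date "+%s"`, but with a fractional part
    return {f: now - wav_duration(f) for f in sorted(glob.glob(pattern))}
```

For example, a 1-second clip with now=100.0 maps to 99.0. Note that this only gives wall-clock start times for files that were just recorded; it says nothing about offsets within an existing file.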
There is (currently, at least) no way to make the silence effect output the positions where it has detected silence, or to retain all of the silent audio.
If you are able to recompile SoX yourself, you could add an output statement to find out the cut positions, then use trim in a separate invocation to split the file. With the stock version, you are out of luck.
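Once you have cut positions from any source (for instance the silence_start values from the ffmpeg answer above), the separate trim invocations can be generated mechanically. A minimal Python sketch, assuming trim's "START LENGTH" form (second position relative to the first); trim_commands and the part001.wav naming are hypothetical, and the millisecond formatting is an arbitrary choice.

```python
from typing import List, Sequence


def _fmt(seconds: float) -> str:
    # Millisecond precision, enough for the timestamps ffmpeg prints.
    return f"{seconds:.3f}"


def trim_commands(src: str, cuts: Sequence[float],
                  out_pattern: str = "part{:03d}.wav") -> List[List[str]]:
    """One `sox src out trim START [LENGTH]` argv per segment.

    Segments are [0, c1), [c1, c2), ..., [cN, end of file), so together they
    cover the whole input and nothing is discarded.
    """
    bounds: List[float] = [0.0] + sorted(c for c in cuts if c > 0)
    cmds = []
    for i, start in enumerate(bounds):
        argv = ["sox", src, out_pattern.format(i + 1), "trim", _fmt(start)]
        if i + 1 < len(bounds):
            # Length of this piece; the last piece runs to the end of the file.
            argv.append(_fmt(bounds[i + 1] - start))
        cmds.append(argv)
    return cmds
```

Each argv can then be run with subprocess.run(argv, check=True). Passing argument lists rather than shell strings avoids quoting problems with unusual file names.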
SoX can easily give you the timestamp and value of every sample in a text file. It won't give you the periods of silence directly, but you can calculate those with a simple script. From the SoX file-format documentation:
.dat Text Data files. These files contain a textual representation of the sample data. There is one line at the beginning that contains the sample rate, and one line that contains the number of channels. Subsequent lines contain two or more numeric data items: the time since the beginning of the first sample and the sample value for each channel.
Values are normalized so that the maximum and minimum are 1 and -1. This file format can be used to create data files for external programs such as FFT analysers or graph routines. SoX can also convert a file in this format back into one of the other file formats.
Example containing only 2 stereo samples of silence:
; Sample Rate 8012
; Channels 2
0 0 0
0.00012481278 0 0
So you can run sox in.wav out.dat, then parse the text file and treat a sequence of rows whose values are close to 0 (depending on your threshold) as silence.
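A minimal Python sketch of that parsing step, assuming the .dat layout quoted above. silent_runs and its threshold and min_duration defaults are hypothetical. Because a real waveform crosses zero constantly, individual near-zero samples appear even in loud passages, so the minimum-duration filter matters as much as the threshold.

```python
from typing import Iterable, List, Optional, Tuple


def silent_runs(lines: Iterable[str], threshold: float = 0.01,
                min_duration: float = 0.5) -> List[Tuple[float, float]]:
    """Find (start, end) times where every channel stays within +/-threshold.

    `lines` is the text of a SoX .dat file: ';' lines are headers, the rest
    are "time value [value ...]". Runs shorter than min_duration are dropped.
    """
    runs: List[Tuple[float, float]] = []
    start: Optional[float] = None  # time of the first quiet sample in the run
    last = 0.0                     # time of the most recent sample seen
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";"):
            continue
        fields = [float(x) for x in line.split()]
        t, values = fields[0], fields[1:]
        quiet = all(abs(v) <= threshold for v in values)
        if quiet and start is None:
            start = t
        elif not quiet and start is not None:
            # The run ends at the first loud sample.
            if t - start >= min_duration:
                runs.append((start, t))
            start = None
        last = t
    # A run still open at end of file ends at the last sample.
    if start is not None and last - start >= min_duration:
        runs.append((start, last))
    return runs
```

Usage: with open("out.dat") as f: print(silent_runs(f, threshold=0.02, min_duration=1.0)). Keep in mind that a .dat file has one line per sample, so it gets large quickly for long recordings.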