
How to split video or audio by silent parts

I need to automatically split a video of a speech into words, so that every word becomes a separate video file. Is there a way to do this?

My plan was to detect the silent parts and use them as word separators. But I didn't find any tool that does this, and it looks like ffmpeg is not the right tool for the job.

TermiT asked Mar 18 '16 00:03



1 Answer

You could first use ffmpeg to detect the intervals of silence, like this:

ffmpeg -i "input.mov" -af silencedetect=noise=-30dB:d=0.5 -f null - 2> vol.txt 

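Here noise=-30dB is the level below which audio counts as silence, and d=0.5 is the minimum duration (in seconds) for a quiet stretch to be reported. Because of the 2> redirect, ffmpeg's console output is written to vol.txt, with readings that look like this: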

[silencedetect @ 00000000004b02c0] silence_start: -0.0306667
[silencedetect @ 00000000004b02c0] silence_end: 1.42767 | silence_duration: 1.45833
[silencedetect @ 00000000004b02c0] silence_start: 2.21583
[silencedetect @ 00000000004b02c0] silence_end: 2.7585 | silence_duration: 0.542667
[silencedetect @ 00000000004b02c0] silence_start: 3.1315
[silencedetect @ 00000000004b02c0] silence_end: 5.21833 | silence_duration: 2.08683
[silencedetect @ 00000000004b02c0] silence_start: 5.3895
[silencedetect @ 00000000004b02c0] silence_end: 7.84883 | silence_duration: 2.45933
[silencedetect @ 00000000004b02c0] silence_start: 8.05117
[silencedetect @ 00000000004b02c0] silence_end: 10.0953 | silence_duration: 2.04417
[silencedetect @ 00000000004b02c0] silence_start: 10.4798
[silencedetect @ 00000000004b02c0] silence_end: 12.4387 | silence_duration: 1.95883
[silencedetect @ 00000000004b02c0] silence_start: 12.6837
[silencedetect @ 00000000004b02c0] silence_end: 14.5572 | silence_duration: 1.8735
[silencedetect @ 00000000004b02c0] silence_start: 14.9843
[silencedetect @ 00000000004b02c0] silence_end: 16.5165 | silence_duration: 1.53217

You then generate commands to split from each silence end to the next silence start. You will probably want to add handles of, say, 250 ms on each side, so each clip will be 2 * 250 ms longer than the detected speech interval.

ffmpeg -ss <silence_end - 0.25> -t <next_silence_start - silence_end + 2 * 0.25> -i input.mov word-N.mov 

(I have skipped specifying audio/video parameters)
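As a rough illustration of the arithmetic behind the -ss and -t placeholders, here is a minimal Python sketch. The function name clip_window and the clamping of the start time at zero are my own assumptions, not part of the command above; the sample values are the first silence_end and the following silence_start from the log.

```python
def clip_window(silence_end, next_silence_start, handle=0.25):
    """Return (start, duration) in seconds for one word clip.

    `handle` is the padding added on each side of the detected speech.
    Clamping the start at 0 is an assumption, so that a word near the
    beginning of the file does not yield a negative seek time.
    """
    start = max(0.0, silence_end - handle)
    end = next_silence_start + handle
    return start, end - start


# First word in the sample log: speech runs from 1.42767 to 2.21583.
start, duration = clip_window(1.42767, 2.21583)
print(f"-ss {start:.3f} -t {duration:.3f}")
```

With the default 250 ms handle, the duration is the speech length plus 0.5 s, unless the start had to be clamped.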

You'll want to write a script to scrape the log (vol.txt) and generate a structured (maybe CSV) file with the timecodes, one pair on each line: silence_end and the next silence_start. Then write another script to generate the commands from each pair of numbers.
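A minimal Python sketch of such a script might look like the following. It does both steps in one pass instead of going through a CSV file, and the function names speech_intervals and build_commands are hypothetical. It assumes the log uses the plain decimal format shown above, and it ignores any speech after the last silence_end, since there is no following silence_start to pair it with. As in the command above, audio/video parameters are left out.

```python
import re
import shlex

# Matches both kinds of silencedetect readings and captures the timestamp.
EVENT = re.compile(r"silence_(start|end):\s*(-?\d+(?:\.\d+)?)")


def speech_intervals(log_text):
    """Return (silence_end, next_silence_start) pairs found in the log."""
    intervals = []
    last_end = None
    for kind, value in EVENT.findall(log_text):
        if kind == "end":
            last_end = float(value)
        elif last_end is not None:
            # A silence_start that follows a silence_end closes one word.
            intervals.append((last_end, float(value)))
            last_end = None
    return intervals


def build_commands(intervals, source="input.mov", handle=0.25):
    """Build one ffmpeg command string per interval, padded by `handle`."""
    commands = []
    for n, (sil_end, next_start) in enumerate(intervals, start=1):
        start = max(0.0, sil_end - handle)  # clamping at 0 is an assumption
        duration = next_start + handle - start
        args = ["ffmpeg", "-ss", f"{start:.3f}", "-t", f"{duration:.3f}",
                "-i", source, f"word-{n}.mov"]
        commands.append(shlex.join(args))  # shlex.join needs Python 3.8+
    return commands


# A short excerpt of the log above; in practice, read the text of vol.txt.
sample_log = """\
[silencedetect @ 00000000004b02c0] silence_start: -0.0306667
[silencedetect @ 00000000004b02c0] silence_end: 1.42767 | silence_duration: 1.45833
[silencedetect @ 00000000004b02c0] silence_start: 2.21583
[silencedetect @ 00000000004b02c0] silence_end: 2.7585 | silence_duration: 0.542667
[silencedetect @ 00000000004b02c0] silence_start: 3.1315
"""

for command in build_commands(speech_intervals(sample_log)):
    print(command)
```

To run it on a real file, replace sample_log with the contents of vol.txt and either execute the printed lines in a shell or pass the argument lists to subprocess.run.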

Gyan answered Sep 24 '22 10:09