I need to automatically split video of a speech by words, so every word is a separate video file. Do you know any ways to do this?
My plan was to detect silent parts and use them as words separators. But i didn't find any tool to do this and looks like ffmpeg is not the right tool for that.
Right-click the video clip and select Split Audio. This will separate the audio clip from the video track and put it on a track of its own. To remove the audio, simply delete this track.
Whether you need to extract an audio file from a single clip or a whole video, Adobe Premiere Pro has the tools to create audio clips from any video file including MP4, AVI, FLV, and MPEG. And using your video editor as an audio converter can streamline your workflow, as well.
To open the Detect Silence dialog, select one or several audio events in the Project window or the Audio Part Editor and select Audio > Advanced > Detect Silence.
This is a python code snippet that I use for splitting files as per necessity. I use the pydub library from https://github.com/jiaaro/pydub. You can modify the snippet to suit your requirement. from pydub import AudioSegment t1 = t1 * 1000 #Works in milliseconds t2 = t2 * 1000 newAudio = AudioSegment.
You could first use ffmpeg to detect intervals of silence, like this
ffmpeg -i "input.mov" -af silencedetect=noise=-30dB:d=0.5 -f null - 2> vol.txt
This will produce console output with readings that look like this:
[silencedetect @ 00000000004b02c0] silence_start: -0.0306667 [silencedetect @ 00000000004b02c0] silence_end: 1.42767 | silence_duration: 1.45833 [silencedetect @ 00000000004b02c0] silence_start: 2.21583 [silencedetect @ 00000000004b02c0] silence_end: 2.7585 | silence_duration: 0.542667 [silencedetect @ 00000000004b02c0] silence_start: 3.1315 [silencedetect @ 00000000004b02c0] silence_end: 5.21833 | silence_duration: 2.08683 [silencedetect @ 00000000004b02c0] silence_start: 5.3895 [silencedetect @ 00000000004b02c0] silence_end: 7.84883 | silence_duration: 2.45933 [silencedetect @ 00000000004b02c0] silence_start: 8.05117 [silencedetect @ 00000000004b02c0] silence_end: 10.0953 | silence_duration: 2.04417 [silencedetect @ 00000000004b02c0] silence_start: 10.4798 [silencedetect @ 00000000004b02c0] silence_end: 12.4387 | silence_duration: 1.95883 [silencedetect @ 00000000004b02c0] silence_start: 12.6837 [silencedetect @ 00000000004b02c0] silence_end: 14.5572 | silence_duration: 1.8735 [silencedetect @ 00000000004b02c0] silence_start: 14.9843 [silencedetect @ 00000000004b02c0] silence_end: 16.5165 | silence_duration: 1.53217
You then generate commands to split from each silence end to the next silence start. You will probably want to add some handles of, say, 250 ms, so the audio will have a duration of 250 ms * 2 more.
ffmpeg -ss <silence_end - 0.25> -t <next_silence_start - silence_end + 2 * 0.25> -i input.mov word-N.mov
(I have skipped specifying audio/video parameters)
You'll want to write a script to scrape the console log and generate a structured (maybe CSV) file with the timecodes - one pair on each line: silence_end and the next silence_start. And then another script to generate the commands with each pair of numbers.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With