Split speech audio file on words in python

I feel like this is a fairly common problem, but I haven't yet found a suitable answer. I have many audio files of human speech that I would like to split on word boundaries. This can be done heuristically by looking at pauses in the waveform, but can anyone point me to a Python function or library that does this automatically?

517

asked Apr 06 '16 17:04

user3059201

1 Answer

An easier way to do this is with the pydub module. Its recently added silence utilities do the heavy lifting, such as applying a silence threshold and a minimum silence length, and simplify the code significantly compared with the other methods mentioned.

Here is a demo implementation, inspired by an example linked in the original post.

Setup:

I had an audio file with the spoken English letters A to Z, saved as "a-z.wav". I created a sub-directory named splitAudio in the current working directory. When the demo code ran, the audio was split into 26 separate files, each storing one syllable.
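
The demo code writes its chunks into splitAudio, so that directory has to exist before anything is exported. As a small sketch (Python 3; the helper name chunk_path is hypothetical and not part of pydub), the output path can be built and the directory created in one place:

```python
import os


def chunk_path(index, out_dir="splitAudio"):
    """Return the output path for chunk `index`, creating out_dir if needed."""
    # exist_ok=True makes repeated calls safe once the directory exists
    os.makedirs(out_dir, exist_ok=True)
    return os.path.join(out_dir, "chunk{0}.wav".format(index))
```

With pydub, the export call in the loop would then read chunk.export(chunk_path(i), format="wav").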

Observations: Some of the syllables were cut off, so the following parameters may need adjusting:
min_silence_len=500
silence_thresh=-16

You may want to tune these to your own requirements.
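
One way to pick values for these parameters is to measure how loud the recording actually is between words. The following is a rough, standard-library-only sketch (Python 3, 16-bit PCM WAV assumed). It is not pydub's implementation, and the names window_levels_dbfs and quiet_runs are hypothetical. It prints nothing by itself; it returns the level of each 10 ms window in dBFS and the quiet stretches found for a candidate threshold and minimum silence length, which roughly correspond to silence_thresh and min_silence_len.

```python
import array
import math
import sys
import wave


def window_levels_dbfs(path, window_ms=10):
    """Return the RMS level (dBFS) of each consecutive window of a 16-bit WAV."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        rate = wav.getframerate()
        channels = wav.getnchannels()
        samples = array.array("h", wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    # number of interleaved samples per window
    step = max(1, rate * window_ms // 1000) * channels
    levels = []
    for start in range(0, len(samples), step):
        window = samples[start:start + step]
        rms = math.sqrt(sum(s * s for s in window) / len(window))
        # 32768 is full scale for 16-bit audio; digital silence maps to -inf
        levels.append(20 * math.log10(rms / 32768.0) if rms > 0 else float("-inf"))
    return levels


def quiet_runs(levels, threshold_db, window_ms=10, min_silence_ms=500):
    """Return (start_ms, end_ms) for quiet stretches of at least min_silence_ms."""
    runs, run_start = [], None
    # the trailing +inf sentinel closes a quiet run that reaches the end
    for i, level in enumerate(levels + [float("inf")]):
        if level < threshold_db:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if (i - run_start) * window_ms >= min_silence_ms:
                runs.append((run_start * window_ms, i * window_ms))
            run_start = None
    return runs
```

For example, quiet_runs(window_levels_dbfs("a-z.wav"), threshold_db=-40) lists the pauses that a threshold of -40 dBFS would detect. If the returned pauses start inside the spoken letters, the threshold is probably too high; if no pauses are found, it is probably too low or the minimum silence length is too long.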

Demo Code:

from pydub import AudioSegment
from pydub.silence import split_on_silence

sound_file = AudioSegment.from_wav("a-z.wav")
audio_chunks = split_on_silence(
    sound_file,
    # must be silent for at least half a second
    min_silence_len=500,
    # consider it silent if quieter than -16 dBFS
    silence_thresh=-16
)

for i, chunk in enumerate(audio_chunks):
    out_file = ".//splitAudio//chunk{0}.wav".format(i)
    # format the message first so it prints the same on Python 2 and 3
    print("exporting {0}".format(out_file))
    chunk.export(out_file, format="wav")

Output:

Python 2.7.9 (default, Dec 10 2014, 12:24:55) [MSC v.1500 32 bit (Intel)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> ================================ RESTART ================================
>>>
exporting .//splitAudio//chunk0.wav
exporting .//splitAudio//chunk1.wav
exporting .//splitAudio//chunk2.wav
exporting .//splitAudio//chunk3.wav
exporting .//splitAudio//chunk4.wav
exporting .//splitAudio//chunk5.wav
exporting .//splitAudio//chunk6.wav
exporting .//splitAudio//chunk7.wav
exporting .//splitAudio//chunk8.wav
exporting .//splitAudio//chunk9.wav
exporting .//splitAudio//chunk10.wav
exporting .//splitAudio//chunk11.wav
exporting .//splitAudio//chunk12.wav
exporting .//splitAudio//chunk13.wav
exporting .//splitAudio//chunk14.wav
exporting .//splitAudio//chunk15.wav
exporting .//splitAudio//chunk16.wav
exporting .//splitAudio//chunk17.wav
exporting .//splitAudio//chunk18.wav
exporting .//splitAudio//chunk19.wav
exporting .//splitAudio//chunk20.wav
exporting .//splitAudio//chunk21.wav
exporting .//splitAudio//chunk22.wav
exporting .//splitAudio//chunk23.wav
exporting .//splitAudio//chunk24.wav
exporting .//splitAudio//chunk25.wav
exporting .//splitAudio//chunk26.wav
>>>

127

answered Sep 25 '22 01:09

Anil_M

Tags: python, heuristics, audio, speech-recognition, speech
