I would like to synchronize a spoken recording against a known text. Is there a speech-to-text / natural language processing library that would facilitate this? I imagine I'd want to detect word boundaries and compute candidate matches from a dictionary. Most of the questions I've found on SO concern written language.
Desired, but not required:
Edit: I realize this is a very broad, even naive, question, so thanks in advance for your guidance.
What I've found so far:
Translation of Speech to Text: First, we need to import the library and then initialize it using init() function. This function may take 2 arguments. After initialization, we will make the program speak the text using say() function. This method may also take 2 arguments.
The easiest way to install this is using pip install SpeechRecognition. Otherwise, download the source distribution from PyPI, and extract the archive. In the folder, run python setup.py install.
What is a Speech-to-Text API? At its core, a speech-to-text application programming interface (API) is simply the ability to call a service to transcribe audio into speech.
Forced Alignment
It sounds like you want to do forced alignment between your audio and the known text.
Pretty much all research/industry grade speech recognition systems will be able to do this, since forced alignment is an important part of training a recognition system on data that doesn't have phone level alignments between the audio and the transcript.
Alignment CMUSphinx
The Sphinx4-1.0 beta 5 release of CMU's open source speech recognition system now includes a demo on how to do alignment between a transcript and long speech recordings.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With