Synchronizing text and audio. Is there an NLP/speech-to-text library to do this?

I would like to synchronize a spoken recording against a known text. Is there a speech-to-text / natural language processing library that would facilitate this? I imagine I'd want to detect word boundaries and compute candidate matches from a dictionary. Most of the questions I've found on SO concern written language.

Desired, but not required:

  • Open Source
  • Compatible with American English out-of-the-box
  • Cross-platform
  • Thoroughly documented

Edit: I realize this is a very broad, even naive, question, so thanks in advance for your guidance.

What I've found so far:

  • OpenEars (iOS Sphinx/Flite wrapper)
Justin asked Nov 01 '10 18:11


1 Answer

Forced Alignment

It sounds like you want to do forced alignment between your audio and the known text.

Pretty much all research- or industry-grade speech recognition systems can do this, since forced alignment is an important part of training a recognizer on data that lacks phone-level alignments between the audio and the transcript.
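
To make the idea concrete, here is a toy sketch of the dynamic programming step behind forced alignment. It is not the Sphinx API or any library's implementation. It assumes you already have a table of per-frame log-likelihoods, one column per unit of the known transcript (a word or phone), which in a real system would come from an acoustic model. The function name `forced_align` and the example scores are made up for illustration.

```python
from typing import List, Tuple

NEG_INF = float("-inf")


def forced_align(log_probs: List[List[float]]) -> List[Tuple[int, int]]:
    """Toy Viterbi-style forced alignment (hypothetical helper).

    log_probs[t][j] is the log-likelihood that audio frame t belongs to
    transcript unit j. Units must appear in order, and each unit gets at
    least one frame. Returns (first_frame, last_frame) for each unit.
    """
    if not log_probs or not log_probs[0]:
        raise ValueError("log_probs must be a non-empty table")
    n_frames, n_units = len(log_probs), len(log_probs[0])
    if n_frames < n_units:
        raise ValueError("need at least one frame per transcript unit")

    # best[t][j]: score of the best monotonic path ending at frame t, unit j.
    best = [[NEG_INF] * n_units for _ in range(n_frames)]
    # advanced[t][j]: True if that best path entered unit j exactly at frame t.
    advanced = [[False] * n_units for _ in range(n_frames)]
    best[0][0] = log_probs[0][0]  # the path must start in the first unit

    for t in range(1, n_frames):
        for j in range(n_units):
            stay = best[t - 1][j]
            move = best[t - 1][j - 1] if j > 0 else NEG_INF
            if move > stay:
                best[t][j] = move + log_probs[t][j]
                advanced[t][j] = True
            else:
                best[t][j] = stay + log_probs[t][j]

    # Backtrack from the last frame of the last unit to recover boundaries.
    spans = [[0, 0] for _ in range(n_units)]
    j = n_units - 1
    spans[j][1] = n_frames - 1
    for t in range(n_frames - 1, 0, -1):
        if advanced[t][j]:
            spans[j][0] = t
            j -= 1
            spans[j][1] = t - 1
    return [(start, end) for start, end in spans]


# Made-up scores: 6 frames, 3 transcript units.
example = [
    [-0.1, -3.0, -3.0],
    [-0.1, -3.0, -3.0],
    [-3.0, -0.1, -3.0],
    [-3.0, -0.1, -3.0],
    [-3.0, -0.1, -3.0],
    [-3.0, -3.0, -0.1],
]
print(forced_align(example))  # [(0, 1), (2, 4), (5, 5)]
```

Multiplying the frame indices by the frame step (often around 10 ms, depending on the front end) would turn these spans into timestamps. A real aligner also handles silence, pronunciation variants, and pruning, none of which this sketch attempts.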

Alignment with CMUSphinx

The Sphinx4-1.0 beta 5 release of CMU's open source speech recognition system now includes a demo on how to do alignment between a transcript and long speech recordings.

dmcer answered Sep 20 '22 05:09