Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Text-to-speech (voice generation) and speech-to-text (voice recognition) APIs?

Is there a comprehensive list of known APIs for desktop or browser environments?

like image 313
Vladimir Keleshev Avatar asked Jun 14 '11 19:06

Vladimir Keleshev


People also ask

What is speech recognition API?

The speech recognition part of the Web Speech API allows authorized Web applications to access the device's microphone and produces a transcript of the voice being recorded. This allows Web applications to use voice as one of the input & control method, similar to touch or keyboard.

Is Google Text-to-Speech API free?

Text-to-Speech is priced based on the number of characters sent to the service to be synthesized into audio each month. You must enable billing to use Text-to-Speech, and will be automatically charged if your usage exceeds the number of free characters allowed per month.


1 Answers

I'll rehash and update an answer from Speech recognition in C or Java or PHP?. This is by no means comprehensive, but it might be a start for you


From watching these questions for few months, I've seen most developer choices break down like this:

Windows folks - use the System.Speech features of .Net or Microsoft.Speech and install the free recognizers Microsoft provides. Windows 7 includes a full speech engine. Others are downloadable for free. There is a C++ API to the same engines known as SAPI. See at http://msdn.microsoft.com/en-us/magazine/cc163663.aspx. or http://msdn.microsoft.com/en-us/library/ms723627(v=vs.85).aspx. More background on Microsoft engines for Windows What is the difference between System.Speech.Recognition and Microsoft.Speech.Recognition?

Linux folks - Sphinx seems to have a good following. See http://cmusphinx.sourceforge.net/ and http://cmusphinx.sourceforge.net/wiki/

Commercial products - Nuance, Loquendo, AT&T, IBM, others. Each provide their own SDKs and libraries for various languages.

Online service - Nuance, Yapme, ispeech.org, vlingo, others. Nuance has improved their developer program and will now give you free access to their services for development. Yap (I believe) was recently purchased by Amazon, so we may see some changes there.

Of course this may also be helpful - http://en.wikipedia.org/wiki/List_of_speech_recognition_software

There is a Java speech API. See javax.speech.recognition in the Java Speech API http://java.sun.com/products/java-media/speech/forDevelopers/jsapi-guide/Recognition.html. I believe you still have to find a speech engine that supports this API. I don't think Sphinx fully supports it - http://cmusphinx.sourceforge.net/sphinx4/doc/Sphinx4-faq.html#support_jsapi

There are lots of other SO quesitons: Need text to speech and speech recognition tools for Linux and pyspeech (python) - Transcribe mp3 files? which talks about http://code.google.com/p/pyspeech/. You may also want to look at http://code.google.com/p/dragonfly/

like image 148
Michael Levy Avatar answered Oct 21 '22 09:10

Michael Levy