Writing speech-recognition engine [closed]

Like many others, I decided to create my own speech-recognition engine. As it turned out, this is not easy at all. It is particularly difficult for English, because there is a dramatic difference between the way a word is written and the way it is pronounced. Being from Georgia, I decided to write a speech-recognition engine for the Georgian language. In Georgian, you pronounce words EXACTLY the way you write them; the spelling works just like a transcription. Will this fact significantly ease my task? Or are there even harder difficulties waiting for me?

asked Nov 20 '11 15:11

nicks

2 Answers

Speech recognition is a complex domain with many specialized algorithms, tools, and methods. To create your own engine, you could start with the CMUSphinx open-source speech recognition toolkit, which will allow you to:

Collect and process the data required to support the Georgian language
Create the models for Georgian
Implement a speech recognition engine for Georgian
Use the engine to create a speech recognition application running on the desktop, on a server, or on the iPhone (through OpenEars)
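
As a rough illustration of the model-building step: CMUSphinx uses a phonetic dictionary that lists each word followed by its sequence of phones, one entry per line. If Georgian spelling were treated as a one-to-one transcription, a first draft of such a dictionary could be generated mechanically, as in the minimal Python sketch below. The letter-to-phone table is a tiny hypothetical subset of the alphabet, and the phone names are illustrative assumptions rather than an established Georgian phone set. Any real dictionary would still need to be checked against recordings.

```python
# Hypothetical letter-to-phone table: a small subset of the Georgian
# alphabet with illustrative phone names (assumptions, not a standard set).
LETTER_TO_PHONE = {
    "ა": "a",
    "ე": "e",
    "ი": "i",
    "ო": "o",
    "უ": "u",
    "დ": "d",
    "მ": "m",
    "ნ": "n",
    "ს": "s",
}


def word_to_phones(word):
    """Map each letter of a word to one phone; fail loudly on unknown letters."""
    phones = []
    for letter in word:
        if letter not in LETTER_TO_PHONE:
            raise KeyError(f"no phone defined for letter {letter!r}")
        phones.append(LETTER_TO_PHONE[letter])
    return phones


def dictionary_lines(words):
    """Build 'word phone phone ...' lines, sorted, one per unique word."""
    return [f"{w} {' '.join(word_to_phones(w))}" for w in sorted(set(words))]


if __name__ == "__main__":
    # Two common words spelled only with letters from the table above.
    for line in dictionary_lines(["დედა", "მამა"]):
        print(line)
```

Raising an error on an unmapped letter, instead of skipping it, keeps gaps in the table visible while the table is still being filled in.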

CMUSphinx already supports English, German, Spanish, French, Dutch, Russian, Mandarin, Icelandic, Italian and many other languages. Adding a new one is straightforward, though for newcomers it usually takes a month or two of concentrated work to go through the required process.

To get started, visit the homepage:

http://cmusphinx.sourceforge.net

and read the tutorial:

http://cmusphinx.sourceforge.net/wiki/tutorial

If you have any questions, please ask them on the forums or here!

Also, it's a very common misconception that in Georgian you simply pronounce the letters as they are written. That isn't true for most of the world's languages. To test the hypothesis, record some speech in an audio editor and check which sounds are actually pronounced. You'll be surprised. The tutorial above covers this question in detail.

answered Sep 20 '22 00:09

Nikolay Shmyrev

Do all people from Georgia sound exactly the same? I think not. Many of the major problems in speech recognition are not directly related to the language itself:

different people (women, men, children, the elderly, etc.) have different voices
the same person sometimes sounds different, for example when they have a cold
different background noises
everyday speech sometimes contains words from other languages (such as the German word Kindergarten in US English)
some speakers learned the language as non-natives (they usually sound different)
some people speak faster, others slower
quality of the microphone
etc.

Solving these problems is always pretty hard, and on top of that you have the language and its pronunciation to take care of. I don't know Georgian, but what you describe might make the task a bit easier. It will still be a hard task.
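
To make the background-noise point concrete, here is a minimal Python sketch that mixes white noise into a clean signal at a chosen signal-to-noise ratio (SNR), which is one simple way to check how a recognizer copes with noisier input. Nothing here comes from a specific toolkit: the function names, the synthetic 440 Hz tone standing in for speech, and the 16 kHz sample rate are all assumptions made for illustration.

```python
import math
import random


def power(samples):
    """Mean squared amplitude of a list of samples."""
    return sum(s * s for s in samples) / len(samples)


def add_noise(signal, target_snr_db, seed=0):
    """Return the signal plus Gaussian white noise at the requested SNR (dB)."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    # Scale the noise so that power(signal) / power(scaled noise)
    # equals 10 ** (target_snr_db / 10).
    target_noise_power = power(signal) / (10 ** (target_snr_db / 10))
    scale = math.sqrt(target_noise_power / power(noise))
    return [s + scale * n for s, n in zip(signal, noise)]


def measure_snr_db(clean, noisy):
    """SNR in dB, treating (noisy - clean) as the noise component."""
    noise = [n - c for c, n in zip(clean, noisy)]
    return 10 * math.log10(power(clean) / power(noise))


if __name__ == "__main__":
    sample_rate = 16000  # assumed sample rate in Hz
    # One second of a 440 Hz tone as a stand-in for a clean recording.
    tone = [math.sin(2 * math.pi * 440 * t / sample_rate)
            for t in range(sample_rate)]
    noisy = add_noise(tone, target_snr_db=10)
    print(round(measure_snr_db(tone, noisy), 3))
```

Real background noise is rarely white, so mixing in recorded noise would be more realistic; the scaling step would stay the same.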

EDIT - as per comments:

Using good libraries might shorten the time frame and even improve quality, but not every library is good for speech recognition, even if it is brilliant at other audio-related tasks.

For reference, see the Wikipedia article on speech recognition (http://en.wikipedia.org/wiki/Speech_recognition), which has a good overview including links and book references that make a good starting point.

As for how to design such an API, see for example http://java.sun.com/products/java-media/speech/forDevelopers/jsapi-guide/Recognition.html

answered Sep 23 '22 00:09

Yahia

Tags: speech-recognition, georgian
