Why isn't speech recognition advancing? [closed]

What's so difficult about the subject that algorithm designers are having a hard time tackling it?

Is it really that complex?

I'm having a hard time grasping why this topic is so problematic. Can anyone give me an example as to why this is the case?

Yuval Adam asked Jul 09 '09 09:07



1 Answer

Auditory processing is a very complex task. Human evolution has produced a system so good that we don't realize how good it is. If three people are talking to you at the same time, you can focus on one signal and discard the others, even if they are louder. Noise is discarded very well too. In fact, if you hear a human voice played backwards, the first stages of the auditory system send that signal to a different processing area than they would a real speech signal, because the system regards it as "not voice". This is one example of the outstanding abilities humans have.

Speech recognition advanced quickly from the 70s onward because researchers were studying voice production. That is a simpler system: vocal cords excited or not, resonance of the vocal tract... it is a mechanical system that is easy to understand. The main product of this approach is cepstral analysis, which led automatic speech recognition (ASR) to achieve acceptable results. But it is a sub-optimal approach. Noise separation is quite bad: even though it works more or less in clean environments, it is not going to work with loud music in the background, not the way human hearing does.
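To make the cepstral analysis mentioned above a bit more concrete, here is a minimal sketch in Python with NumPy. It is only an illustration under stated assumptions, not what any particular ASR system does: the 8 kHz sample rate, the single 700 Hz resonance, and the helper names (synthetic_voiced_frame, real_cepstrum, estimate_pitch_period) are all made up for this example. It builds a toy "vocal cords plus vocal tract" signal and recovers the pitch period from the cepstrum.

```python
import numpy as np

SAMPLE_RATE = 8000  # Hz; assumed value for this illustration


def synthetic_voiced_frame(pitch_period=80, n_samples=1024):
    """Toy source-filter signal: an impulse train (vocal cords)
    passed through a single resonance (vocal tract)."""
    excitation = np.zeros(n_samples)
    excitation[::pitch_period] = 1.0  # one pulse every pitch_period samples
    n = np.arange(200)
    # Damped sinusoid near 700 Hz standing in for one formant (assumption).
    resonance = np.exp(-n / 20.0) * np.sin(2 * np.pi * 700 * n / SAMPLE_RATE)
    return np.convolve(excitation, resonance)[:n_samples]


def real_cepstrum(frame):
    """Inverse FFT of the log magnitude spectrum of a windowed frame."""
    windowed = frame * np.hamming(len(frame))
    # Small constant avoids log(0) on empty spectral bins.
    log_magnitude = np.log(np.abs(np.fft.fft(windowed)) + 1e-12)
    return np.fft.ifft(log_magnitude).real


def estimate_pitch_period(frame, min_q=40, max_q=200):
    """Return the quefrency (in samples) of the largest cepstral value
    inside a plausible pitch range."""
    cepstrum = real_cepstrum(frame)
    return min_q + int(np.argmax(cepstrum[min_q:max_q]))


frame = synthetic_voiced_frame()
period = estimate_pitch_period(frame)
print(f"estimated pitch: {SAMPLE_RATE / period:.1f} Hz")
```

The low end of the cepstrum mostly describes the slowly varying resonance (the vocal tract), while the periodic excitation shows up as a peak near the pitch period. This works on a clean synthetic signal like the one here; adding noise or background music to the frame tends to blur that picture, which is the weakness described above.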

The optimal approach depends on understanding the auditory system: its first stages in the cochlea, the inferior colliculus... but the brain is involved too, and we don't know that much about it. It is proving to be a difficult change of paradigm.

In a paper, Professor Hynek Hermansky compared the current state of the research to the time when humans wanted to fly. We didn't know what the secret was (the feathers? the flapping wings?) until we discovered Bernoulli's principle.

nacmartin answered Sep 23 '22 15:09