I know it's a general question, but I still want to know: what's the fastest speech recognition library in C++?
Currently I am using Microsoft SAPI with Kinect. It works fine and recognizes words, but it's a bit slow: sometimes it takes 1-2 seconds to recognize a word, and in my case this lag causes a lot of interaction issues for the user.
I checked the sample provided with the Kinect, in which the turtle moves left or right according to the words recognized, but even that is a bit slow.
So I was wondering whether there is any library faster than SAPI that can be used in cases like a voice-controlled robot. If you say "left" and then "right", the robot keeps moving left and only turns right after 1-2 seconds, which is frustrating for the user.
What is the Best Open Source Speech Recognition System?
1. Project DeepSpeech
2. Kaldi
3. Julius
4. Wav2Letter++
5. DeepSpeech2
6. OpenSeq2Seq
7. Fairseq
8. Vosk
9. Athena
10. ESPnet

Wav2Letter++ is released under the BSD license. Facebook describes it as “the fastest state-of-the-art speech recognition system available”. Its design makes it optimized for performance by default, and Facebook’s equally new machine learning library FlashLight is used as the underlying core of Wav2Letter++.
What is a Speech Recognition Library/System? It is the software engine responsible for transcribing speech into text. These libraries are not meant to be used directly by end users: developers first adapt them and use them to build programs that end users can then run.
The first speech recognition system named “Audrey” was created by Bell Laboratories in 1952 and could only recognize digits. IBM created the first word recognition system 10 years later in 1962. The system was named “Shoebox” and could understand 16 English words. Today speech recognition is used far more than people realize.
The issue is not the speed of the library but the way its API is used. Speech recognition is a time-consuming process, so the main trick is to start recognizing the audio as soon as it is recorded, in parallel with the recording. Then, by the time the end of the phrase is spoken, you will already have almost all of the results and can react immediately.
A response time of 0.2 seconds can be achieved this way, but you need a more flexible API to implement it. A good choice is CMUSphinx, an open source speech recognition framework that you can use for your implementation.
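The "recognize while recording" idea from the answer can be sketched in C++ as a producer/consumer pipeline: the capture side pushes audio chunks into a thread-safe queue, and a decoder thread feeds each chunk to the recognizer as soon as it arrives, so only a small final step remains once the phrase ends. This is a minimal sketch under stated assumptions: `ChunkQueue`, `IncrementalRecognizer` and `recognize_while_recording` are hypothetical names, the recognizer only counts samples in place of real decoding, and the input vector stands in for microphone or Kinect audio, which this code does not read. Real engines expose a comparable start/process/end sequence (PocketSphinx, part of CMUSphinx, has `ps_start_utt`, `ps_process_raw` and `ps_end_utt`), but check your engine's documentation for the exact calls.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <optional>
#include <queue>
#include <string>
#include <thread>
#include <vector>

// Thread-safe FIFO of audio chunks (16-bit PCM samples).
class ChunkQueue {
public:
    void push(std::vector<std::int16_t> chunk) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            chunks_.push(std::move(chunk));
        }
        cv_.notify_one();
    }
    // Signals that no more audio will arrive (end of phrase).
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        cv_.notify_all();
    }
    // Blocks until a chunk is available; returns std::nullopt once closed and drained.
    std::optional<std::vector<std::int16_t>> pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return !chunks_.empty() || closed_; });
        if (chunks_.empty()) return std::nullopt;
        std::vector<std::int16_t> chunk = std::move(chunks_.front());
        chunks_.pop();
        return chunk;
    }
private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<std::vector<std::int16_t>> chunks_;
    bool closed_ = false;
};

// Hypothetical stand-in for a real incremental engine: it only counts
// samples, where a real recognizer would decode each chunk.
class IncrementalRecognizer {
public:
    void start() { samples_seen_ = 0; started_ = true; }
    void process(const std::vector<std::int16_t>& chunk) {
        assert(started_ && "start() must be called before process()");
        samples_seen_ += chunk.size();
    }
    std::string finish() const {
        assert(started_ && "start() must be called before finish()");
        return "decoded " + std::to_string(samples_seen_) + " samples";
    }
private:
    std::size_t samples_seen_ = 0;
    bool started_ = false;
};

// Runs capture and recognition concurrently. `chunks` simulates audio
// arriving from the device (hypothetical input, not a Kinect API).
std::string recognize_while_recording(const std::vector<std::vector<std::int16_t>>& chunks) {
    ChunkQueue pending;
    IncrementalRecognizer recognizer;
    recognizer.start();

    // Consumer: decodes each chunk as soon as it is queued.
    std::thread decoder([&] {
        while (auto chunk = pending.pop()) {
            recognizer.process(*chunk);
        }
    });

    // Producer: a real program would read from the audio device here.
    for (const auto& chunk : chunks) {
        pending.push(chunk);
    }
    pending.close();  // end of phrase detected

    decoder.join();
    // Only the final step remains after the phrase has ended.
    return recognizer.finish();
}
```

For example, passing the three chunks `{1, 2, 3}`, `{4, 5}` and `{6}` returns `"decoded 6 samples"`. The design choice is that the expensive per-chunk work happens while the user is still speaking; in a real application, an end-of-speech detector would trigger `close()`.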