I've used the voice recognition feature on Android and I love it. It's one of my customers' most praised features. However, the format is somewhat restrictive. You have to call the recognizer intent, have it send the recording for transcription to google, and wait for the text back.
Some of my ideas would require recording the audio within my app and then sending the clip to google for transcription.
Is there any way I can send an audio clip to be processed with speech to text?
Gboard (Android, iOS) Google's excellent Gboard app, which includes dictation, works with both Android and iOS. To use it, go anywhere you can type (email, browser, text, document), and the keyboard will pop up. Tap the microphone icon at the top right of the keyboard, and start speaking when prompted.
I got a solution that is working well to have speech recognizing and audio recording. Here is the link to a simple Android project I created to show the solution's working. Also, I put some print screens inside the project to illustrate the app.
I'm gonna try to explain briefly the approach I used. I combined two features in that project: Google Speech API and Flac recording.
Google Speech API is called through HTTP connections. Mike Pultz gives more details about the API:
"(...) the new [Google] API is a full-duplex streaming API. What this means, is that it actually uses two HTTP connections- one POST request to upload the content as a “live” chunked stream, and a second GET request to access the results, which makes much more sense for longer audio samples, or for streaming audio."
However, this API needs to receive a FLAC sound file to work properly. That makes us to go to the second part: Flac recording
I implemented Flac recording in that project through extracting and adapting some pieces of code and libraries from an open source app called AudioBoo. AudioBoo uses native code to record and play flac format.
Thus, it's possible to record a flac sound, send it to Google Speech API, get the text, and play the sound that was just recorded.
The project I created has the basic principles to make it work and can be improved for specific situations. In order to make it work in a different scenario, it's necessary to get a Google Speech API key, which is obtained by being part of Google Chromium-dev group. I left one key in that project just to show it's working, but I'll remove it eventually. If someone needs more information about it, let me know cause I'm not able to put more than 2 links in this post.
Unfortunately not at this time. The only interface currently supported by Android's voice recognition service is the RecognizerIntent, which doesn't allow you to provide your own sound data.
If this is something you'd like to see, file a feature request at http://b.android.com. This is also tangentially related to existing issue 4541.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With