Android's SpeechRecognizer apparently doesn't let you record the audio it is recognizing into an audio file. That is, either you record voice with a MediaRecorder (or AudioRecord, for that matter), or you do speech recognition with a SpeechRecognizer, in which case the audio isn't saved to a file (at least not one you can access); you can't do both at the same time.
The question of how to record audio and do speech recognition at the same time on Android has been asked several times, and the most popular "solution" is to record a FLAC file and use Google's unofficial Speech API, which lets you send a FLAC file via a POST request and get back a JSON response with the transcription. http://mikepultz.com/2011/03/accessing-google-speech-api-chrome-11/ (outdated Android version) https://github.com/katchsvartanian/voiceRecognition/tree/master/VoiceRecognition http://mikepultz.com/2013/07/google-speech-api-full-duplex-php-version/
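For reference, the "POST a FLAC file, get JSON back" approach looks roughly like the sketch below. It is plain Java, not a tested client for Google's service: the class name `FlacPoster` is hypothetical, the endpoint URL is deliberately left as a parameter, and the `audio/x-flac; rate=N` Content-Type is my assumption about what the unofficial API expected, based on the write-ups linked above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

final class FlacPoster {
    /** Content-Type header value; the "audio/x-flac; rate=N" form is an assumption. */
    static String contentType(int sampleRateHz) {
        return "audio/x-flac; rate=" + sampleRateHz;
    }

    /** POSTs the FLAC bytes to the given endpoint and returns the response body as text. */
    static String postFlac(URL endpoint, byte[] flacBytes, int sampleRateHz) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) endpoint.openConnection();
        try {
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", contentType(sampleRateHz));
            try (OutputStream out = conn.getOutputStream()) {
                out.write(flacBytes);
            }
            // getInputStream() throws IOException on HTTP error statuses (4xx/5xx).
            try (InputStream in = conn.getInputStream()) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("usage: FlacPoster <endpoint-url> <file.flac>");
            return;
        }
        byte[] flac = Files.readAllBytes(Paths.get(args[1]));
        // 16000 Hz is only an example rate; it must match the recording.
        System.out.println(postFlac(new URL(args[0]), flac, 16000));
    }
}
```

The returned string would be the JSON response; parsing it is left out because the response format is not documented.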
That works pretty well, but it has a huge limitation: it can't be used with files longer than about 10-15 seconds (the exact limit is not clear and may depend on file size or perhaps the number of words). This makes it unsuitable for my needs.
Also, slicing the audio file into smaller files is NOT a possible solution. Even setting aside the difficulty of splitting the file at the right positions (not in the middle of a word), many consecutive requests to the above-mentioned web service API will randomly return empty responses (Google says there's a usage limit of 50 requests per day, but as usual they don't disclose the details of the real usage limits, which clearly restrict bursts of requests).
So, all this would seem to indicate that getting a transcription of speech while at the same time recording the input into an audio file in Android is IMPOSSIBLE.
HOWEVER, the Google Keep Android app does exactly that. It lets you speak, transcribes what you said into text, and saves both the text and the audio recording (it's not clear where it stores the audio, but you can replay it). And it has no length limitation.
So the question is: DOES ANYBODY HAVE AN IDEA OF HOW GOOGLE KEEP DOES IT? I would look at the source code but it doesn't seem to be available, is it?
I sniffed the packets Google Keep sends and receives while doing speech recognition, and it definitely does NOT use the Speech API mentioned above. All the traffic is TLS and (from the outside) it looks pretty much the same as when you're using SpeechRecognizer.
So is there perhaps a way to "split" (i.e. duplicate, or multiplex) the microphone input stream into two streams, and feed one of them to a SpeechRecognizer and the other to a MediaRecorder?
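To make the "duplicate the stream" idea concrete, here is a minimal sketch of what I have in mind. It is plain Java rather than Android code: `PcmTee` and `copyToBoth` are hypothetical names, the byte array stands in for microphone data (on a device the buffers would presumably come from `AudioRecord.read(...)`), and it assumes the recognition side can accept a raw audio stream at all, which is exactly the part I don't know how to do with SpeechRecognizer.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

final class PcmTee {
    /** Copies every buffer read from source to both sinks; returns the total bytes read. */
    static long copyToBoth(InputStream source, OutputStream recorderSink,
                           OutputStream recognizerSink) throws IOException {
        byte[] buffer = new byte[4096];
        long total = 0;
        int n;
        while ((n = source.read(buffer)) != -1) {
            // The same bytes go to the file writer and to the recognizer feed.
            recorderSink.write(buffer, 0, n);
            recognizerSink.write(buffer, 0, n);
            total += n;
        }
        recorderSink.flush();
        recognizerSink.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] fakeMic = new byte[10_000]; // stand-in for microphone PCM data
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        ByteArrayOutputStream recognizer = new ByteArrayOutputStream();
        long copied = copyToBoth(new ByteArrayInputStream(fakeMic), file, recognizer);
        System.out.println(copied + " bytes sent to each sink");
    }
}
```

Copying the bytes is the easy part; the open question is whether anything on the recognition side will take the second stream.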
What Google does record are the voice commands you say to your phone. If you say "OK Google, how old is Jack Black?", Google keeps the recording of your asking the question, plus a few seconds of prior audio.
You can use the Google Keep mobile app to voice-dictate a note. It will listen to you, grab the text, and record the audio. It then saves the result as a note, complete with selectable text and playable audio, which you can share and organize in all of the ways Google Keep allows.
If you have an Android smartphone, you may not know that Google saves all of the voice commands you give it. They're archived online in your Google account. Google says it keeps the audio search information to improve its voice recognition. Android users can opt out, which keeps your recordings anonymous.
Speech-to-Text can process up to 1 minute of speech audio data sent in a synchronous request. After Speech-to-Text processes and recognizes all of the audio, it returns a response. A synchronous request is blocking, meaning that Speech-to-Text must return a response before processing the next request.
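Whether a recording fits within that one-minute synchronous limit can be estimated from its size, at least for uncompressed PCM. The sketch below is only an illustration of the arithmetic: `SyncLimitCheck` and its methods are hypothetical names, the 60-second constant is taken from the sentence above, and the formula does not apply to compressed formats such as FLAC or AMR.

```java
final class SyncLimitCheck {
    /** "Up to 1 minute" per synchronous request, as quoted above. */
    static final double SYNC_LIMIT_SECONDS = 60.0;

    /** Duration of raw PCM audio: bytes / (samples per second * channels * bytes per sample). */
    static double pcmDurationSeconds(long byteCount, int sampleRateHz,
                                     int channels, int bytesPerSample) {
        return (double) byteCount / ((double) sampleRateHz * channels * bytesPerSample);
    }

    /** True if the PCM data is short enough for a single synchronous request. */
    static boolean fitsSynchronousRequest(long byteCount, int sampleRateHz,
                                          int channels, int bytesPerSample) {
        return pcmDurationSeconds(byteCount, sampleRateHz, channels, bytesPerSample)
                <= SYNC_LIMIT_SECONDS;
    }

    public static void main(String[] args) {
        // 16 kHz, mono, 16-bit samples: 32,000 bytes per second of audio.
        long bytes = 1_440_000L;
        System.out.println(pcmDurationSeconds(bytes, 16000, 1, 2) + " s, fits: "
                + fitsSynchronousRequest(bytes, 16000, 1, 2));
    }
}
```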
Google Keep launches RecognizerIntent with certain undocumented extras and expects the resulting intent to contain the URI of the recorded audio. If RecognizerIntent is serviced by Google Voice Search then it all works out and Keep gets the audio.
See "record/save audio from voice recognition intent" for more information and a code sample that calls the recognizer in the same way as Keep (probably) does.
Note that this behavior is not part of Android. It's simply the current undocumented way of how two closed-source Google apps communicate with each other.
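Once the recognizer has returned a URI, keeping the audio comes down to copying a stream into a file of your own. The sketch below shows only that last step in plain Java, with a byte array standing in for the recorded audio. `RecognizedAudioSaver` is a hypothetical name; on Android I would expect the stream to come from `getContentResolver().openInputStream(data.getData())` inside `onActivityResult`, and the actual audio format depends on what the recognizer hands back.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class RecognizedAudioSaver {
    /** Copies the audio stream to target, replacing any existing file; returns bytes written. */
    static long saveAudio(InputStream audio, Path target) throws IOException {
        try (InputStream in = audio) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] fakeAudio = {1, 2, 3, 4}; // stand-in for the audio behind the returned URI
        Path target = Files.createTempFile("recognized-audio", ".bin");
        long written = saveAudio(new ByteArrayInputStream(fakeAudio), target);
        System.out.println("Saved " + written + " bytes to " + target);
    }
}
```

Copying the data right away seems prudent, since nothing documents how long the URI stays readable.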