Speech to Text from own sound file

As you probably know, implementing speech-to-text is pretty easy with the Android API: you just fire the API's intent and it returns the text for you. My case is a bit different. I have a prerecorded 3GPP sound file that I recorded from the user and saved on the SD card, and I want to know if it's possible to transcribe it into text like any other speech recognition. Does the speech-to-text API allow uploading your own sound files to be processed, or is this impossible?

asked Aug 08 '11 23:08

Brian

2 Answers

The API does not allow it, but see this blog post and its comments for a potential workaround. Also make sure that your file contains high-quality audio (at least 16-bit samples at 16 kHz) to get a better transcription.
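A quick way to check the 16-bit / 16 kHz minimum before sending audio anywhere is to read the file's header. The sketch below is Python rather than Android code, and it assumes the recording has already been converted to WAV (a 3GPP file would need decoding first). `meets_minimum_quality` and `make_silent_wav` are illustrative names, not part of any speech API.

```python
import io
import wave

MIN_SAMPLE_WIDTH_BYTES = 2   # 16-bit samples
MIN_SAMPLE_RATE_HZ = 16_000  # 16 kHz


def meets_minimum_quality(wav_source):
    """Return True if the WAV data is at least 16-bit and 16 kHz.

    wav_source may be a file path or a binary file-like object.
    """
    with wave.open(wav_source, "rb") as wav:
        return (wav.getsampwidth() >= MIN_SAMPLE_WIDTH_BYTES
                and wav.getframerate() >= MIN_SAMPLE_RATE_HZ)


def make_silent_wav(sample_rate_hz, sample_width_bytes, seconds=1):
    """Build an in-memory mono WAV of zeroed samples, for trying the check out."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(sample_width_bytes)
        wav.setframerate(sample_rate_hz)
        wav.writeframes(b"\x00" * sample_rate_hz * sample_width_bytes * seconds)
    buffer.seek(0)  # rewind so the buffer can be read back
    return buffer
```

For a file on disk, `meets_minimum_quality("recording.wav")` works the same way (the file name is just an example).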

Kaarel

I have a solution that works well for getting speech to text from a sound file. Here is the link to a simple Android project I created to show the solution working. I also put some screenshots inside the project to illustrate the app.

I'll try to explain briefly the approach I used. I combined two features in that project: the Google Speech API and FLAC recording.

The Google Speech API is called over HTTP connections. Mike Pultz gives more details about the API:

"(...) the new [Google] API is a full-duplex streaming API. What this means, is that it actually uses two HTTP connections- one POST request to upload the content as a “live” chunked stream, and a second GET request to access the results, which makes much more sense for longer audio samples, or for streaming audio."
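To illustrate the two-connection pattern described in the quote, here is a rough Python sketch. The host and paths are placeholders, and the query parameters (`key`, `pair`, `lang`) and the `audio/x-flac; rate=16000` content type are assumptions about how such an API pairs the upload with the download, not a documented contract. The point is only that both requests share one pairing token, and that the POST body is streamed in chunks while a second thread waits on the GET.

```python
import http.client
import threading
import uuid
from urllib.parse import urlencode

# Hypothetical host and paths: substitute the real endpoints for your service.
HOST = "speech.example.invalid"
UP_PATH = "/full-duplex/up"
DOWN_PATH = "/full-duplex/down"


def build_paths(api_key, language="en-US", pair_id=None):
    """Return (pair_id, up_path, down_path) sharing one pairing token."""
    pair_id = pair_id or uuid.uuid4().hex
    up = UP_PATH + "?" + urlencode({"key": api_key, "pair": pair_id, "lang": language})
    down = DOWN_PATH + "?" + urlencode({"pair": pair_id})
    return pair_id, up, down


def iter_chunks(stream, chunk_size=4096):
    """Yield successive chunks from a binary stream until it is exhausted."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return
        yield chunk


def transcribe(audio_stream, api_key, content_type="audio/x-flac; rate=16000"):
    """Upload audio on one connection while reading results on another."""
    _, up, down = build_paths(api_key)
    results = []

    def download():
        # Second connection: GET the results for the same pairing token.
        conn = http.client.HTTPSConnection(HOST, timeout=30)
        conn.request("GET", down)
        results.append(conn.getresponse().read())
        conn.close()

    reader = threading.Thread(target=download)
    reader.start()

    # First connection: POST the audio. With an iterable body and no
    # Content-Length header, http.client uses chunked transfer encoding.
    conn = http.client.HTTPSConnection(HOST, timeout=30)
    conn.request("POST", up, body=iter_chunks(audio_stream),
                 headers={"Content-Type": content_type})
    conn.getresponse().read()
    conn.close()

    reader.join()
    return results[0] if results else None
```

`transcribe` will not run against the placeholder host; `build_paths` and `iter_chunks` can be tried on their own.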

However, this API needs to receive a FLAC sound file to work properly, which brings us to the second part: FLAC recording.
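Since the upload has to be FLAC, it can help to sanity-check a file before sending it. Every FLAC stream begins with the four bytes `fLaC`, followed by a STREAMINFO metadata block that holds the sample rate, channel count and bit depth. A minimal Python sketch that reads those fields (the function name is made up for illustration):

```python
FLAC_MARKER = b"fLaC"
STREAMINFO_BLOCK_TYPE = 0


def read_flac_stream_info(header):
    """Parse sample rate, channels and bit depth from the start of a FLAC file.

    `header` must contain at least the first 26 bytes of the file.
    Raises ValueError if the data does not look like FLAC.
    """
    if len(header) < 26 or header[:4] != FLAC_MARKER:
        raise ValueError("not a FLAC stream")
    # Low 7 bits of the block header byte give the metadata block type.
    if header[4] & 0x7F != STREAMINFO_BLOCK_TYPE:
        raise ValueError("first metadata block is not STREAMINFO")
    # Bytes 18..25 pack: 20-bit sample rate, 3-bit (channels - 1),
    # 5-bit (bits per sample - 1), 36-bit total sample count.
    packed = int.from_bytes(header[18:26], "big")
    return {
        "sample_rate_hz": packed >> 44,
        "channels": ((packed >> 41) & 0x7) + 1,
        "bits_per_sample": ((packed >> 36) & 0x1F) + 1,
    }
```

Reading the first 26 bytes of a file opened in binary mode and passing them in is enough; the audio frames themselves are not touched.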

I implemented FLAC recording in that project by extracting and adapting some pieces of code and libraries from an open source app called AudioBoo. AudioBoo uses native code to record and play the FLAC format.

Thus, it's possible to record a FLAC sound, send it to the Google Speech API, get the text, and play back the sound that was just recorded.

The project I created covers the basic principles needed to make it work and can be improved for specific situations. To make it work in a different scenario, you need a Google Speech API key, which is obtained by joining the Google Chromium-dev group. I left one key in that project just to show it's working, but I'll remove it eventually. If someone needs more information about it, let me know, because I'm not able to put more than 2 links in this post.

answered Oct 25 '22 10:10

lsantsan

Tags: file, android, audio, speech-to-text
