 

Real-time call transcription on Android

I am an Android developer living with a hearing impairment, and I am currently exploring the option of building a speech-to-text app with the SpeechRecognizer API in Android. Closed-captioning telephones and Innocaption are not available in my home country. One potential application is captioning telephone calls.

https://developer.android.com/reference/android/speech/SpeechRecognizer.html

The API is meant for capturing voice commands, not for real-time live transcription. I have managed to implement it as a service, but I constantly need to restart it after it has delivered a result or a partial result, which is not feasible in a conversational setting (words get lost while the service is restarting).
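A common workaround, sketched here under assumptions, is to call `startListening` again from inside the result and error callbacks, so that each new session begins as soon as the previous one ends. This shrinks the gap between sessions but cannot remove it. To keep the sketch runnable off-device, the Android pieces are replaced by a hypothetical `Recognizer` interface and a hypothetical `ErrorKind` enum; on a real device they would correspond to `SpeechRecognizer.startListening(Intent)`, the `RecognitionListener.onResults(Bundle)` / `onError(int)` callbacks, and a subset of the `SpeechRecognizer.ERROR_*` codes.

```java
import java.util.ArrayList;
import java.util.List;

public class AutoRestartDemo {

    // Hypothetical stand-in for android.speech.SpeechRecognizer.
    interface Recognizer {
        void startListening();
    }

    // Hypothetical subset of the SpeechRecognizer.ERROR_* codes.
    enum ErrorKind { NO_MATCH, SPEECH_TIMEOUT, INSUFFICIENT_PERMISSIONS }

    // Restarts recognition from inside the callbacks so each session
    // follows the previous one as closely as possible.
    static class AutoRestartingSession {
        private final Recognizer recognizer;
        private final List<String> results = new ArrayList<>();
        private boolean stopped = false;

        AutoRestartingSession(Recognizer recognizer) {
            this.recognizer = recognizer;
        }

        void start() {
            stopped = false;
            recognizer.startListening();
        }

        // E.g. when the call ends; prevents further restarts.
        void stop() {
            stopped = true;
        }

        // Would be driven by RecognitionListener.onResults on a device.
        void onResults(String bestHypothesis) {
            results.add(bestHypothesis);
            restartIfActive();
        }

        // Would be driven by RecognitionListener.onError on a device.
        void onError(ErrorKind error) {
            if (error == ErrorKind.INSUFFICIENT_PERMISSIONS) {
                stopped = true; // retrying cannot fix a missing permission
                return;
            }
            // No match or silence timeout: simply listen again.
            restartIfActive();
        }

        private void restartIfActive() {
            if (!stopped) {
                recognizer.startListening();
            }
        }

        List<String> results() { return results; }
        boolean isStopped() { return stopped; }
    }

    public static void main(String[] args) {
        int[] starts = {0};
        AutoRestartingSession session = new AutoRestartingSession(() -> starts[0]++);
        session.start();
        session.onResults("hello");
        session.onError(ErrorKind.NO_MATCH);
        System.out.println(starts[0] + " starts, results=" + session.results());
    }
}
```

Running `main` prints `3 starts, results=[hello]`: one initial start plus one restart per callback. Words spoken during the brief restart window can still be lost, which is the limitation described above.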

Do note that I don't need 100% accuracy for this app. Many hearing-impaired people find it helpful to have some context of the conversation to help them along, so I don't actually need comments about how this is not going to be accurate.

Is there a way to run SpeechRecognizer in a continuous mode? I can create a TextView that constantly updates itself when new text is returned from the service. If this API is not what I should be looking at, what would you recommend? I tested CMUSphinx but found it so dependent on predefined phrases/sentences that it is unlikely to work for the kind of application I have in mind.
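For the self-updating TextView, one approach (a sketch, not the only possible design) is to keep finalized text separate from the latest partial hypothesis. Partial results typically revise the words of the utterance in progress, so each new partial should replace the previous one rather than be appended to it. The class below is plain Java so it can be tested off-device; `TranscriptBuffer`, `onPartial`, `onFinal` and `displayText` are hypothetical names. On Android you would feed it the first string of the `SpeechRecognizer.RESULTS_RECOGNITION` list from `onPartialResults` and `onResults`, then call `setText(buffer.displayText())` on the TextView from the UI thread.

```java
import java.util.ArrayList;
import java.util.List;

public class TranscriptBuffer {
    // Utterances the recognizer has finalized; these never change again.
    private final List<String> committed = new ArrayList<>();
    // Latest partial hypothesis for the utterance still in progress.
    private String pending = "";

    // Partial results revise the in-progress utterance, so replace rather than append.
    public void onPartial(String text) {
        pending = text.trim();
    }

    // A final result supersedes the pending partial and is committed permanently.
    public void onFinal(String text) {
        String t = text.trim();
        if (!t.isEmpty()) {
            committed.add(t);
        }
        pending = "";
    }

    // Text to display: all committed utterances followed by the current partial.
    public String displayText() {
        List<String> parts = new ArrayList<>(committed);
        if (!pending.isEmpty()) {
            parts.add(pending);
        }
        return String.join(" ", parts);
    }

    public static void main(String[] args) {
        TranscriptBuffer buf = new TranscriptBuffer();
        buf.onPartial("how are");
        buf.onPartial("how are you");
        buf.onFinal("how are you doing");
        buf.onPartial("fine");
        System.out.println(buf.displayText()); // how are you doing fine
    }
}
```

For long calls you would probably also cap how much committed text is kept on screen, since only the last few lines are useful during a live conversation.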

asked Sep 12 '14 at 03:09 by Lorteld



2 Answers

I am a deaf software developer, so I can chime in. I've been monitoring the state of the art in speech-to-text APIs, and they have now become "good enough" to provide operatorless relay/captioning services for CERTAIN kinds of phone conversations, with people using the telephone in quiet settings. For example, I get 98% transcription accuracy on my spouse's voice with Apple Siri's real-time transcription (iOS 8).

I was able to jury-rig phone captioning by routing the audio out of one phone into a second iPhone, on which I pressed the microphone (dictation) button on the pop-up keyboard. That way I successfully captioned a telephone conversation with ~95% accuracy at 250 words per minute (faster than Sprint Captioned Telephone and Hamilton Captioned Telephone), at least until the one-minute cutoff.

So I consider computer-based voice recognition practical for phone calls with family members (the kind of people you call frequently, in quiet environments), where you can at least coach them to move to a quiet place so captioning works properly (with >95% accuracy). Now that iOS 8 has been released, we REALLY need this, so we don't have to rely on relay operators or captioned telephones. Sprint Captioned Telephone lags badly during fast speech, while Apple Siri keeps up, so I can conduct more natural telephone conversations with my jury-rigged two-iOS-device Apple Siri "real-time Captioned Telephone" setup.

Some cellphones transmit higher-definition audio, so this works well between two iPhones (one iPhone's speaker piped into another iPhone's Siri running in iOS 8 continuous mode). That assumes you're on G.722.2 (AMR-WB), for example two iPhones on the same carrier, where that carrier supports the HD voice telephony standard. It works perfectly when piped through Siri: roughly as well as speaking directly in front of the phone, for the same human voice (assuming the other end is speaking into the phone in a quiet environment).
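Why wideband calls caption better can be seen from the Nyquist limit: audio sampled at rate $f_s$ can only represent frequencies below $f_s/2$. Fricatives such as /s/ and /f/ carry much of their energy above 3.4 kHz, so narrowband audio discards cues a recognizer can use, which is one plausible reason AMR-WB calls transcribe better.

```latex
\begin{aligned}
f_{\max} &= \tfrac{f_s}{2} \\
\text{narrowband (G.711, AMR-NB): } f_s = 8\ \text{kHz} &\;\Rightarrow\; f_{\max} = 4\ \text{kHz} \quad (\text{speech passband} \approx 300\text{--}3400\ \text{Hz}) \\
\text{wideband (AMR-WB, G.722.2): } f_s = 16\ \text{kHz} &\;\Rightarrow\; f_{\max} = 8\ \text{kHz} \quad (\text{speech passband} \approx 50\text{--}7000\ \text{Hz})
\end{aligned}
```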

Google and Apple need to open up their speech-to-text APIs to assistive applications, pronto, because operatorless telephone transcription is finally practical, at least when calling family members (clear voices, coached to be in a quiet environment when receiving the call). The time limit on continuous recognition also needs to be removed for this use case.

answered Oct 23 '22 at 23:10 by Mark Rejhon


Google is not going to work with telephone-quality audio anyway; you need to build a captioning service with CMUSphinx yourself.

You probably didn't configure CMUSphinx properly; it should be fine for large-vocabulary transcription. The main thing to get right is to use the 8 kHz telephony acoustic model (not the wideband model) together with a generic language model.
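Matching the sample rate matters because an acoustic model expects features computed from audio at the rate it was trained on, and upsampling 8 kHz call audio to 16 kHz for a wideband model cannot restore the missing high frequencies. If a capture pipeline hands you 16 kHz PCM of what was really a narrowband call, one option (an assumption, not something prescribed above) is to downsample it to 8 kHz before feeding the telephony model. The crude sketch below averages each pair of samples, which acts as a weak two-tap low-pass filter, and keeps one value per pair; `Decimator.halveRate` is a hypothetical helper, and a real pipeline would use a proper anti-aliasing FIR filter or a resampling library.

```java
import java.util.Arrays;

public class Decimator {
    // Halves the sample rate of 16-bit mono PCM (e.g. 16 kHz -> 8 kHz).
    // Averaging each pair is a weak two-tap low-pass filter: it reduces, but
    // does not eliminate, aliasing of content above the new Nyquist frequency.
    public static short[] halveRate(short[] in) {
        short[] out = new short[in.length / 2]; // a trailing odd sample is dropped
        for (int i = 0; i < out.length; i++) {
            int a = in[2 * i];
            int b = in[2 * i + 1];
            out[i] = (short) ((a + b) / 2); // int math avoids overflow before the cast
        }
        return out;
    }

    public static void main(String[] args) {
        short[] pcm16k = {0, 2, 4, 6, -10, -20, 7};
        short[] pcm8k = halveRate(pcm16k);
        System.out.println(Arrays.toString(pcm8k)); // [1, 5, -15]
    }
}
```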

For the best accuracy it's probably worth moving processing to a server: you can set up a PBX to place the calls and transcribe the audio there, instead of hoping to do it all on a resource-limited device.

answered Oct 23 '22 at 23:10 by Nikolay Shmyrev