I have a program that receives a mono audio stream of bits over TCP/IP. I am wondering whether the speech-recognition API in Mac OS X would be able to do a speech-to-text transform for me.
(I don't mind saving the audio to a .wav file first and reading it back, as opposed to doing the transform on the fly.)
I have read the official docs online, but they are a bit confusing, and I couldn't find any good examples on this topic.
Also, should I do it in Cocoa/Carbon/Java or Objective-C?
Can someone please shed some light?
Thanks.
On your Mac, choose Apple menu > System Preferences, click Keyboard, then click Dictation. Click the Shortcut pop-up menu, then choose a shortcut to start Dictation. To create a shortcut that's not in the list, choose Customize, then press the keys you want to use. For example, you could press Option-Z.
Mac laptops and desktops have a built-in tool called Dictation that lets you write up documents with your voice: when you switch the feature on, you can speak your notes aloud and have them converted into a document.
With support for the Web Speech API, Safari can convert text to speech and vice versa, allowing developers to create accessible, voice-driven web apps.
Voice Control is available in macOS Catalina and later. It's a way to control your Mac entirely with your voice; it uses the Siri speech-recognition engine to improve on the Enhanced Dictation feature available in earlier versions of macOS.
There are a number of examples that get copied under /Developer/Examples/Speech/Recognition when you install Xcode.
The Cocoa class for speech recognition is NSSpeechRecognizer. I've not used it, but as far as I know, speech recognition requires you to build a grammar to help the engine choose from a fixed set of choices rather than allowing you to pass free-form input. This is all explained in the examples referred to above.
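To make that concrete, here is a minimal Objective-C sketch of the usual NSSpeechRecognizer setup; the Listener class name and the command strings are invented for this example:

```objc
#import <Cocoa/Cocoa.h>

// Minimal sketch: a delegate listening for a fixed command vocabulary.
// NSSpeechRecognizer only matches against the commands you load into
// it; it does not transcribe free-form speech.
@interface Listener : NSObject <NSSpeechRecognizerDelegate>
@property (strong) NSSpeechRecognizer *recognizer;
@end

@implementation Listener
- (instancetype)init {
    if ((self = [super init])) {
        _recognizer = [[NSSpeechRecognizer alloc] init];
        _recognizer.commands = @[@"play", @"stop", @"save file"];
        _recognizer.delegate = self;
        [_recognizer startListening];
    }
    return self;
}

// Called when the engine matches one of the loaded commands.
- (void)speechRecognizer:(NSSpeechRecognizer *)sender
     didRecognizeCommand:(NSString *)command {
    NSLog(@"Recognized command: %@", command);
}
@end
```

The delegate callback arrives via the run loop, so this needs to live in a normal Cocoa application. Also note that, as far as I can tell, NSSpeechRecognizer exposes no way to point it at a file or stream; it listens to the system's speech input device.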
This comes a bit late perhaps, but I'll chime in anyway.
The speech recognition facilities in OS X (on both the Carbon and Cocoa side of things) are for speech command recognition, which means that they will recognize words (or phrases, or commands) that have been loaded into the speech system's language model. I've done some stuff with small dictionaries and it works pretty well, but if you want to recognize arbitrary speech, things may get hairier.
Something else to keep in mind is that the speech APIs in OS X do not map one to one: the Carbon side provides functionality that has not made it to NSSpeechRecognizer (the docs make some mention of this).
I don't know about Cocoa, but the Carbon Speech Recognition Manager does allow you to specify inputs other than a microphone, so a sound stream would work just fine.
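Here is a rough sketch of that setup, pieced together from the Speech Recognition Manager calls in SpeechRecognition.h (plain C Carbon calls, so it compiles fine from an Objective-C file); the canned-source details are my assumption to verify, and error handling is omitted:

```objc
#include <Carbon/Carbon.h>
#include <string.h>

// Sketch only: a recognizer fed from pre-recorded audio instead of the
// microphone. kSRCanned22kHzSpeechSource is the pre-recorded source
// (22 kHz audio), in contrast to the default/live desktop sources.
// Verify the details against SpeechRecognition.h.
void RecognizeFromCannedAudio(void) {
    SRRecognitionSystem system;
    SRRecognizer        recognizer;
    SRLanguageModel     model;

    SROpenRecognitionSystem(&system, kSRDefaultRecognitionSystemID);
    SRNewRecognizer(system, &recognizer, kSRCanned22kHzSpeechSource);

    // Command recognition still needs a language model of expected phrases.
    const char *name = "commands";
    SRNewLanguageModel(system, &model, name, strlen(name));
    SRAddText(model, "play", 4, 0);
    SRAddText(model, "stop", 4, 0);
    SRSetLanguageModel(recognizer, model);

    SRStartListening(recognizer);
    // ... run, then clean up:
    SRStopListening(recognizer);
    SRReleaseObject(model);
    SRReleaseObject(recognizer);
    SRCloseRecognitionSystem(system);
}
```

The header also has properties for pointing the canned source at your audio; check SpeechRecognition.h for the exact constants, since I may be misremembering them. Note the 22 kHz in the source name: your stream may need resampling first.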