I have a program that receives a mono audio stream over TCP/IP. I am wondering whether the speech recognition API in Mac OS X can do speech-to-text conversion for me. (I don't mind saving the audio to a .wav file first and reading it back, as opposed to converting on the fly.) I have read the official docs online, but they are a bit confusing, and I couldn't find any good examples on this topic. Also, should I do it in Cocoa, Carbon, Java, or Objective-C? Can someone please shed some light? Thanks.

Mac OS X speech to text API. Howto?

2 Answers

A number of examples are copied to /Developer/Examples/Speech/Recognition when you install Xcode.

The Cocoa class for speech recognition is NSSpeechRecognizer. I've not used it, but as far as I know speech recognition requires you to build a grammar so that the engine picks from a fixed set of choices, rather than allowing you to pass free-form input. This is all explained in the examples referred to above.

answered Oct 30 '22 01:10

diciu

This comes a bit late perhaps, but I'll chime in anyway.

The speech recognition facilities in OS X (on both the Carbon and Cocoa side of things) are for speech command recognition, which means that they will recognize words (or phrases, commands) that have been loaded into the speech system's language model. I've done some stuff with small dictionaries and it works pretty well, but if you want to recognize arbitrary speech, things may get hairier.

Something else to keep in mind is that the Carbon and Cocoa speech APIs in OS X do not map one to one. The Carbon side provides functionality that has not made it to NSSpeechRecognizer (the docs make some mention of this).

I don't know about Cocoa, but the Carbon Speech Recognition Manager does allow you to specify inputs other than a microphone, so a sound stream would work just fine.
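If you would rather go the save-to-.wav route mentioned in the question, here is a rough sketch of that first step only: collecting the raw bytes from the socket and wrapping them in a WAV container. It is written in Python purely for brevity and does no recognition at all. The function names `read_stream` and `pcm_to_wav` are made up for illustration, and the 16 kHz, 16-bit PCM format is an assumption, since the question does not say what the stream actually contains.

```python
import io
import wave


def read_stream(sock, chunk_size=4096):
    """Read from a connected socket until the peer closes it; return all bytes."""
    chunks = []
    while True:
        data = sock.recv(chunk_size)
        if not data:  # empty read means the sender closed the connection
            break
        chunks.append(data)
    return b"".join(chunks)


def pcm_to_wav(pcm_bytes, sample_rate=16000, sample_width=2):
    """Wrap raw mono PCM bytes in a WAV container and return the file contents.

    sample_rate and sample_width (bytes per sample) are assumptions; use
    whatever format the sender really produces.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)              # mono, as stated in the question
        wav.setsampwidth(sample_width)   # assumed 16-bit samples
        wav.setframerate(sample_rate)    # assumed 16 kHz
        wav.writeframes(pcm_bytes)
    return buf.getvalue()
```

Writing the returned bytes to disk gives you a file you can then feed to whichever recognizer you end up using; the recognition itself is still limited to the command-style grammar described above.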

answered Oct 30 '22 02:10

Latrokles



Tags:

macos

objective-c

cocoa

audio

speech-recognition

Roy Chan
