I'm developing an iOS app for voice-based AI: it takes voice input from the microphone, turns it into text, sends the text to an AI agent, then plays the returned text through the speaker. Everything works, but I currently use a button to start and stop recording the speech (SpeechKit for voice recognition, API.AI for the AI, Amazon's Polly for the output).
What I still need is for the microphone to be always on, with recording starting and stopping automatically as the user begins and finishes talking. The app is being developed for an unorthodox context in which the user has no access to the screen (but they will have a high-end shotgun mic to capture their speech).
My research suggests this piece of the puzzle is known as 'Voice Activity Detection' (VAD), and that it seems to be one of the hardest steps in the whole voice-based AI system.
I'm hoping someone can either supply some straightforward Swift code so I can implement this myself, or point me to some decent libraries / SDKs that I can integrate into this project.
For a good VAD implementation you can use py-webrtcvad.
It is a Python interface to C code, so you can add the C files from that project to your Xcode project and call them from Swift (for example, through a bridging header).
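If you want to see the start/stop logic before wiring in the C code, here is a minimal sketch in plain Swift. Note that this is not the WebRTC algorithm: it is a much simpler energy-threshold detector with a "hangover" so that short pauses do not end the recording. The names `EnergyVAD` and `VoiceEvent`, and the default threshold and frame counts, are my own assumptions for illustration and would need tuning for your microphone and room.

```swift
import Foundation

/// Events emitted when the detector decides speech has started or ended.
enum VoiceEvent: Equatable {
    case speechStarted
    case speechEnded
}

/// Hypothetical energy-threshold detector with hangover (not WebRTC VAD).
struct EnergyVAD {
    /// RMS level (samples normalised to -1...1) above which a frame counts as voiced.
    let threshold: Float
    /// Consecutive voiced frames required before reporting a start.
    let startFrames: Int
    /// Consecutive quiet frames required before reporting an end.
    let endFrames: Int

    private(set) var isSpeaking = false
    private var voicedRun = 0
    private var quietRun = 0

    // Default values are assumptions, not recommendations.
    init(threshold: Float = 0.02, startFrames: Int = 3, endFrames: Int = 25) {
        self.threshold = threshold
        self.startFrames = startFrames
        self.endFrames = endFrames
    }

    /// Root-mean-square level of one frame of samples.
    static func rms(_ frame: [Float]) -> Float {
        guard !frame.isEmpty else { return 0 }
        let sumOfSquares = frame.reduce(Float(0)) { $0 + $1 * $1 }
        return (sumOfSquares / Float(frame.count)).squareRoot()
    }

    /// Feed one frame; returns an event only when the speaking state changes.
    mutating func process(frame: [Float]) -> VoiceEvent? {
        if EnergyVAD.rms(frame) > threshold {
            voicedRun += 1
            quietRun = 0
        } else {
            quietRun += 1
            voicedRun = 0
        }
        if !isSpeaking && voicedRun >= startFrames {
            isSpeaking = true
            return .speechStarted
        }
        if isSpeaking && quietRun >= endFrames {
            isSpeaking = false
            return .speechEnded
        }
        return nil
    }
}
```

In the app, the frames would presumably come from an audio tap (for example `AVAudioEngine`'s input node), and you would start the recognizer on `.speechStarted` and stop it on `.speechEnded`. A plain energy threshold is easily fooled by background noise, which is the main reason to prefer the WebRTC code for real use.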