I would like to be able to test which text-to-speech voices are available for my iOS app to use with AVSpeechSynthesis. It is easy to generate a list of the installed voices, but Apple makes some of them off-limits for use by apps, and I would like to know which ones.
For example, consider the following test code (Swift 5.1):
import AVFoundation
...
func voiceTest() {
    let speechSynthesizer = AVSpeechSynthesizer()
    let voices = AVSpeechSynthesisVoice.speechVoices()
    for voice in voices where voice.language == "en-US" {
        print("\(voice.language) - \(voice.name) - \(voice.quality.rawValue) [\(voice.identifier)]")
        let phrase = "The voice you're now listening to is the one called \(voice.name)."
        let utterance = AVSpeechUtterance(string: phrase)
        utterance.voice = voice
        speechSynthesizer.speak(utterance)
    }
}
When I call voiceTest(), the console output is this:
en-US - Nicky (Enhanced) - 2 [com.apple.ttsbundle.siri_female_en-US_premium]
en-US - Aaron - 1 [com.apple.ttsbundle.siri_male_en-US_compact]
en-US - Fred - 1 [com.apple.speech.synthesis.voice.Fred]
en-US - Nicky - 1 [com.apple.ttsbundle.siri_female_en-US_compact]
en-US - Samantha - 1 [com.apple.ttsbundle.Samantha-compact]
en-US - Alex - 2 [com.apple.speech.voice.Alex]
Some of the voices speak in their actual voice, whereas some of them speak in the default voice instead. In my case both Nicky (com.apple.ttsbundle.siri_female_en-US_premium) and Alex (com.apple.speech.voice.Alex) are listed as high quality but sound instead like the low quality default, Samantha, when selected.
I know that Apple has said that the Siri voices are not available for use in third-party apps. When I manually download Samantha (High Quality) on my iPhone via Settings, it appears in the list and I can use it. Perhaps Alex is just the high-quality male Siri voice, even though Aaron would seem to be the low-quality Siri voice based on its identifier (com.apple.ttsbundle.siri_male_en-US_compact)? And that's why Alex and Nicky are the only two to be unavailable? If so, having my app specifically exclude those two would produce the true list of available voices. It would be nice to have some clarity.
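If excluding those identifiers turns out to be the right heuristic, a minimal sketch of that filter might look like this. Note that the exclusion rules are guesses drawn from the console output above, not a documented Apple contract:

```swift
import AVFoundation

/// Filters the installed voices down to those that appear usable by a
/// third-party app. The exclusion list is a heuristic based on the
/// output above: Siri-bundle identifiers and Alex seem to fall back to
/// the default voice when selected.
func presumedUsableVoices(language: String) -> [AVSpeechSynthesisVoice] {
    return AVSpeechSynthesisVoice.speechVoices().filter { voice in
        voice.language == language
            && !voice.identifier.contains("siri")
            && voice.identifier != "com.apple.speech.voice.Alex"
    }
}
```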
I've been looking for a way to programmatically use Siri's nice-sounding voice, such as English Siri Male (United States), and quickly discovered that it is not possible using the public speech API, even though the voice can be selected in System Preferences.
To answer your question, there are at least two other ways of finding available voices in addition to your code example.
Using the defaults command

defaults read com.apple.speech.voice.prefs > speech_prefs.txt
To find info on the voice currently selected in System Preferences, look for SelectedVoiceName in speech_prefs.txt. For example, for English Siri Male (United States), this will be SelectedVoiceName = "Aaron Siri";.
Now, by further searching for aaron in speech_prefs.txt, you will find the following:
"VOICEID:com.apple.speech.synthesis.voice.custom.siri.aaron.premium_1" = {
BundleIdentifier = "com.apple.speech.synthesis.voice.custom.siri.aaron.premium";
I tried both of these strings when initializing a voice, but got an error saying the voice was not found.
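For what it's worth, the same failure can be reproduced from Swift: AVSpeechSynthesisVoice(identifier:) is a failable initializer and simply returns nil for identifiers the public API does not expose. The two identifier strings below are the ones pulled from speech_prefs.txt above:

```swift
import AVFoundation

// Both strings found in speech_prefs.txt fail the same way: the
// failable initializer returns nil because neither identifier is
// exposed through the public API.
let candidates = [
    "com.apple.speech.synthesis.voice.custom.siri.aaron.premium_1",
    "com.apple.speech.synthesis.voice.custom.siri.aaron.premium",
]
for id in candidates {
    if AVSpeechSynthesisVoice(identifier: id) == nil {
        print("\(id): no such voice via the public API")
    }
}
```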
Looking for voice directories
There seem to be three locations:
/System/Library/Speech/Voices, /Library/Speech/Voices and ~/Library/Speech/Voices
The third one seems to be a location for custom voices.
Each voice has its own directory.
If you compare the Info.plist files of a programmatically available voice and a programmatically unavailable one, you will see that they have different structures. For example, the programmatically unavailable voice lacks some attributes that correspond to the speech API, such as VoiceSupportedCharacters. I believe this is because some voices are of an older generation and some are newer.
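One way to check for that attribute from code is to read a voice bundle's Info.plist directly. The path below is only an illustration of where a bundled voice might live on a given machine; the key name is the one mentioned above:

```swift
import Foundation

// Inspect a voice bundle's Info.plist for VoiceSupportedCharacters,
// the attribute whose absence seems to correlate with voices the
// public API cannot use. The path is illustrative; point it at a
// voice directory that actually exists on your machine.
let plistPath = "/System/Library/Speech/Voices/Samantha.SpeechVoice/Contents/Info.plist"
let url = URL(fileURLWithPath: plistPath)
if let data = try? Data(contentsOf: url),
   let plist = try? PropertyListSerialization.propertyList(
       from: data, options: [], format: nil) as? [String: Any] {
    let supported = plist["VoiceSupportedCharacters"] != nil
    print(supported ? "has VoiceSupportedCharacters (newer-style voice)"
                    : "missing VoiceSupportedCharacters (older-style voice)")
}
```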
P.S.
Not directly relevant to your question, but just FYI: I'm still looking for a solution to use Siri's voice programmatically. One idea is to make a copy of the voice directory and play with its Info.plist. The other idea is to automate the macOS UI to trigger text-to-speech conversion by simulating the key press bound to the "Speak selected text when the key is pressed" option in System Preferences / Accessibility / Speech, and then recording the audio.
I'd appreciate it if anyone can share other ideas.