In general, I'm very impressed with Android's default text to speech engine (i.e., com.svox.pico). As expected, it mispronounces some words (as do I) and it therefore occasionally needs some pronunciation guidance. So I'm wondering about best practices for phonetically spelling out those words that the pico TTS engine mispronounces.
For example, the correct pronunciation of the bird Chachalaca is CHAH-chah-LAH-kah. Here is what the TTS engine produces:
mTts.speak("Chachalaca", TextToSpeech.QUEUE_ADD, null); // output: chuh-KAL-uh-KUH
mTts.speak("CHAH-chah-LAH-kah", TextToSpeech.QUEUE_ADD, null); // output: CHAH-chah-EL-AY-AYCH-dash-kuh
mTts.speak("CHAHchahLAHkah", TextToSpeech.QUEUE_ADD, null); // output: CHA-chah-LAH-ka
mTts.speak("CHAH chah LOCKah", TextToSpeech.QUEUE_ADD, null); // output: CHAH-chah-LAH-kah
Here are my questions.
By the way, this is what the TTS engine writes to logcat:
V/TtsService( 294): TTS processing: CHAH chah LOCKah
V/TtsService( 294): TtsService.setLanguage(eng, USA, )
I/SVOX Pico Engine( 294): Language already loaded (en-US == en-US)
I/SynthProxy( 294): setting speech rate to 100
I/SynthProxy( 294): setting pitch to 100
[UPDATE]
I tried passing an XML document to TextToSpeech.speak() as follows:
String text = "<?xml version=\"1.0\"?>" +
"<speak version=\"1.0\" xmlns=\"http://www.w3.org/2001/10/synthesis\" " +
"xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" " +
"xsi:schemaLocation=\"http://www.w3.org/2001/10/synthesis " +
"http://www.w3.org/TR/speech-synthesis/synthesis.xsd\" " +
"xml:lang=\"en-US\">" +
"That is a big car! " +
"That <emphasis>is</emphasis> a big car! " +
"That is a <emphasis>big</emphasis> car! " +
"That is a huge bank account! " +
"That <emphasis level=\"strong\">is</emphasis> a huge bank account! " +
"That is a <emphasis level=\"strong\">huge</emphasis> bank account!" +
"</speak>";
mTts.speak(text, TextToSpeech.QUEUE_ADD, null);
As Android Eve suggested, the TTS engine read only the XML body (i.e., the comments about the big car and the huge bank account). I didn't realize the TTS engine was capable of parsing XML documents. However, I did not hear any emphasis in the TTS output.
[UPDATE 2]
I simplified the question to whether or not Android TTS supports Speech Synthesis Markup Language here.
JW answered my question at the tts-for-android group:
Hi Greg,
The Pico engine recognizes the tag with the XSAMPA alphabet.
There are no easy rules to derive a certain pronunciation from the orthograpy, but you can use intuitive spellings and trial and error. Capitalizing and hyphens will introduce more problems than solving them. Using different spellings and introducing extra word boundaries (spaces) can work.
The emphasis tag and the exclamation mark will not change the synthesis result. Use , , and commands instead.
Some examples of the proper syntax for specifying the pronunciation using the SSML phoneme tag are in these tests of TextToSpeech.
Even with these simple test SSML documents, there are warning messages posted to logcat about the SSML document not being well-formed. So I opened an issue about these seemingly incorrect logcat messages to the Android issue tracker.
The syntax for specifying an x-SAMPA sequence to SVOX pico is
String text = "<speak xml:lang=\"en-US\"> <phoneme alphabet=\"xsampa\" ph=\"d_ZIn\"/>.</speak>";
mTts.speak(text, TextToSpeech.QUEUE_ADD, null);
Although more examples would be helpful, a good reference for x-SAMPA is at http://en.wikipedia.org/wiki/Xsampa If I compile a couple dozen examples, I'll post them to that Wikipedia page.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With