I'm generating speech through Google Cloud's text-to-speech API and I'd like to highlight words as they are spoken. Is there a way of getting timestamps for spoken words or sentences?


Google Cloud Text-to-speech word timestamps

1 Answer

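You can do this using SSML and the v1beta1 version of Google Cloud's Text-to-Speech API: https://cloud.google.com/text-to-speech/docs/reference/rest/v1beta1/text/synthesize#TimepointType

1. Add `<mark>` SSML tags at each point in the text where you want a timestamp (for example, at the end of each sentence, or before every word if you need word-level timing).
2. In the request, set `enableTimePointing` (a list of `TimepointType` values) to `SSML_MARK`. If this field is not set, no timepoints are returned.

The response then includes a `timepoints` list that gives, for each mark, its `markName` and its offset into the audio in `timeSeconds`.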
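A minimal sketch of both steps in Python, using only the standard library and the JSON body for the v1beta1 REST endpoint `https://texttospeech.googleapis.com/v1beta1/text:synthesize`. It assumes you want word-level timing, so it places a `<mark>` before every whitespace-separated word. The helper names, the `w0`, `w1`, ... mark naming scheme, and the voice/audio settings are illustrative choices, not part of the API. Authentication and actually sending the request are left out; the last helper maps the response's `timepoints` entries back to words.

```python
import json
from xml.sax.saxutils import escape


def build_ssml_with_marks(text: str) -> tuple[str, list[str]]:
    """Wrap text in <speak> and put a <mark> before every word.

    Hypothetical helper: mark name "w{i}" refers to words[i]; the naming
    scheme is an arbitrary choice, and splitting on whitespace keeps
    punctuation attached to its word.
    """
    words = text.split()
    parts = []
    for i, word in enumerate(words):
        # Escape &, < and > so a word cannot break the SSML markup.
        parts.append(f'<mark name="w{i}"/>{escape(word)}')
    return "<speak>" + " ".join(parts) + "</speak>", words


def build_request_body(ssml: str) -> dict:
    """JSON body for POST .../v1beta1/text:synthesize (hypothetical helper)."""
    return {
        "input": {"ssml": ssml},
        # Placeholder voice and audio settings; choose your own.
        "voice": {"languageCode": "en-US"},
        "audioConfig": {"audioEncoding": "MP3"},
        # Without this field the response contains no timepoints.
        "enableTimePointing": ["SSML_MARK"],
    }


def words_with_times(response: dict, words: list[str]) -> list[tuple[str, float]]:
    """Map each returned timepoint back to the word its mark precedes."""
    result = []
    for tp in response.get("timepoints", []):
        index = int(tp["markName"][1:])  # "w3" -> 3
        # JSON omits zero-valued fields, so a mark at 0 s may lack timeSeconds.
        result.append((words[index], float(tp.get("timeSeconds", 0.0))))
    return result


if __name__ == "__main__":
    ssml, words = build_ssml_with_marks("Hello & welcome")
    print(json.dumps(build_request_body(ssml), indent=2))
```

To highlight words during playback, you could compare the audio player's current position against the offsets returned by `words_with_times` and highlight the last word whose offset has been reached.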


answered Sep 22 '22 02:09

i_am_momo



Tags: text-to-speech, google-text-to-speech, speech-synthesis

user2248702
