
Google Cloud Text-to-speech word timestamps

I'm generating speech through Google Cloud's text-to-speech API and I'd like to highlight words as they are spoken.

Is there a way of getting timestamps for spoken words or sentences?

Asked by user2248702 on Mar 24 '19 at 04:03



1 Answer

You can do this using SSML and the v1beta1 version of Google Cloud's Text-to-Speech API: https://cloud.google.com/text-to-speech/docs/reference/rest/v1beta1/text/synthesize#TimepointType

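  1. Add SSML <mark> tags, each with a unique name attribute, at the points in the text where you want a timestamp (for example, at the end of each sentence, or before each word for word-level highlighting).
  2. Set the request's enableTimePointing field to SSML_MARK (a TimepointType value). If this field is not set, no timepoints are returned.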
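
As a rough illustration of those two steps, here is a minimal Python sketch that wraps each word in a <mark> tag and builds a JSON body for the v1beta1 text:synthesize REST method. The helper names (build_marked_ssml, build_request_body, word_start_times), the "w0", "w1", ... mark naming scheme, and the voice and audio settings are assumptions made for this example, not part of the API. It only builds and parses data; sending the request and authenticating are left out.

```python
from html import escape


def build_marked_ssml(text):
    """Put a uniquely named <mark/> in front of every whitespace-separated word.

    The "w<index>" naming scheme is an assumption of this sketch.
    """
    words = text.split()
    parts = [f'<mark name="w{i}"/>{escape(word)}' for i, word in enumerate(words)]
    return "<speak>" + " ".join(parts) + "</speak>", words


def build_request_body(ssml, language_code="en-US"):
    """Build the JSON body for a v1beta1 text:synthesize request.

    Voice and audio settings are placeholder choices for illustration.
    """
    return {
        "input": {"ssml": ssml},
        "voice": {"languageCode": language_code},
        "audioConfig": {"audioEncoding": "MP3"},
        # Without this field the response carries no timepoints.
        "enableTimePointing": ["SSML_MARK"],
    }


def word_start_times(words, timepoints):
    """Pair each word with the time (in seconds) of the mark placed before it.

    `timepoints` is the list found under "timepoints" in the response;
    words whose mark is missing from it map to None.
    """
    by_name = {tp["markName"]: tp["timeSeconds"] for tp in timepoints}
    return [(word, by_name.get(f"w{i}")) for i, word in enumerate(words)]
```

The dictionary from build_request_body would be sent as the POST body to the v1beta1 text:synthesize endpoint. The response should then include a timepoints list alongside the audio, and each entry's markName and timeSeconds can be passed to word_start_times to drive the highlighting.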

Answered by i_am_momo on Sep 22 '22 at 02:09