I'm generating speech through Google Cloud's text-to-speech API and I'd like to highlight words as they are spoken.
Is there a way of getting timestamps for spoken words or sentences?
You can do this using SSML and the v1beta1 version of Google Cloud's Text-to-Speech API: https://cloud.google.com/text-to-speech/docs/reference/rest/v1beta1/text/synthesize#TimepointType
Add <mark> SSML tags at each point in the text where you want a timestamp (for example, at the end of each sentence), and set the TimepointType to SSML_MARK in the request's enableTimePointing field. If this field is not set, timepoints are not returned by default.
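As a rough illustration of the approach above, here is a minimal Python sketch that puts a <mark> before every word, builds a JSON body for the v1beta1 text:synthesize REST method, and pairs the returned timepoints with words. The helper names (build_marked_ssml, build_request_body, word_timings), the "w0", "w1", ... mark naming scheme, the voice and audio settings, and the sample response values are all assumptions made up for this example. It does not call the API; sending the request and handling authentication are left out.

```python
import json
from html import escape


def build_marked_ssml(text):
    """Wrap text in <speak> and place a <mark> before each word.

    Returns the SSML string and a dict mapping mark name -> word.
    The "w<index>" naming scheme is an arbitrary choice for this sketch.
    """
    marks = {}
    parts = []
    for i, word in enumerate(text.split()):
        name = f"w{i}"
        marks[name] = word
        # Escape the word so characters like & or < do not break the SSML.
        parts.append(f'<mark name="{name}"/>{escape(word)}')
    return "<speak>" + " ".join(parts) + "</speak>", marks


def build_request_body(ssml, language_code="en-US"):
    """Build a JSON body for the v1beta1 text:synthesize REST method."""
    return {
        "input": {"ssml": ssml},
        "voice": {"languageCode": language_code},
        "audioConfig": {"audioEncoding": "MP3"},
        # Without this field, no timepoints are returned.
        "enableTimePointing": ["SSML_MARK"],
    }


def word_timings(response, marks):
    """Pair each returned timepoint with the word its mark precedes."""
    return [
        # Defensive default in case a zero offset is missing from the JSON.
        (marks[tp["markName"]], tp.get("timeSeconds", 0.0))
        for tp in response.get("timepoints", [])
        if tp.get("markName") in marks
    ]


ssml, marks = build_marked_ssml("Hello wide world")
body = build_request_body(ssml)
print(json.dumps(body, indent=2))

# Made-up response for illustration only; real offsets come from the API.
sample_response = {
    "timepoints": [
        {"markName": "w0", "timeSeconds": 0.05},
        {"markName": "w1", "timeSeconds": 0.41},
        {"markName": "w2", "timeSeconds": 0.78},
    ]
}
print(word_timings(sample_response, marks))
```

Because each mark sits directly before its word, the reported offset approximates the moment that word starts, which is what a highlighter needs. For sentence-level highlighting, the same idea applies with one mark per sentence instead of one per word.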