
How to get SSML <mark> timestamps from Google Cloud text-to-speech API

I want to use SSML markers through the Google Cloud Text-to-Speech API to request the timing of these markers in the audio stream. These timestamps are needed to provide cues for effects, word/section highlighting, and feedback to the user.

I found this question, which is relevant, although it refers to timestamps for each word and not for the SSML <mark> tag.

The following API request returns OK but does not include the requested marker data. This is using the Cloud Text-to-Speech API v1.

{
 "voice": {
  "languageCode": "en-US"
 },
 "input": {
  "ssml": "<speak>First, <mark name=\"a\"/> second, <mark name=\"b\"/> third.</speak>"
 },
 "audioConfig": {
  "audioEncoding": "mp3"
 }
} 

Response:

{
 "audioContent":"//NExAAAAANIAAAAABcFAThYGJqMWA..."
}

This only provides the synthesized audio, without any timing information for the marks.

Is there an API request that I am overlooking which can expose information about these markers, as is possible with IBM Watson and Amazon Polly?

asked Aug 06 '19 by James



2 Answers

At the time of writing, the timepoint data is available in the v1beta1 release of Google Cloud Text-to-Speech.

I didn't need to sign up for any extra developer program to access the beta; the default access was enough.

Importing in Python (for example) went from:

from google.cloud import texttospeech as tts

to:

from google.cloud import texttospeech_v1beta1 as tts

Nice and simple.

I needed to modify the default way I was sending the synthesis request to include the enable_time_pointing flag.

I worked that out with a mix of poking around the machine-readable API description and reading the Python library code, which I had already downloaded.

Thankfully, the source in the generally available version also includes the v1beta1 version - thank you Google!

I've put a runnable sample below. Running it needs the same auth and setup as any general text-to-speech sample, which you can get by following the official documentation.
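
If you haven't done that setup before, it usually amounts to something like the following (the service-account key path is a placeholder):

$ pip install google-cloud-texttospeech
$ export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json"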

Here's what it does for me (with slight formatting for readability):

$ python tools/try-marks.py
Marks content written to file: .../demo.json
Audio content written to file: .../demo.mp3

$ cat demo.json
[
  {"sec": 0.4300000071525574, "name": "here"},
  {"sec": 0.9234582781791687, "name": "there"}
]

Here's the sample:

import json
from pathlib import Path
from google.cloud import texttospeech_v1beta1 as tts


def go_ssml(basename: Path, ssml):
    client = tts.TextToSpeechClient()
    voice = tts.VoiceSelectionParams(
        language_code="en-AU",
        name="en-AU-Wavenet-B",
        ssml_gender=tts.SsmlVoiceGender.MALE,
    )

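    # enable_time_pointing is the key v1beta1 addition: it asks the API to
    # return a timepoint for each SSML <mark> alongside the audio.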
    response = client.synthesize_speech(
        request=tts.SynthesizeSpeechRequest(
            input=tts.SynthesisInput(ssml=ssml),
            voice=voice,
            audio_config=tts.AudioConfig(audio_encoding=tts.AudioEncoding.MP3),
            enable_time_pointing=[
                tts.SynthesizeSpeechRequest.TimepointType.SSML_MARK]
        )
    )

    # cheesy conversion of array of Timepoint proto.Message objects into plain-old data
    marks = [dict(sec=t.time_seconds, name=t.mark_name)
             for t in response.timepoints]

    name = basename.with_suffix('.json')
    with name.open('w') as out:
        json.dump(marks, out)
        print(f'Marks content written to file: {name}')

    name = basename.with_suffix('.mp3')
    with name.open('wb') as out:
        out.write(response.audio_content)
        print(f'Audio content written to file: {name}')


go_ssml(Path.cwd() / 'demo', """
    <speak>
    Go from <mark name="here"/> here, to <mark name="there"/> there!
    </speak>
    """)
answered Oct 11 '22 by Andrew E


Looks like this is supported in Cloud Text-to-Speech API v1beta1: https://cloud.google.com/text-to-speech/docs/reference/rest/v1beta1/text/synthesize#TimepointType

You can use https://texttospeech.googleapis.com/v1beta1/text:synthesize. Set enableTimePointing to SSML_MARK in the request body; if this field is not set, timepoints are not returned.
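
A minimal request body for that endpoint could look like this (the enableTimePointing field is the only addition relative to the v1 request in the question):

{
 "voice": {
  "languageCode": "en-US"
 },
 "input": {
  "ssml": "<speak>First, <mark name=\"a\"/> second, <mark name=\"b\"/> third.</speak>"
 },
 "audioConfig": {
  "audioEncoding": "MP3"
 },
 "enableTimePointing": ["SSML_MARK"]
}

The response then includes a timepoints array alongside audioContent, with a markName and timeSeconds for each <mark>.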

answered Oct 11 '22 by i_am_momo