
Adding a pause in Google-text-to-speech

I am looking for a pause, wait, break, or anything else that will insert a short gap (about 2 seconds, ideally configurable) when speaking the desired text.

People online have said that adding three full stops followed by a space creates a break, but I don't seem to be getting that. The code below is my test, which sadly has no pauses. Any ideas or suggestions?

Edit: It would be ideal if there is some command from gTTS that would allow me to do this, or maybe some trick like using the three full stops if that actually worked.

from gtts import gTTS
import os

tts = gTTS(text=" Testing ... if there is a pause ... ... ... ... ...  longer pause? ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... insane pause   " , lang='en', slow=False)

tts.save("temp.mp3")
os.system("temp.mp3")

Barry Sturgeon asked Jan 20 '20 09:01





3 Answers

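Ok, you need Speech Synthesis Markup Language (SSML) to achieve this, using the Google Cloud Text-to-Speech client library rather than gTTS.
Be aware that you need to set up Google Cloud Platform credentials first.

Then install the client library from a shell: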

pip install --upgrade google-cloud-texttospeech

Then here is the code:

import html
from google.cloud import texttospeech

def ssml_to_audio(ssml_text, outfile):
    # Instantiates a client
    client = texttospeech.TextToSpeechClient()

    # Sets the text input to be synthesized
    synthesis_input = texttospeech.SynthesisInput(ssml=ssml_text)

    # Builds the voice request, selects the language code ("en-US") and
    # the SSML voice gender ("MALE")
    voice = texttospeech.VoiceSelectionParams(
        language_code="en-US", ssml_gender=texttospeech.SsmlVoiceGender.MALE
    )

    # Selects the type of audio file to return
    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3
    )

    # Performs the text-to-speech request on the text input with the selected
    # voice parameters and audio file type
    response = client.synthesize_speech(
        input=synthesis_input, voice=voice, audio_config=audio_config
    )

    # Writes the synthetic audio to the output file.
    with open(outfile, "wb") as out:
        out.write(response.audio_content)
        print("Audio content written to file " + outfile)

def text_to_ssml(inputfile):

    raw_lines = inputfile

    # Replace special characters with HTML Ampersand Character Codes
    # These Codes prevent the API from confusing text with
    # SSML commands
    # For example, '<' --> '&lt;' and '&' --> '&amp;'

    escaped_lines = html.escape(raw_lines)

    # Convert plaintext to SSML
    # Insert a two-second break after each line
    ssml = "<speak>{}</speak>".format(
        escaped_lines.replace("\n", '\n<break time="2s"/>')
    )

    # Return the concatenated string of ssml script
    return ssml



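# text_to_ssml() escapes its input, so pass it plain text only.
# Each newline becomes a two-second <break/>.
# Hand-written SSML such as <break time="3s"/> must skip text_to_ssml()
# and go straight to ssml_to_audio(), wrapped in <speak>...</speak>.
text = """Testing
if there is a pause
longer pause?"""

ssml = text_to_ssml(text)
ssml_to_audio(ssml, "test.mp3")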

More documentation:
Speaking addresses with SSML
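
Since the question asks for a configurable pause, here is a minimal sketch of how the break length could be made a parameter. The helper name build_ssml and its pause_seconds argument are my own assumptions, not part of the Google client library; it only builds the SSML string, which you would then pass to ssml_to_audio() from the code above.

```python
import html


def build_ssml(phrases, pause_seconds=2.0):
    """Join plain-text phrases into one SSML string with a break between them.

    build_ssml and pause_seconds are hypothetical names used for illustration.
    """
    if pause_seconds < 0:
        raise ValueError("pause_seconds must be non-negative")
    # SSML break times can be given in milliseconds, e.g. time="2000ms"
    break_tag = '<break time="{}ms"/>'.format(int(round(pause_seconds * 1000)))
    # Escape each phrase so characters such as '<' and '&' are spoken, not parsed
    escaped = [html.escape(p.strip()) for p in phrases if p.strip()]
    return "<speak>{}</speak>".format(break_tag.join(escaped))


ssml = build_ssml(["Testing", "if there is a pause", "longer pause?"], pause_seconds=2)
print(ssml)
```

Empty phrases are dropped, so you don't end up with two breaks back to back by accident.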

If you don't have Google Cloud Platform credentials, a cheaper and easier way is to split the speech into separate clips and call time.sleep(1) between them.

Peyman Majidi answered Oct 19 '22 15:10



If you need to wait in the background, you can use the time module, as below.

import time
# Sleep for 5 seconds, then start the process
time.sleep(5)

Or you can check up to three times, waiting between attempts:

import time

# someprocess() is a placeholder for your own check
for tries in range(3):
    if someprocess() is False:
        time.sleep(3)
    else:
        break  # stop retrying once the check succeeds

High-Octane answered Oct 19 '22 14:10



You can save multiple mp3 files, then play them in order, calling time.sleep() between them for the pause you want:

from gtts import gTTS
import os
from time import sleep

tts1 = gTTS(text="Testing" , lang='en', slow=False)
tts2 = gTTS(text="if there is a pause" , lang='en', slow=False)
tts3 = gTTS(text="insane pause   " , lang='en', slow=False)

tts1.save("temp1.mp3")
tts2.save("temp2.mp3")
tts3.save("temp3.mp3")

os.system("temp1.mp3")
sleep(2)
os.system("temp2.mp3")
sleep(3)
os.system("temp3.mp3")
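
If you end up with more than a few clips, the play/sleep pairs above could be folded into a loop. This is only a sketch: play_with_pauses and its play and sleep parameters are names I made up, and the example below is a dry run with a stand-in player that records calls instead of opening any file. In real use you would pass something like os.system as play and leave sleep at its default.

```python
import time


def play_with_pauses(clips, play, sleep=time.sleep):
    """Play each (path, pause_after_seconds) pair in order, pausing in between.

    play is a hypothetical callback that starts playback of one file.
    """
    for path, pause_after in clips:
        play(path)
        if pause_after > 0:
            sleep(pause_after)  # configurable gap before the next clip


# Dry run: record what would be played instead of opening files or sleeping
played = []
play_with_pauses(
    [("temp1.mp3", 2), ("temp2.mp3", 3), ("temp3.mp3", 0)],
    play=played.append,
    sleep=lambda seconds: played.append("pause {}s".format(seconds)),
)
print(played)
```

Keeping the pause next to each file name makes the gaps easy to tune in one place.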

Ann Zen answered Oct 19 '22 16:10
