 

How to use Google's Text-to-Speech API in Python

I have an API key ready and want to make requests to Google to turn text into speech.
I tried these commands and many more.
The docs offer no straightforward way, that I've found, to get started with Python. I don't know where my API key goes in relation to the JSON and the URL.

One solution in their docs is for cURL, but it involves downloading a .txt file after the request that has to be sent back to them in order to get the audio file. Is there a way to do this in Python that doesn't involve the .txt file I have to return to them? I just want my list of strings returned as audio files.

My Code

(I put my actual key in the block above. I'm just not going to share it here.)

Ant asked Dec 24 '22 01:12


1 Answer

Configure the Python App for the JSON Key File and Install the Client Library

  1. Create a Service Account.
  2. Create a Service Account Key for that Service Account.
  3. The JSON key file downloads; save it securely.
  4. Include the Google Application Credentials in your Python app by pointing the GOOGLE_APPLICATION_CREDENTIALS environment variable at the JSON file.
  5. Install the library: pip install --upgrade google-cloud-texttospeech
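
Step 4 can be sketched as follows. This is a minimal sketch, assuming the file from step 3 is a standard service account key in JSON form; the helper name use_service_account is hypothetical and not part of Google's library. It checks that the file parses and is a service account key before setting the environment variable, so a wrong path fails early rather than on the first request.

```python
import json
import os


def use_service_account(key_path):
    """Point GOOGLE_APPLICATION_CREDENTIALS at a service account key file.

    Hypothetical helper: validates the file first so a bad path fails early.
    Returns the service account's email address, if the file lists one.
    """
    with open(key_path, encoding="utf-8") as fh:
        info = json.load(fh)
    # Service account key files carry "type": "service_account".
    if info.get("type") != "service_account":
        raise ValueError(f"{key_path} is not a service account key file")
    # The Google client libraries read this variable to find the key file.
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = os.path.abspath(key_path)
    return info.get("client_email")
```

Calling use_service_account("/home/yourproject-12345.json") before creating the client would have the same effect as setting the variable by hand.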

This uses Google's Python examples, found at https://cloud.google.com/text-to-speech/docs/reference/libraries and https://github.com/GoogleCloudPlatform/python-docs-samples/blob/master/texttospeech/cloud-client/quickstart.py. Note: Google's example does not set the name parameter correctly.

Below is the code, modified from Google's example, using the Google application credentials and a female WaveNet voice.

import os

from google.cloud import texttospeech

# Point the client library at the service account key file
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/home/yourproject-12345.json"

# Note: this is the google-cloud-texttospeech 2.x spelling. Releases before
# 2.0 used texttospeech.types.* and texttospeech.enums.* instead, and
# accepted positional arguments to synthesize_speech.

# Instantiates a client
client = texttospeech.TextToSpeechClient()

# Set the text input to be synthesized
synthesis_input = texttospeech.SynthesisInput(text="Do no evil!")

# Build the voice request: select the language code ("en-US"),
# ****** the NAME,
# and the SSML voice gender ("female")
voice = texttospeech.VoiceSelectionParams(
    language_code='en-US',
    name='en-US-Wavenet-C',
    ssml_gender=texttospeech.SsmlVoiceGender.FEMALE)

# Select the type of audio file you want returned
audio_config = texttospeech.AudioConfig(
    audio_encoding=texttospeech.AudioEncoding.MP3)

# Perform the text-to-speech request on the text input with the selected
# voice parameters and audio file type (keyword arguments are required in 2.x)
response = client.synthesize_speech(
    input=synthesis_input, voice=voice, audio_config=audio_config)

# The response's audio_content is binary.
with open('output.mp3', 'wb') as out:
    # Write the response to the output file.
    out.write(response.audio_content)
print('Audio content written to file "output.mp3"')
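
Since the question asks for a list of strings returned as audio files, the same request can be wrapped in a loop. The following is only a sketch, assuming google-cloud-texttospeech 2.x (where the classes live directly on texttospeech) and that GOOGLE_APPLICATION_CREDENTIALS is already set; the function names, the "audio" directory, and the clip_001.mp3 naming scheme are all hypothetical choices.

```python
from pathlib import Path


def make_google_synthesizer(voice_name="en-US-Wavenet-C", language_code="en-US"):
    """Return a function mapping text to MP3 bytes, backed by one shared client.

    Assumes google-cloud-texttospeech 2.x and GOOGLE_APPLICATION_CREDENTIALS.
    """
    # Imported lazily so save_all below can be used without the library.
    from google.cloud import texttospeech

    client = texttospeech.TextToSpeechClient()
    voice = texttospeech.VoiceSelectionParams(
        language_code=language_code, name=voice_name)
    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3)

    def synthesize(text):
        response = client.synthesize_speech(
            input=texttospeech.SynthesisInput(text=text),
            voice=voice,
            audio_config=audio_config)
        return response.audio_content

    return synthesize


def save_all(texts, synthesize, out_dir="audio"):
    """Write one MP3 file per string and return the paths in input order."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, text in enumerate(texts, start=1):
        path = out / f"clip_{index:03d}.mp3"
        path.write_bytes(synthesize(text))
        paths.append(path)
    return paths
```

A call such as save_all(["Hello", "Goodbye"], make_google_synthesizer()) would then send one request per string. Passing the synthesizer in as an argument keeps the file-writing loop independent of the Google client, which also makes it easy to try out with a stand-in function.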

Voices, Name, Language Code, SSML Gender, etc.

List of Voices: https://cloud.google.com/text-to-speech/docs/voices
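
The same list can also be fetched from the API itself. This is a rough sketch, assuming google-cloud-texttospeech 2.x and working credentials; list_voices is the client library's call, while fetch_voices and wavenet_names are hypothetical helper names, and the "Wavenet" substring test simply relies on the naming pattern seen in voice names such as en-US-Wavenet-C.

```python
def fetch_voices(language_code="en-US"):
    """Ask the API for the voices available for one language code."""
    # Imported lazily so wavenet_names can be used without the library.
    from google.cloud import texttospeech

    client = texttospeech.TextToSpeechClient()
    return client.list_voices(language_code=language_code).voices


def wavenet_names(voices):
    """Return the sorted names of the voices whose name contains 'Wavenet'."""
    return sorted(v.name for v in voices if "Wavenet" in v.name)
```

With credentials in place, print(wavenet_names(fetch_voices("en-US"))) would show the names that can be passed as the name parameter.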

In the code example above, I changed the voice settings from Google's example code in three ways: I added the name parameter, chose a WaveNet voice (much improved, but more expensive at $16 per million characters), and set the SSML gender to FEMALE.

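# google-cloud-texttospeech 2.x spelling; releases before 2.0 use
# texttospeech.types.VoiceSelectionParams and texttospeech.enums.SsmlVoiceGender
voice = texttospeech.VoiceSelectionParams(
    language_code='en-US',
    name='en-US-Wavenet-C',
    ssml_gender=texttospeech.SsmlVoiceGender.FEMALE)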
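
To get a feel for what the WaveNet rate means for a batch of strings, here is a back-of-the-envelope estimate. It assumes the $16 per million characters figure quoted above, that every character of the input text is billed, and no free allowance; the function name is hypothetical, and current pricing should be checked before relying on the number.

```python
def estimate_cost_usd(texts, usd_per_million_chars=16.0):
    """Rough cost of synthesizing all strings at a flat per-character rate."""
    total_chars = sum(len(text) for text in texts)
    return total_chars * usd_per_million_chars / 1_000_000
```

For instance, a batch totalling one million characters comes out at $16 under these assumptions.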
CodeRaptor answered Dec 25 '22 14:12