Synthesize more than 1500 characters using AWS Polly?

My idea was to use AWS Polly to read aloud some news from an RSS feed. As per this link, I understood that Polly is very flexible about the number of characters it can convert, since one of the examples is "Adventures of Huckleberry Finn" by Mark Twain (~600k characters). The problem is that when I try to convert my articles to speech, I get the following error:

An error occurred (TextLengthExceededException) when calling the SynthesizeSpeech operation: Maximum text length has been exceeded

The text I was trying to convert was about 5000 characters.

Is there any way (with or without the API) to convert long strings of text with Polly without having to cut them into a million different pieces?

Any tip in the right direction will be appreciated,

Thanks

JordanBelf asked Dec 25 '16 01:12



3 Answers

The size of the input text can be up to 1500 billed characters (3000 total characters). SSML tags are not counted as billed characters.

http://docs.aws.amazon.com/polly/latest/dg/limits.html

The pricing examples seem intended to give a sense of the relatively low cost of voicing a large work. In practice, the work would still need to be divided into groups of sentences and submitted to the API piece by piece. The API is the only interface: the SDKs and the CLI all call the same SynthesizeSpeech operation.
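As a rough illustration of dividing a text into groups of sentences, here is a minimal Python sketch. The helper names `split_text` and `synthesize_long_text`, the regex-based sentence splitter, and the voice `Joanna` are assumptions made for this example, not part of the answer. The 1500-character limit is the figure quoted above and should be checked against the current limits page. Joining the returned MP3 streams byte for byte usually plays back fine, but it is a shortcut rather than a guaranteed-clean merge.

```python
import re

MAX_CHARS = 1500  # limit quoted in the answer above; verify against the current docs


def split_text(text, max_chars=MAX_CHARS):
    """Group whole sentences into chunks of at most max_chars characters."""
    # Naive sentence splitter (an assumption): break after ., ! or ? plus whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut hard at the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def synthesize_long_text(polly, text, voice_id="Joanna"):
    """Call SynthesizeSpeech once per chunk and join the MP3 bytes."""
    audio = b""
    for chunk in split_text(text):
        response = polly.synthesize_speech(
            Text=chunk, OutputFormat="mp3", VoiceId=voice_id
        )
        audio += response["AudioStream"].read()
    return audio
```

With boto3 installed and credentials configured, `polly` would be `boto3.client("polly")`, and the returned bytes can be written to a file opened in binary mode.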

Michael - sqlbot answered Oct 17 '22 10:10


How to create long audio files is described in the docs: https://docs.aws.amazon.com/polly/latest/dg/longer-cli.html

An AWS CLI call might look like this:

aws polly start-speech-synthesis-task \
--region eu-central-1 \
--endpoint-url "https://polly.eu-central-1.amazonaws.com/" \
--output-format mp3 \
--output-s3-bucket-name your-bucket-name \
--output-s3-key-prefix optional/prefix/path/file \
--voice-id Hans \
--text-type ssml \
--text file://output.xml
# Speech marks (--speech-mark-types '["sentence", "word", "ssml"]') are only
# available with --output-format json, so request them in a separate task.

As you can see, you will need an S3 bucket for (temporary) storage.
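The same asynchronous task can be started from Python. The sketch below assumes a boto3 Polly client is passed in as `polly`. The helper names `start_long_synthesis` and `wait_for_task`, the polling interval, and the timeout are illustrative choices, not something the docs prescribe.

```python
import time


def start_long_synthesis(polly, text, bucket, voice_id="Hans",
                         key_prefix="", text_type="ssml"):
    """Start an asynchronous synthesis task and return its task ID."""
    params = {
        "OutputFormat": "mp3",
        "OutputS3BucketName": bucket,
        "Text": text,
        "TextType": text_type,
        "VoiceId": voice_id,
    }
    if key_prefix:
        params["OutputS3KeyPrefix"] = key_prefix
    task = polly.start_speech_synthesis_task(**params)["SynthesisTask"]
    return task["TaskId"]


def wait_for_task(polly, task_id, poll_seconds=5.0, timeout_seconds=600.0):
    """Poll the task until it completes; return the S3 URI of the audio."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        task = polly.get_speech_synthesis_task(TaskId=task_id)["SynthesisTask"]
        status = task["TaskStatus"]
        if status == "completed":
            return task["OutputUri"]
        if status == "failed":
            raise RuntimeError(task.get("TaskStatusReason", "synthesis task failed"))
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still {status} after timeout")
        time.sleep(poll_seconds)
```

With boto3 installed, `polly` would be `boto3.client("polly", region_name="eu-central-1")`. The finished audio then has to be downloaded from the bucket, for example with the S3 client.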

user3072843 answered Oct 17 '22 12:10


I have no special tip for doing this without breaking the text into pieces, but I wrote an article on how to do it in Node.js. If you don't have any other alternative, feel free to review it and comment!

How to handle more than 1500 characters with AWS Polly text-to-speech

Jonathan Banon answered Oct 17 '22 11:10