My idea was to use AWS Polly to read aloud some news from an RSS feed. As per this link, I understand that Polly is very flexible in terms of how many characters it can convert, as one of the examples is "Adventures of Huckleberry Finn" by Mark Twain (~600k characters).
The problem is that when I try to convert my articles to speech I am getting the following error:
An error occurred (TextLengthExceededException) when calling the SynthesizeSpeech operation: Maximum text length has been exceeded
The text I was trying to convert was about 5000 characters.
Is there any way (with or without the API) to convert long strings of text with Polly without having to cut them into a million different pieces?
Any tip in the right direction will be appreciated,
Thanks
Speech Synthesis Markup Language (SSML) is a standardized markup language that enables developers to modify Text-to-Speech (TTS) audio. With SSML, you can control various vocal characteristics of TTS output, such as pronunciation, speech rate, and other elements, to produce a more natural-sounding voice experience.
The generated audio can be up to 6 hours long, and is typically ready within minutes. In addition to 100,000 characters of text, each request can include an additional 100,000 characters of Speech Synthesis Markup Language (SSML) markup.
Text-to-Speech on AWS: Amazon Polly is an API-driven service that uses advanced deep learning technologies to synthesize speech that sounds like a human voice. It provides dozens of lifelike voices across a wide variety of languages.
The size of the input text can be up to 1500 billed characters (3000 total characters). SSML tags are not counted as billed characters.
http://docs.aws.amazon.com/polly/latest/dg/limits.html
The pricing examples seem intended to give a sense of the relatively low cost of voicing a large work, but the work would actually need to be divided into groups of sentences and submitted to the API, which is the only interface: the SDKs and the CLI all call the same SynthesizeSpeech API.
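To illustrate the "groups of sentences" approach, here is a rough Python sketch. The 1500-character default is simply the figure quoted above, the sentence splitting is a naive regex, and the helper names `split_text` and `synthesize_long_text`, the voice `Joanna` and the output path are all made up for the example. It also assumes boto3 is installed and credentials are configured, and that appending the returned MP3 streams to one file is acceptable for your player.

```python
import re

MAX_CHARS = 1500  # assumed per-request limit, taken from the figure quoted above


def split_text(text, max_chars=MAX_CHARS):
    """Group whole sentences into chunks of at most max_chars characters."""
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Fallback: a single sentence longer than the limit is cut mid-sentence.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def synthesize_long_text(text, voice_id="Joanna", out_path="speech.mp3"):
    """Send each chunk to SynthesizeSpeech and append the audio to one file."""
    import boto3  # imported here so split_text works without the SDK installed

    polly = boto3.client("polly")
    with open(out_path, "wb") as out:
        for chunk in split_text(text):
            response = polly.synthesize_speech(
                Text=chunk, OutputFormat="mp3", VoiceId=voice_id
            )
            out.write(response["AudioStream"].read())
```

Splitting on sentence boundaries keeps pauses and intonation natural at the joins; cutting at a fixed character offset is only used when one sentence alone exceeds the limit.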
How to create long audio files is described in the docs: https://docs.aws.amazon.com/polly/latest/dg/longer-cli.html
An AWS CLI call might look like this:
# Speech marks are only available with --output-format json, so they are
# left out of this mp3 task; request them in a separate task if needed.
aws polly start-speech-synthesis-task \
  --region eu-central-1 \
  --endpoint-url "https://polly.eu-central-1.amazonaws.com/" \
  --output-format mp3 \
  --output-s3-bucket-name your-bucket-name \
  --output-s3-key-prefix optional/prefix/path/file \
  --voice-id Hans \
  --text-type ssml \
  --text file://output.xml
As you can see, you will need an S3 bucket for (temporary) storage.
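If you would rather do this from code, the same asynchronous task can be started with boto3. This is only a sketch: `synthesize_to_s3` is a hypothetical helper name, the bucket name and polling interval are placeholders, and the client is passed in as a parameter so the function can be tried with a stub before touching AWS.

```python
import time


def synthesize_to_s3(polly, text, bucket, voice_id="Hans", poll_seconds=5):
    """Start an asynchronous Polly task, wait for it, and return the output URI."""
    task = polly.start_speech_synthesis_task(
        Text=text,
        OutputFormat="mp3",
        OutputS3BucketName=bucket,
        VoiceId=voice_id,
    )["SynthesisTask"]

    # Poll until the task leaves the pending states.
    while task["TaskStatus"] in ("scheduled", "inProgress"):
        time.sleep(poll_seconds)
        task = polly.get_speech_synthesis_task(TaskId=task["TaskId"])["SynthesisTask"]

    if task["TaskStatus"] != "completed":
        raise RuntimeError(task.get("TaskStatusReason", "synthesis task failed"))
    return task["OutputUri"]


# Example call (assumes boto3 is installed and credentials are configured):
#   import boto3
#   polly = boto3.client("polly", region_name="eu-central-1")
#   print(synthesize_to_s3(polly, long_text, "your-bucket-name"))
```

The returned URI points at the MP3 object in the bucket, which you can then download or delete once you are done with it.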
I have no special tip for doing this without breaking the text into pieces, but I wrote an article on how to do it in Node.js. If you don't have any other alternative, feel free to review it and comment on it!
How to handle more than 1500 characters with AWS Polly text-to-speech