Is it possible to use Amazon Polly in an Alexa skill to deliver, for instance, a two-language response in a translation or other multilingual context? And if so, who has experience using this service from a Lambda function?
UPDATE: Following the helpful comment by Julian H, this answer has been updated to reflect the latest changes to Polly. Polly output can now be used with Alexa without conversion, so steps 7-9 below are no longer necessary.
I have updated the following steps to reflect the new process of using Polly with Alexa via a Lambda function, based on the project alexa-meets-polly.
From alexa-meets-polly:
1. A user speaks to an Alexa device and asks, e.g., "What is 'Good Morning' in Polish?"
2. Alexa's NLU triggers the Translate intent and passes in a language slot with the value Polish and a term slot with the value Good Morning. An AWS Lambda function, whose code is contained in this repo, implements a Speechlet that handles the request and returns the translation.
3. Before the skill uses the translation API and Polly's TTS service, it first looks in its own dictionary, where all previous translations are stored. If it finds a record for Good Morning in Polish in the database, it skips the entire round trip (steps 4 to 9) and uses the S3 audio file referenced in the Dynamo record (step 10 explains how it got there).
4. However, if Good Morning in Polish has never been translated before, the skill requests the translation from the Microsoft Translator API (or, interchangeably, from Google Translate).
5. The returned translation is then passed to AWS Polly, which responds with an MP3 bitstream of the spoken translation.
6. The stream is persisted in AWS S3 as an MP3 file.
7.-9. Custom conversion of the Polly MP3 is no longer necessary, as it is now aligned with Alexa's requirements.
10. Finally, a record is created for Good Morning in Polish in the Dynamo dictionary. Another record that references the new dictionary entry is created for the user, so Alexa remembers the last translation. This is how a user can ask Alexa to repeat the most recent translation.
11. The skill creates the output-speech text and embeds an SSML audio tag with the MP3 URL.
12. The output speech is returned to the Alexa device. Alexa speaks the text and plays back the translation in one of Polly's voices. A card with the written translation is returned to the Alexa app.
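The flow above (dictionary lookup, translation, Polly, S3, SSML response) can be sketched roughly as follows. This is a minimal Python illustration, not code from alexa-meets-polly, which implements a Java Speechlet. The function names, the dict standing in for the Dynamo dictionary, the `translate` callable, the S3 key scheme, and the assumption that the stored object is readable over HTTPS at the returned URL are all hypothetical. The `polly` and `s3` arguments are expected to behave like boto3 clients, of which only `synthesize_speech` and `put_object` are used.

```python
import hashlib
from urllib.parse import quote
from xml.sax.saxutils import escape, quoteattr


def synthesize_to_s3(polly, s3, text, voice_id, bucket, key):
    """Steps 5-6: have Polly speak `text` and persist the MP3 in S3."""
    audio = polly.synthesize_speech(Text=text, OutputFormat="mp3", VoiceId=voice_id)
    s3.put_object(
        Bucket=bucket,
        Key=key,
        Body=audio["AudioStream"].read(),
        ContentType="audio/mpeg",
    )
    # Assumption: Alexa can fetch the object over HTTPS at this URL.
    return f"https://{bucket}.s3.amazonaws.com/{quote(key)}"


def build_response(term, language, translation, audio_url):
    """Steps 11-12: SSML output speech with an audio tag, plus a simple card."""
    ssml = (
        "<speak>"
        f"{escape(term)} in {escape(language)} is "
        f"<audio src={quoteattr(audio_url)}/>"
        "</speak>"
    )
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "SSML", "ssml": ssml},
            "card": {
                "type": "Simple",
                "title": f"{term} in {language}",
                "content": translation,
            },
            "shouldEndSession": True,
        },
    }


def handle_translation(term, language, voice_id, cache, translate, polly, s3, bucket):
    """Steps 3-12; `cache` is a plain dict standing in for the Dynamo dictionary."""
    cache_key = (term.lower(), language.lower())
    entry = cache.get(cache_key)  # step 3: look for an earlier translation
    if entry is None:
        translation = translate(term, language)  # step 4: hypothetical translator
        # Hypothetical key scheme: language folder plus a hash of the term.
        digest = hashlib.sha256(term.lower().encode("utf-8")).hexdigest()[:16]
        key = f"{language.lower()}/{digest}.mp3"
        url = synthesize_to_s3(polly, s3, translation, voice_id, bucket, key)
        entry = {"translation": translation, "audio_url": url}
        cache[cache_key] = entry  # step 10: remember it for the next request
    return build_response(term, language, entry["translation"], entry["audio_url"])
```

In a real Lambda function the clients would presumably come from `boto3.client("polly")` and `boto3.client("s3")`, and the voice would be chosen to match the target language (Maja is one of Polly's Polish voices). Passing the clients in as parameters keeps the sketch easy to exercise with stand-ins.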