I am building an Alexa Skill using AWS Lambda and NodeJS. I have two questions:
1) Is it possible for me to retrieve the full transcript of the speaker?
In my Alexa phone app, I'm able to read exactly what I've spoken, but I'd like to collect this data so I can possibly analyze how people are speaking to my Skill.
This is possible with speech-to-text tools like the Google Speech APIs (demo here, spec here), through handlers such as recognition.onresult:
recognition.onresult = function(event) {
  var interim_transcript = '';
  for (var i = event.resultIndex; i < event.results.length; ++i) {
    if (event.results[i].isFinal) {
      final_transcript += event.results[i][0].transcript;
    } else {
      interim_transcript += event.results[i][0].transcript;
    }
  }
};
In my Alexa app, you can see here it captured when I asked "sing happy birthday":
How can I programmatically capture this? I'd like to know when a user asks for things that I haven't thought of, collect these failures and common speech requests, and improve the skill based on it.
2) Does Alexa support multiple voices and multiple languages (input and output)?
Again, looking at the Google Speech APIs, you can see they allow many adjustments to speech input and output, including multiple languages and even speech rate:
var utterance = new SpeechSynthesisUtterance();
utterance.rate = 0.7;
utterance.lang = "zh-CN";
Does Alexa offer this suite of controls?
Open the Alexa app. Open More and select Settings. Select Alexa Privacy. Select Review Voice History, and then select an entry, review a specific date range, or filter by device or voice ID.
You need to know that Alexa is technically always listening, even without explicitly triggering an Alexa device. Alexa does not actively record and store all your conversations, but it's always listening for "Alexa," the wake word. Once you say it, anything you say that follows is recorded and stored in the cloud.
Using the Alexa app, you can listen to Alexa recordings by going to the menu and selecting Settings > Alexa Privacy > Review Voice History. By default, you can review your voice history for the current day and select an entry for a closer look.
Question 1:
Not currently. According to the request syntax, the audio clip is not provided to your service's endpoint. If you were instead providing the hardware and using the Alexa Voice Service, you would be capturing the audio yourself.
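What the endpoint does receive is the resolved intent name and any slot values, so those can at least be logged for later analysis. A minimal sketch, assuming the usual request envelope fields (request.type, request.intent.name, request.intent.slots); the summarizeRequest helper and the sample event are hypothetical and made up for illustration:

```javascript
// Hypothetical helper: reduce an Alexa request envelope to the fields worth logging.
function summarizeRequest(event) {
  const request = (event && event.request) || {};
  const intent = request.intent || {};
  const slots = intent.slots || {};
  const slotValues = {};
  Object.keys(slots).forEach((name) => {
    // Slots the user did not fill arrive without a value.
    if (slots[name] && slots[name].value !== undefined) {
      slotValues[name] = slots[name].value;
    }
  });
  return {
    type: request.type || null,
    intent: intent.name || null,
    slots: slotValues,
  };
}

// Made-up event shaped like an IntentRequest, for illustration only.
const sampleEvent = {
  request: {
    type: 'IntentRequest',
    intent: {
      name: 'OzIntent',
      slots: {
        WordI: { name: 'WordI', value: 'sing' },
        WordII: { name: 'WordII' },
      },
    },
  },
};

// In Lambda, console.log output ends up in CloudWatch Logs.
console.log(JSON.stringify(summarizeRequest(sampleEvent)));
```

This records only what Alexa resolved, not the words the user spoke, so it shows which intents and slot values were hit rather than giving a transcript.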
Question 2:
Not currently. Alexa seems to support only English.
Use this hack created by my colleague Bryan Colligan.
The hack uses the slot type CONTENT_LIST with "value": "all" to capture any word. By creating sample utterances that include multiple capture-all slots, for example "{WordI} {WordII} {WordIII} {WordIV} {WordV} {WordVI} {WordVII} {WordVIII} {WordIX}", you can capture sentences of varying length with relative ease.
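Writing dozens of slots and sample utterances by hand is error-prone, so the list could be generated instead. The following is a rough sketch, not part of the original hack: buildCatchAllIntent and toRoman are hypothetical helpers, and the output covers only the Word slots (the model in this answer also declares a Query slot). Roman numerals keep the slot names purely alphabetic, as in the answer's model.

```javascript
// Convert 1..39 to a Roman numeral; enough for a few dozen slots.
function toRoman(n) {
  const table = [[10, 'X'], [9, 'IX'], [5, 'V'], [4, 'IV'], [1, 'I']];
  let out = '';
  for (const [value, symbol] of table) {
    while (n >= value) {
      out += symbol;
      n -= value;
    }
  }
  return out;
}

// Hypothetical helper: build slot definitions and sample utterances for `count` words.
function buildCatchAllIntent(intentName, slotType, count) {
  const names = [];
  for (let i = 1; i <= count; i++) {
    names.push('Word' + toRoman(i));
  }
  return {
    name: intentName,
    slots: names.map((name) => ({ name, type: slotType })),
    // Sample k uses the first k slots: "{WordI}", "{WordI} {WordII}", ...
    samples: names.map((_, k) =>
      names.slice(0, k + 1).map((name) => `{${name}}`).join(' ')
    ),
  };
}

const intent = buildCatchAllIntent('OzIntent', 'CONTENT_LIST', 3);
console.log(JSON.stringify(intent, null, 2));
```

Note that this generator produces WordXIX and WordXXIX where the hand-written model uses WordIXX and WordIXXX. Either spelling works as long as the slot names in the model and in the Lambda code match.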
Note: In my experience Amazon's "Search Query" is limited to 5-6 words.
Warning: Amazon's transcriptions are pretty bad, so don't be surprised if what you capture is somewhat unreadable. This shortcoming is likely one reason Amazon does not reveal its transcripts. Google is much further ahead in Voice to Text. I'm sure in the future Amazon will release the transcripts when they feel more comfortable with their technology.
The following code concatenates the values of the word slots into one sentence. It can be placed in your Lambda function.
let querySentence = '';
const wordSlots = ["WordI", "WordII", "WordIII", "WordIV", "WordV", "WordVI", "WordVII", "WordVIII", "WordIX", "WordX", "WordXI", "WordXII", "WordXIII", "WordXIV", "WordXV", "WordXVI", "WordXVII", "WordXVIII", "WordIXX", "WordXX", "WordXXI", "WordXXII", "WordXXIII", "WordXXIV", "WordXXV", "WordXXVI", "WordXXVII", "WordXXVIII", "WordIXXX", "WordXXX"];
// `this.event` is the incoming request, as in the original snippet.
const slots = this.event.request.intent.slots || {};
wordSlots.forEach((word) => {
  const slot = slots[word];
  // Skip unfilled slots and the '?' placeholder value.
  if (slot && slot.value && slot.value !== '?') {
    querySentence += ' ' + slot.value;
  }
});
// Drop the leading space left by the concatenation.
querySentence = querySentence.trim();
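To try the concatenation outside Lambda, the same logic can be wrapped in a plain function and fed a made-up slots object. This is only a sketch: joinWordSlots and sampleSlots are hypothetical names, and the slot shapes are assumed to match what the snippet above reads (an object per slot with an optional value).

```javascript
// Hypothetical helper: join the values of the named slots into one sentence.
function joinWordSlots(slots, slotNames) {
  const available = slots || {};
  return slotNames
    .map((name) => available[name])
    // Skip unfilled slots and the '?' value the original snippet filters out.
    .filter((slot) => slot && slot.value && slot.value !== '?')
    .map((slot) => slot.value)
    .join(' ');
}

// Made-up slots for illustration only.
const sampleSlots = {
  WordI: { name: 'WordI', value: 'sing' },
  WordII: { name: 'WordII', value: 'happy' },
  WordIII: { name: 'WordIII', value: 'birthday' },
  WordIV: { name: 'WordIV' },
  WordV: { name: 'WordV', value: '?' },
};

console.log(joinWordSlots(sampleSlots, ['WordI', 'WordII', 'WordIII', 'WordIV', 'WordV']));
// prints "sing happy birthday"
```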
The following interaction model uses CONTENT_LIST and "value": "all" to capture any word.
{
"interactionModel": {
"languageModel": {
"invocationName": "alpha voice",
"intents": [
{
"name": "AMAZON.CancelIntent",
"samples": [
"cancel"
]
},
{
"name": "AMAZON.HelpIntent",
"samples": [
"help"
]
},
{
"name": "AMAZON.StopIntent",
"samples": [
"stop"
]
},
{
"name": "OzIntent",
"slots": [
{
"name": "Query",
"type": "AMAZONSearchQuery"
},
{
"name": "WordI",
"type": "CONTENT_LIST"
},
{
"name": "WordII",
"type": "CONTENT_LIST"
},
{
"name": "WordIII",
"type": "CONTENT_LIST"
},
{
"name": "WordIV",
"type": "CONTENT_LIST"
},
{
"name": "WordV",
"type": "CONTENT_LIST"
},
{
"name": "WordVI",
"type": "CONTENT_LIST"
},
{
"name": "WordVII",
"type": "CONTENT_LIST"
},
{
"name": "WordVIII",
"type": "CONTENT_LIST"
},
{
"name": "WordIX",
"type": "CONTENT_LIST"
},
{
"name": "WordX",
"type": "CONTENT_LIST"
},
{
"name": "WordXI",
"type": "CONTENT_LIST"
},
{
"name": "WordXII",
"type": "CONTENT_LIST"
},
{
"name": "WordXIII",
"type": "CONTENT_LIST"
},
{
"name": "WordXIV",
"type": "CONTENT_LIST"
},
{
"name": "WordXV",
"type": "CONTENT_LIST"
},
{
"name": "WordXVI",
"type": "CONTENT_LIST"
},
{
"name": "WordXVII",
"type": "CONTENT_LIST"
},
{
"name": "WordXVIII",
"type": "CONTENT_LIST"
},
{
"name": "WordIXX",
"type": "CONTENT_LIST"
},
{
"name": "WordXX",
"type": "CONTENT_LIST"
},
{
"name": "WordXXI",
"type": "CONTENT_LIST"
},
{
"name": "WordXXII",
"type": "CONTENT_LIST"
},
{
"name": "WordXXIII",
"type": "CONTENT_LIST"
},
{
"name": "WordXXIV",
"type": "CONTENT_LIST"
},
{
"name": "WordXXV",
"type": "CONTENT_LIST"
},
{
"name": "WordXXVI",
"type": "CONTENT_LIST"
},
{
"name": "WordXXVII",
"type": "CONTENT_LIST"
},
{
"name": "WordXXVIII",
"type": "CONTENT_LIST"
},
{
"name": "WordIXXX",
"type": "CONTENT_LIST"
},
{
"name": "WordXXX",
"type": "CONTENT_LIST"
}
],
"samples": [
"{WordI}",
"{WordI} {WordII}",
"{WordI} {WordII} {WordIII}",
"{WordI} {WordII} {WordIII} {WordIV}",
"{WordI} {WordII} {WordIII} {WordIV} {WordV}",
"{WordI} {WordII} {WordIII} {WordIV} {WordV} {WordVI}",
"{WordI} {WordII} {WordIII} {WordIV} {WordV} {WordVI} {WordVII}",
"{WordI} {WordII} {WordIII} {WordIV} {WordV} {WordVI} {WordVII} {WordVIII}",
"{WordI} {WordII} {WordIII} {WordIV} {WordV} {WordVI} {WordVII} {WordVIII} {WordIX}",
"{WordI} {WordII} {WordIII} {WordIV} {WordV} {WordVI} {WordVII} {WordVIII} {WordIX} {WordX}",
"{WordI} {WordII} {WordIII} {WordIV} {WordV} {WordVI} {WordVII} {WordVIII} {WordIX} {WordX} {WordXI}",
"{WordI} {WordII} {WordIII} {WordIV} {WordV} {WordVI} {WordVII} {WordVIII} {WordIX} {WordX} {WordXI} {WordXII}",
"{WordI} {WordII} {WordIII} {WordIV} {WordV} {WordVI} {WordVII} {WordVIII} {WordIX} {WordX} {WordXI} {WordXII} {WordXIII}",
"{WordI} {WordII} {WordIII} {WordIV} {WordV} {WordVI} {WordVII} {WordVIII} {WordIX} {WordX} {WordXI} {WordXII} {WordXIII} {WordXIV}",
"{WordI} {WordII} {WordIII} {WordIV} {WordV} {WordVI} {WordVII} {WordVIII} {WordIX} {WordX} {WordXI} {WordXII} {WordXIII} {WordXIV} {WordXV}",
"{WordI} {WordII} {WordIII} {WordIV} {WordV} {WordVI} {WordVII} {WordVIII} {WordIX} {WordX} {WordXI} {WordXII} {WordXIII} {WordXIV} {WordXV} {WordXVI}",
"{WordI} {WordII} {WordIII} {WordIV} {WordV} {WordVI} {WordVII} {WordVIII} {WordIX} {WordX} {WordXI} {WordXII} {WordXIII} {WordXIV} {WordXV} {WordXVI} {WordXVII}",
"{WordI} {WordII} {WordIII} {WordIV} {WordV} {WordVI} {WordVII} {WordVIII} {WordIX} {WordX} {WordXI} {WordXII} {WordXIII} {WordXIV} {WordXV} {WordXVI} {WordXVII} {WordXVIII}",
"{WordI} {WordII} {WordIII} {WordIV} {WordV} {WordVI} {WordVII} {WordVIII} {WordIX} {WordX} {WordXI} {WordXII} {WordXIII} {WordXIV} {WordXV} {WordXVI} {WordXVII} {WordXVIII} {WordIXX}",
"{WordI} {WordII} {WordIII} {WordIV} {WordV} {WordVI} {WordVII} {WordVIII} {WordIX} {WordX} {WordXI} {WordXII} {WordXIII} {WordXIV} {WordXV} {WordXVI} {WordXVII} {WordXVIII} {WordIXX} {WordXX}",
"{WordI} {WordII} {WordIII} {WordIV} {WordV} {WordVI} {WordVII} {WordVIII} {WordIX} {WordX} {WordXI} {WordXII} {WordXIII} {WordXIV} {WordXV} {WordXVI} {WordXVII} {WordXVIII} {WordIXX} {WordXX} {WordXXI}",
"{WordI} {WordII} {WordIII} {WordIV} {WordV} {WordVI} {WordVII} {WordVIII} {WordIX} {WordX} {WordXI} {WordXII} {WordXIII} {WordXIV} {WordXV} {WordXVI} {WordXVII} {WordXVIII} {WordIXX} {WordXX} {WordXXI} {WordXXII}",
"{WordI} {WordII} {WordIII} {WordIV} {WordV} {WordVI} {WordVII} {WordVIII} {WordIX} {WordX} {WordXI} {WordXII} {WordXIII} {WordXIV} {WordXV} {WordXVI} {WordXVII} {WordXVIII} {WordIXX} {WordXX} {WordXXI} {WordXXII} {WordXXIII}",
"{WordI} {WordII} {WordIII} {WordIV} {WordV} {WordVI} {WordVII} {WordVIII} {WordIX} {WordX} {WordXI} {WordXII} {WordXIII} {WordXIV} {WordXV} {WordXVI} {WordXVII} {WordXVIII} {WordIXX} {WordXX} {WordXXI} {WordXXII} {WordXXIII} {WordXXIV}",
"{WordI} {WordII} {WordIII} {WordIV} {WordV} {WordVI} {WordVII} {WordVIII} {WordIX} {WordX} {WordXI} {WordXII} {WordXIII} {WordXIV} {WordXV} {WordXVI} {WordXVII} {WordXVIII} {WordIXX} {WordXX} {WordXXI} {WordXXII} {WordXXIII} {WordXXIV} {WordXXV}",
"{WordI} {WordII} {WordIII} {WordIV} {WordV} {WordVI} {WordVII} {WordVIII} {WordIX} {WordX} {WordXI} {WordXII} {WordXIII} {WordXIV} {WordXV} {WordXVI} {WordXVII} {WordXVIII} {WordIXX} {WordXX} {WordXXI} {WordXXII} {WordXXIII} {WordXXIV} {WordXXV} {WordXXVI}",
"{WordI} {WordII} {WordIII} {WordIV} {WordV} {WordVI} {WordVII} {WordVIII} {WordIX} {WordX} {WordXI} {WordXII} {WordXIII} {WordXIV} {WordXV} {WordXVI} {WordXVII} {WordXVIII} {WordIXX} {WordXX} {WordXXI} {WordXXII} {WordXXIII} {WordXXIV} {WordXXV} {WordXXVI} {WordXXVII}",
"{WordI} {WordII} {WordIII} {WordIV} {WordV} {WordVI} {WordVII} {WordVIII} {WordIX} {WordX} {WordXI} {WordXII} {WordXIII} {WordXIV} {WordXV} {WordXVI} {WordXVII} {WordXVIII} {WordIXX} {WordXX} {WordXXI} {WordXXII} {WordXXIII} {WordXXIV} {WordXXV} {WordXXVI} {WordXXVII} {WordXXVIII}",
"{WordI} {WordII} {WordIII} {WordIV} {WordV} {WordVI} {WordVII} {WordVIII} {WordIX} {WordX} {WordXI} {WordXII} {WordXIII} {WordXIV} {WordXV} {WordXVI} {WordXVII} {WordXVIII} {WordIXX} {WordXX} {WordXXI} {WordXXII} {WordXXIII} {WordXXIV} {WordXXV} {WordXXVI} {WordXXVII} {WordXXVIII} {WordIXXX}",
"{WordI} {WordII} {WordIII} {WordIV} {WordV} {WordVI} {WordVII} {WordVIII} {WordIX} {WordX} {WordXI} {WordXII} {WordXIII} {WordXIV} {WordXV} {WordXVI} {WordXVII} {WordXVIII} {WordIXX} {WordXX} {WordXXI} {WordXXII} {WordXXIII} {WordXXIV} {WordXXV} {WordXXVI} {WordXXVII} {WordXXVIII} {WordIXXX} {WordXXX}"
]
},
{
"name": "AMAZON.NavigateHomeIntent",
"samples": [
"navigate home"
]
}
],
"types": [
{
"name": "AMAZONSearchQuery",
"values": [
{
"name": {
"value": "all"
}
}
]
},
{
"name": "CONTENT_LIST",
"values": [
{
"name": {
"value": "all"
}
}
]
}
]
}
}
}
Note: I use this code as a catch-all for my skill; it is the only intent. If you want other intents as well, with this one picking up the utterances that fall through, I'd recommend experimenting: create an intent with defined utterances and see whether Amazon picks it before falling back on this free-form capture.
Please comment below if you have success and I'll update the answer.