 

Amazon Alexa - capture full transcript

I am building an Alexa Skill using AWS Lambda and Node.js. I have two questions:

1) Is it possible for me to retrieve the full transcript of what the user said?

In the Alexa phone app I can read exactly what I said, but I'd like to collect this data so I can analyze how people speak to my Skill.

This is possible with speech-to-text tools like the Google Speech APIs (demo here, spec here), using handlers such as recognition.onresult:

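    var recognition = new webkitSpeechRecognition(); // Chrome's prefixed Web Speech API
    var final_transcript = '';
    recognition.onresult = function(event) {
        var interim_transcript = '';
        for (var i = event.resultIndex; i < event.results.length; ++i) {
            if (event.results[i].isFinal) {
                final_transcript += event.results[i][0].transcript; // top alternative
            } else {
                interim_transcript += event.results[i][0].transcript;
            }
        }
    };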

In my Alexa app, you can see that it captured my request "sing happy birthday":

[Screenshot: Alexa app history entry showing the request "sing happy birthday"]

How can I capture this programmatically? I'd like to know when users ask for things I haven't thought of, collect these failures and common requests, and improve the skill based on them.


2) Does Alexa support multiple voices and multiple languages (input and output)?

Again, the Google Speech APIs allow many adjustments to speech input and output, including multiple languages and even speech rate:

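    var utterance = new SpeechSynthesisUtterance("你好"); // example text to speak
    utterance.rate = 0.7;       // slower than the default rate of 1
    utterance.lang = "zh-CN";   // Mandarin Chinese (China)
    window.speechSynthesis.speak(utterance);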

Does Alexa offer this suite of controls?

asked Apr 12 '16 16:04 by user3871



2 Answers

Question 1:

Not currently. According to the request syntax, the request sent to your skill's endpoint contains only the resolved intent and its slot values, not the audio clip or a raw transcript. Alternatively, if you were providing the hardware and building on the Alexa Voice Service, you would be capturing the audio yourself.
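For reference, a minimal sketch of what an endpoint can actually read from an IntentRequest, assuming the standard request JSON shape (event.request.type, event.request.intent.name, event.request.intent.slots). The helper describeIntentRequest and the intent SingIntent are hypothetical names for illustration. Logging this summary is about as close to a transcript as the endpoint gets.

```javascript
// Hypothetical helper: summarize the speech-derived data an Alexa skill endpoint
// receives. Assumes the standard request JSON shape; there is no audio or raw
// transcript field to read, only the intent Alexa matched and its slot values.
function describeIntentRequest(event) {
    const request = event.request || {};
    if (request.type !== 'IntentRequest' || !request.intent) {
        // LaunchRequest, SessionEndedRequest, etc. carry no intent
        return { type: request.type, intent: null, slots: {} };
    }
    const slots = {};
    for (const [name, slot] of Object.entries(request.intent.slots || {})) {
        // Unfilled slots may appear in the payload without a value
        slots[name] = slot.value === undefined ? null : slot.value;
    }
    return { type: request.type, intent: request.intent.name, slots };
}

// Example payload (abridged); SingIntent and its Song slot are hypothetical
const sample = {
    request: {
        type: 'IntentRequest',
        intent: { name: 'SingIntent', slots: { Song: { name: 'Song', value: 'happy birthday' } } }
    }
};
console.log(JSON.stringify(describeIntentRequest(sample)));
```

Note that the slot value is only what Alexa matched for that slot; the words around it (such as "sing") are not sent.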

Question 2:

Not currently. Alexa appears to support only English.

answered Oct 02 '22 13:10 by BMW


To Capture Multiple Sentences:

Use this hack created by my colleague Bryan Colligan.

How it works

The hack uses a custom slot type, CONTENT_LIST, whose only listed value is "all". Because Alexa does not restrict custom slots to their listed values, each CONTENT_LIST slot can capture any word. By creating sample utterances that chain several of these catch-all slots, for example "{WordI} {WordII} {WordIII} {WordIV} {WordV} {WordVI} {WordVII} {WordVIII} {WordIX}", you can capture sentences of varying length with relative ease.
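Writing those chained utterances by hand gets tedious. A minimal sketch, assuming the slot names are listed in speaking order: a hypothetical buildSampleUtterances helper that produces one sample per sentence length ("{WordI}", "{WordI} {WordII}", and so on), which you could paste into the intent's samples array.

```javascript
// Hypothetical helper: build the chained catch-all sample utterances.
// Sample k contains the first k slot names, so a model with N word slots
// gets N samples covering sentences of 1..N words.
function buildSampleUtterances(slotNames) {
    const samples = [];
    for (let k = 1; k <= slotNames.length; k++) {
        samples.push(slotNames.slice(0, k).map((name) => `{${name}}`).join(' '));
    }
    return samples;
}

const utterances = buildSampleUtterances(['WordI', 'WordII', 'WordIII']);
console.log(JSON.stringify(utterances, null, 2));
```

Passing the full list of 30 slot names reproduces the samples array of the interaction model in this answer.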

Note: In my experience, Amazon's "Search Query" slot type (AMAZON.SearchQuery) is limited to 5-6 words.

Warning: Amazon's transcriptions are pretty bad, so don't be surprised if what you capture is somewhat unreadable. This shortcoming is likely one reason Amazon does not expose its transcripts. Google is much further ahead in voice-to-text. I expect Amazon will release transcripts once they are more comfortable with their technology.

The code

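The following code concatenates the values of all filled word slots into a single sentence. It goes in your Lambda function's intent handler (it reads the request through this.event, as alexa-sdk v1 handlers do).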

// Slot names must match the interaction model exactly (including its spellings WordIXX and WordIXXX)
const wordSlots = ["WordI", "WordII", "WordIII", "WordIV", "WordV", "WordVI", "WordVII", "WordVIII", "WordIX", "WordX", "WordXI", "WordXII", "WordXIII", "WordXIV", "WordXV", "WordXVI", "WordXVII", "WordXVIII", "WordIXX", "WordXX", "WordXXI", "WordXXII", "WordXXIII", "WordXXIV", "WordXXV", "WordXXVI", "WordXXVII", "WordXXVIII", "WordIXXX", "WordXXX"];
const slots = this.event.request.intent.slots || {};
const words = [];
wordSlots.forEach((name) => {
    const slot = slots[name];
    // Skip slots that are missing, empty, null/undefined, or set to '?'
    if (slot && slot.value && slot.value !== '?') {
        words.push(slot.value);
    }
});
const querySentence = words.join(' '); // single spaces, no leading space
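If you are not using the alexa-sdk (the snippet above relies on its this.event), the same idea fits in a plain Node.js Lambda handler. This is a sketch under assumptions: it reads the raw request JSON directly, builds the standard Alexa response envelope by hand, and logs each captured sentence with console.log (Lambda forwards console output to CloudWatch Logs) so you can review what users said. WORD_SLOTS is shortened here; use the full list from your model.

```javascript
// Sketch of a plain Node.js Lambda handler (no alexa-sdk) for the catch-all intent.
// WORD_SLOTS is shortened; it must list every word slot in your model, in order.
const WORD_SLOTS = ['WordI', 'WordII', 'WordIII', 'WordIV', 'WordV'];

// Join the filled word slots into one sentence, skipping empty and '?' values
function collectSentence(slots) {
    return WORD_SLOTS
        .map((name) => slots && slots[name] && slots[name].value)
        .filter((value) => value && value !== '?')
        .join(' ');
}

// Standard Alexa response envelope with plain-text speech
function buildResponse(text, endSession = true) {
    return {
        version: '1.0',
        response: {
            outputSpeech: { type: 'PlainText', text },
            shouldEndSession: endSession
        }
    };
}

async function handler(event) {
    const request = event.request;
    if (request.type !== 'IntentRequest') {
        // Keep the session open so the user can say something after launching
        return buildResponse('Say anything and I will repeat it.', false);
    }
    const sentence = collectSentence(request.intent.slots);
    // Logged lines end up in CloudWatch Logs for later analysis
    console.log(JSON.stringify({ intent: request.intent.name, sentence }));
    return buildResponse(sentence ? `You said: ${sentence}` : 'Sorry, I did not catch that.');
}

exports.handler = handler;
```

Echoing the sentence back is only for testing; the useful part is the log line.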

The following Interaction Model uses CONTENT_LIST and "value": "all" to capture any word.

{
    "interactionModel": {
        "languageModel": {
            "invocationName": "alpha voice",
            "intents": [
                {
                    "name": "AMAZON.CancelIntent",
                    "samples": [
                        "cancel"
                    ]
                },
                {
                    "name": "AMAZON.HelpIntent",
                    "samples": [
                        "help"
                    ]
                },
                {
                    "name": "AMAZON.StopIntent",
                    "samples": [
                        "stop"
                    ]
                },
                {
                    "name": "OzIntent",
                    "slots": [
                        {
                            "name": "Query",
                            "type": "AMAZONSearchQuery"
                        },
                        {
                            "name": "WordI",
                            "type": "CONTENT_LIST"
                        },
                        {
                            "name": "WordII",
                            "type": "CONTENT_LIST"
                        },
                        {
                            "name": "WordIII",
                            "type": "CONTENT_LIST"
                        },
                        {
                            "name": "WordIV",
                            "type": "CONTENT_LIST"
                        },
                        {
                            "name": "WordV",
                            "type": "CONTENT_LIST"
                        },
                        {
                            "name": "WordVI",
                            "type": "CONTENT_LIST"
                        },
                        {
                            "name": "WordVII",
                            "type": "CONTENT_LIST"
                        },
                        {
                            "name": "WordVIII",
                            "type": "CONTENT_LIST"
                        },
                        {
                            "name": "WordIX",
                            "type": "CONTENT_LIST"
                        },
                        {
                            "name": "WordX",
                            "type": "CONTENT_LIST"
                        },
                        {
                            "name": "WordXI",
                            "type": "CONTENT_LIST"
                        },
                        {
                            "name": "WordXII",
                            "type": "CONTENT_LIST"
                        },
                        {
                            "name": "WordXIII",
                            "type": "CONTENT_LIST"
                        },
                        {
                            "name": "WordXIV",
                            "type": "CONTENT_LIST"
                        },
                        {
                            "name": "WordXV",
                            "type": "CONTENT_LIST"
                        },
                        {
                            "name": "WordXVI",
                            "type": "CONTENT_LIST"
                        },
                        {
                            "name": "WordXVII",
                            "type": "CONTENT_LIST"
                        },
                        {
                            "name": "WordXVIII",
                            "type": "CONTENT_LIST"
                        },
                        {
                            "name": "WordIXX",
                            "type": "CONTENT_LIST"
                        },
                        {
                            "name": "WordXX",
                            "type": "CONTENT_LIST"
                        },
                        {
                            "name": "WordXXI",
                            "type": "CONTENT_LIST"
                        },
                        {
                            "name": "WordXXII",
                            "type": "CONTENT_LIST"
                        },
                        {
                            "name": "WordXXIII",
                            "type": "CONTENT_LIST"
                        },
                        {
                            "name": "WordXXIV",
                            "type": "CONTENT_LIST"
                        },
                        {
                            "name": "WordXXV",
                            "type": "CONTENT_LIST"
                        },
                        {
                            "name": "WordXXVI",
                            "type": "CONTENT_LIST"
                        },
                        {
                            "name": "WordXXVII",
                            "type": "CONTENT_LIST"
                        },
                        {
                            "name": "WordXXVIII",
                            "type": "CONTENT_LIST"
                        },
                        {
                            "name": "WordIXXX",
                            "type": "CONTENT_LIST"
                        },
                        {
                            "name": "WordXXX",
                            "type": "CONTENT_LIST"
                        }
                    ],
                    "samples": [
                        "{WordI}",
                        "{WordI} {WordII}",
                        "{WordI} {WordII} {WordIII}",
                        "{WordI} {WordII} {WordIII} {WordIV}",
                        "{WordI} {WordII} {WordIII} {WordIV} {WordV}",
                        "{WordI} {WordII} {WordIII} {WordIV} {WordV} {WordVI}",
                        "{WordI} {WordII} {WordIII} {WordIV} {WordV} {WordVI} {WordVII}",
                        "{WordI} {WordII} {WordIII} {WordIV} {WordV} {WordVI} {WordVII} {WordVIII}",
                        "{WordI} {WordII} {WordIII} {WordIV} {WordV} {WordVI} {WordVII} {WordVIII} {WordIX}",
                        "{WordI} {WordII} {WordIII} {WordIV} {WordV} {WordVI} {WordVII} {WordVIII} {WordIX} {WordX}",
                        "{WordI} {WordII} {WordIII} {WordIV} {WordV} {WordVI} {WordVII} {WordVIII} {WordIX} {WordX} {WordXI}",
                        "{WordI} {WordII} {WordIII} {WordIV} {WordV} {WordVI} {WordVII} {WordVIII} {WordIX} {WordX} {WordXI} {WordXII}",
                        "{WordI} {WordII} {WordIII} {WordIV} {WordV} {WordVI} {WordVII} {WordVIII} {WordIX} {WordX} {WordXI} {WordXII} {WordXIII}",
                        "{WordI} {WordII} {WordIII} {WordIV} {WordV} {WordVI} {WordVII} {WordVIII} {WordIX} {WordX} {WordXI} {WordXII} {WordXIII} {WordXIV}",
                        "{WordI} {WordII} {WordIII} {WordIV} {WordV} {WordVI} {WordVII} {WordVIII} {WordIX} {WordX} {WordXI} {WordXII} {WordXIII} {WordXIV} {WordXV}",
                        "{WordI} {WordII} {WordIII} {WordIV} {WordV} {WordVI} {WordVII} {WordVIII} {WordIX} {WordX} {WordXI} {WordXII} {WordXIII} {WordXIV} {WordXV} {WordXVI}",
                        "{WordI} {WordII} {WordIII} {WordIV} {WordV} {WordVI} {WordVII} {WordVIII} {WordIX} {WordX} {WordXI} {WordXII} {WordXIII} {WordXIV} {WordXV} {WordXVI} {WordXVII}",
                        "{WordI} {WordII} {WordIII} {WordIV} {WordV} {WordVI} {WordVII} {WordVIII} {WordIX} {WordX} {WordXI} {WordXII} {WordXIII} {WordXIV} {WordXV} {WordXVI} {WordXVII} {WordXVIII}",
                        "{WordI} {WordII} {WordIII} {WordIV} {WordV} {WordVI} {WordVII} {WordVIII} {WordIX} {WordX} {WordXI} {WordXII} {WordXIII} {WordXIV} {WordXV} {WordXVI} {WordXVII} {WordXVIII} {WordIXX}",
                        "{WordI} {WordII} {WordIII} {WordIV} {WordV} {WordVI} {WordVII} {WordVIII} {WordIX} {WordX} {WordXI} {WordXII} {WordXIII} {WordXIV} {WordXV} {WordXVI} {WordXVII} {WordXVIII} {WordIXX} {WordXX}",
                        "{WordI} {WordII} {WordIII} {WordIV} {WordV} {WordVI} {WordVII} {WordVIII} {WordIX} {WordX} {WordXI} {WordXII} {WordXIII} {WordXIV} {WordXV} {WordXVI} {WordXVII} {WordXVIII} {WordIXX} {WordXX} {WordXXI}",
                        "{WordI} {WordII} {WordIII} {WordIV} {WordV} {WordVI} {WordVII} {WordVIII} {WordIX} {WordX} {WordXI} {WordXII} {WordXIII} {WordXIV} {WordXV} {WordXVI} {WordXVII} {WordXVIII} {WordIXX} {WordXX} {WordXXI} {WordXXII}",
                        "{WordI} {WordII} {WordIII} {WordIV} {WordV} {WordVI} {WordVII} {WordVIII} {WordIX} {WordX} {WordXI} {WordXII} {WordXIII} {WordXIV} {WordXV} {WordXVI} {WordXVII} {WordXVIII} {WordIXX} {WordXX} {WordXXI} {WordXXII} {WordXXIII}",
                        "{WordI} {WordII} {WordIII} {WordIV} {WordV} {WordVI} {WordVII} {WordVIII} {WordIX} {WordX} {WordXI} {WordXII} {WordXIII} {WordXIV} {WordXV} {WordXVI} {WordXVII} {WordXVIII} {WordIXX} {WordXX} {WordXXI} {WordXXII} {WordXXIII} {WordXXIV}",
                        "{WordI} {WordII} {WordIII} {WordIV} {WordV} {WordVI} {WordVII} {WordVIII} {WordIX} {WordX} {WordXI} {WordXII} {WordXIII} {WordXIV} {WordXV} {WordXVI} {WordXVII} {WordXVIII} {WordIXX} {WordXX} {WordXXI} {WordXXII} {WordXXIII} {WordXXIV} {WordXXV}",
                        "{WordI} {WordII} {WordIII} {WordIV} {WordV} {WordVI} {WordVII} {WordVIII} {WordIX} {WordX} {WordXI} {WordXII} {WordXIII} {WordXIV} {WordXV} {WordXVI} {WordXVII} {WordXVIII} {WordIXX} {WordXX} {WordXXI} {WordXXII} {WordXXIII} {WordXXIV} {WordXXV} {WordXXVI}",
                        "{WordI} {WordII} {WordIII} {WordIV} {WordV} {WordVI} {WordVII} {WordVIII} {WordIX} {WordX} {WordXI} {WordXII} {WordXIII} {WordXIV} {WordXV} {WordXVI} {WordXVII} {WordXVIII} {WordIXX} {WordXX} {WordXXI} {WordXXII} {WordXXIII} {WordXXIV} {WordXXV} {WordXXVI} {WordXXVII}",
                        "{WordI} {WordII} {WordIII} {WordIV} {WordV} {WordVI} {WordVII} {WordVIII} {WordIX} {WordX} {WordXI} {WordXII} {WordXIII} {WordXIV} {WordXV} {WordXVI} {WordXVII} {WordXVIII} {WordIXX} {WordXX} {WordXXI} {WordXXII} {WordXXIII} {WordXXIV} {WordXXV} {WordXXVI} {WordXXVII} {WordXXVIII}",
                        "{WordI} {WordII} {WordIII} {WordIV} {WordV} {WordVI} {WordVII} {WordVIII} {WordIX} {WordX} {WordXI} {WordXII} {WordXIII} {WordXIV} {WordXV} {WordXVI} {WordXVII} {WordXVIII} {WordIXX} {WordXX} {WordXXI} {WordXXII} {WordXXIII} {WordXXIV} {WordXXV} {WordXXVI} {WordXXVII} {WordXXVIII} {WordIXXX}",
                        "{WordI} {WordII} {WordIII} {WordIV} {WordV} {WordVI} {WordVII} {WordVIII} {WordIX} {WordX} {WordXI} {WordXII} {WordXIII} {WordXIV} {WordXV} {WordXVI} {WordXVII} {WordXVIII} {WordIXX} {WordXX} {WordXXI} {WordXXII} {WordXXIII} {WordXXIV} {WordXXV} {WordXXVI} {WordXXVII} {WordXXVIII} {WordIXXX} {WordXXX}"
                    ]
                },
                {
                    "name": "AMAZON.NavigateHomeIntent",
                    "samples": [
                        "navigate home"
                    ]
                }
            ],
            "types": [
                {
                    "name": "AMAZONSearchQuery",
                    "values": [
                        {
                            "name": {
                                "value": "all"
                            }
                        }
                    ]
                },
                {
                    "name": "CONTENT_LIST",
                    "values": [
                        {
                            "name": {
                                "value": "all"
                            }
                        }
                    ]
                }
            ]
        }
    }
}
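With 30 slots and 30 samples it is easy to misspell a name. Slot 19 is named WordIXX and slot 29 WordIXXX rather than the Roman numerals XIX and XXIX; that is harmless only because the model and the Lambda list use the same spelling. A small sketch, using a hypothetical findUndeclaredSlots helper, that checks every {Slot} referenced in an intent's samples is declared in that intent's slots array; you could run it with Node against your model JSON before uploading.

```javascript
// Hypothetical consistency check for an Alexa interaction model (plain JSON object).
// Returns "IntentName: {Slot}" entries for slot references in samples
// that are not declared in that intent's "slots" array.
function findUndeclaredSlots(model) {
    const problems = [];
    for (const intent of model.interactionModel.languageModel.intents) {
        const declared = new Set((intent.slots || []).map((slot) => slot.name));
        for (const sample of intent.samples || []) {
            // Slot references look like {WordI}
            for (const match of sample.matchAll(/\{(\w+)\}/g)) {
                if (!declared.has(match[1])) {
                    problems.push(`${intent.name}: {${match[1]}}`);
                }
            }
        }
    }
    return problems;
}

// Example: "WordXIX" is referenced but the model declares "WordIXX"
const model = {
    interactionModel: {
        languageModel: {
            intents: [{
                name: 'OzIntent',
                slots: [{ name: 'WordI', type: 'CONTENT_LIST' }, { name: 'WordIXX', type: 'CONTENT_LIST' }],
                samples: ['{WordI}', '{WordI} {WordXIX}']
            }]
        }
    }
};
console.log(findUndeclaredSlots(model));
```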

Note: I use this as a catch-all in my skill; it's the only custom intent. If you want other intents alongside it, so that this intent only catches utterances that fall through, I'd recommend experimenting: create an intent with defined sample utterances and check whether Amazon picks it before falling back on this free-form capture.
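If you do add specific intents, the routing side could look like this sketch, assuming a plain Node.js handler that already has the intent object and the captured sentence: dispatch on the intent name Alexa resolved, and record anything without a specific handler (such as the catch-all OzIntent) as an unhandled request. SingIntent, route, and the in-memory unhandled array are hypothetical; a real skill would persist these records instead of keeping them in memory.

```javascript
// Hypothetical dispatch table: specific intents you defined, keyed by intent name.
const handlers = {
    SingIntent: () => 'Happy birthday to you!',
    'AMAZON.HelpIntent': () => 'Say anything and I will try to help.'
};

// Requests that fell through; kept in memory here for illustration only
const unhandled = [];

// Route one intent; any intent without a specific handler (e.g. OzIntent) is recorded
function route(intent, sentence) {
    const specific = handlers[intent.name];
    if (specific) {
        return specific(intent.slots || {});
    }
    // Fell through: remember what was said so you can design new intents from it
    unhandled.push({ intent: intent.name, sentence, at: new Date().toISOString() });
    return "Sorry, I don't know how to do that yet.";
}

console.log(route({ name: 'SingIntent', slots: {} }, ''));
console.log(route({ name: 'OzIntent', slots: {} }, 'order me a pizza'));
console.log(unhandled.length); // 1
```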

Please comment below if you have success and I'll update the answer.

answered Oct 02 '22 13:10 by Caleb Gates