Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Dialogflow, detection intent from audio

I'm trying to send an audio file to dialogflow API for intent detection. I already have an agent working quite well but only with text. I'm trying to add the the audio feature but with no luck.

I'm using the example (Java) provided in this page:

https://cloud.google.com/dialogflow-enterprise/docs/detect-intent-audio#detect-intent-text-java

This is my code:

public  DetectIntentResponse detectIntentAudio(String projectId, byte [] bytes, String sessionId,
                                         String languageCode)
            throws Exception {


            // Set the session name using the sessionId (UUID) and projectID (my-project-id)
            SessionName session = SessionName.of(projectId, sessionId);
            System.out.println("Session Path: " + session.toString());

            // Note: hard coding audioEncoding and sampleRateHertz for simplicity.
            // Audio encoding of the audio content sent in the query request.
            AudioEncoding audioEncoding = AudioEncoding.AUDIO_ENCODING_LINEAR_16;
            int sampleRateHertz = 16000;

            // Instructs the speech recognizer how to process the audio content.
            InputAudioConfig inputAudioConfig = InputAudioConfig.newBuilder()
                    .setAudioEncoding(audioEncoding) // audioEncoding = AudioEncoding.AUDIO_ENCODING_LINEAR_16
                    .setLanguageCode(languageCode) // languageCode = "en-US"
                    .setSampleRateHertz(sampleRateHertz) // sampleRateHertz = 16000
                    .build();

            // Build the query with the InputAudioConfig
            QueryInput queryInput = QueryInput.newBuilder().setAudioConfig(inputAudioConfig).build();

            // Read the bytes from the audio file
            byte[] inputAudio = Files.readAllBytes(Paths.get("/home/rmg/Audio/book_a_room.wav"));

            byte[] encodedAudio = Base64.encodeBase64(inputAudio);
            // Build the DetectIntentRequest
            DetectIntentRequest request = DetectIntentRequest.newBuilder()
                    .setSession("projects/"+projectId+"/agent/sessions/" + sessionId)
                    .setQueryInput(queryInput)
                    .setInputAudio(ByteString.copyFrom(encodedAudio))
                    .build();

            // Performs the detect intent request
            DetectIntentResponse response = sessionsClient.detectIntent(request);

            // Display the query result
            QueryResult queryResult = response.getQueryResult();
            System.out.println("====================");
            System.out.format("Query Text: '%s'\n", queryResult.getQueryText());
            System.out.format("Detected Intent: %s (confidence: %f)\n",
                    queryResult.getIntent().getDisplayName(), queryResult.getIntentDetectionConfidence());
            System.out.format("Fulfillment Text: '%s'\n", queryResult.getFulfillmentText());

            return response;

    }

I have tried with several formats, wav (PCM 16 bits several sample rates) and FLAC, and also converting the bytes to base64 in two different ways as described here (by code or console):

https://dialogflow.com/docs/reference/text-to-speech

I have even tested with the .wav provided in this example creating a new intent in my agent called "book a room" with that training phrase. It works using text and audio from the dialogflow console but only works with text, not audio from my code... and I'm sending the same wav they provide! (code above)

I always receive the same response (QueryResult):

enter image description here

I need a clue or something, I'm totally stuck here. No logs, no errors in the response... but does not work.

Thanks

like image 638
Raúl Malpica Avatar asked Oct 29 '22 02:10

Raúl Malpica


1 Answers

I wrote to the dialogflow support and the replied my with a working piece of code. It is basically the same posted above, the only difference is the base64 encoding, it is not necessary to do that.

So I removed:

byte[] encodedAudio = Base64.encodeBase64(inputAudio);

(And used inputAudio directly)

Now It is working as expected...

like image 62
Raúl Malpica Avatar answered Dec 18 '22 13:12

Raúl Malpica