I'm trying to send an audio file to dialogflow API for intent detection. I already have an agent working quite well but only with text. I'm trying to add the the audio feature but with no luck.
I'm using the example (Java) provided in this page:
https://cloud.google.com/dialogflow-enterprise/docs/detect-intent-audio#detect-intent-text-java
This is my code:
public DetectIntentResponse detectIntentAudio(String projectId, byte [] bytes, String sessionId,
String languageCode)
throws Exception {
// Set the session name using the sessionId (UUID) and projectID (my-project-id)
SessionName session = SessionName.of(projectId, sessionId);
System.out.println("Session Path: " + session.toString());
// Note: hard coding audioEncoding and sampleRateHertz for simplicity.
// Audio encoding of the audio content sent in the query request.
AudioEncoding audioEncoding = AudioEncoding.AUDIO_ENCODING_LINEAR_16;
int sampleRateHertz = 16000;
// Instructs the speech recognizer how to process the audio content.
InputAudioConfig inputAudioConfig = InputAudioConfig.newBuilder()
.setAudioEncoding(audioEncoding) // audioEncoding = AudioEncoding.AUDIO_ENCODING_LINEAR_16
.setLanguageCode(languageCode) // languageCode = "en-US"
.setSampleRateHertz(sampleRateHertz) // sampleRateHertz = 16000
.build();
// Build the query with the InputAudioConfig
QueryInput queryInput = QueryInput.newBuilder().setAudioConfig(inputAudioConfig).build();
// Read the bytes from the audio file
byte[] inputAudio = Files.readAllBytes(Paths.get("/home/rmg/Audio/book_a_room.wav"));
byte[] encodedAudio = Base64.encodeBase64(inputAudio);
// Build the DetectIntentRequest
DetectIntentRequest request = DetectIntentRequest.newBuilder()
.setSession("projects/"+projectId+"/agent/sessions/" + sessionId)
.setQueryInput(queryInput)
.setInputAudio(ByteString.copyFrom(encodedAudio))
.build();
// Performs the detect intent request
DetectIntentResponse response = sessionsClient.detectIntent(request);
// Display the query result
QueryResult queryResult = response.getQueryResult();
System.out.println("====================");
System.out.format("Query Text: '%s'\n", queryResult.getQueryText());
System.out.format("Detected Intent: %s (confidence: %f)\n",
queryResult.getIntent().getDisplayName(), queryResult.getIntentDetectionConfidence());
System.out.format("Fulfillment Text: '%s'\n", queryResult.getFulfillmentText());
return response;
}
I have tried with several formats, wav (PCM 16 bits several sample rates) and FLAC, and also converting the bytes to base64 in two different ways as described here (by code or console):
https://dialogflow.com/docs/reference/text-to-speech
I have even tested with the .wav provided in this example creating a new intent in my agent called "book a room" with that training phrase. It works using text and audio from the dialogflow console but only works with text, not audio from my code... and I'm sending the same wav they provide! (code above)
I always receive the same response (QueryResult):
I need a clue or something, I'm totally stuck here. No logs, no errors in the response... but does not work.
Thanks
I wrote to the dialogflow support and the replied my with a working piece of code. It is basically the same posted above, the only difference is the base64 encoding, it is not necessary to do that.
So I removed:
byte[] encodedAudio = Base64.encodeBase64(inputAudio);
(And used inputAudio directly)
Now It is working as expected...
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With