
Google cloud speech syncrecognize "INVALID_ARGUMENT"

I completed the "overview tutorial" (https://cloud.google.com/speech/docs/getting-started), then tried to use my own audio file. I uploaded a .flac file with a sample rate of 16000 Hz.

I only changed the sync-request.json file, shown below, to point to my own audio file hosted on Google Cloud Storage (gs://my-bucket/test4.flac):

{
  "config": {
      "encoding":"flac",
      "sample_rate": 16000
  },
  "audio": {
      "uri":"gs://my-bucket/test4.flac"
  }
}

The file is found correctly, but the request returns an "INVALID_ARGUMENT" error:

{
  "error": {
    "code": 400,
    "message": "Unable to recognize speech, code=-73541, possible error in recognition config. Please correct the config and retry the request.",
    "status": "INVALID_ARGUMENT"
  }
}
Damien Romito asked Sep 21 '16 15:09


2 Answers

As per this answer, all encodings support only 1-channel (mono) audio.

I was creating the FLAC file with this command:

ffmpeg -i test.mp3 test.flac

The request then failed with this error:

Sample rate in request does not match FLAC header

Adding -ac 1 (which sets the number of audio channels to 1) fixed the issue:

ffmpeg -i test.mp3 -ac 1 test.flac
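
If you run the conversion from Node.js, something along these lines might work. It is only a sketch: it assumes ffmpeg is on the PATH, and buildFfmpegArgs and convertToFlac are illustrative helper names, not part of any library. The -ar and -sample_fmt options are standard ffmpeg flags that I added to also pin the sample rate and bit depth, which the command above does not do.

```javascript
const { spawnSync } = require('child_process');

// Hypothetical helper: builds the ffmpeg argument list for a FLAC file
// with a fixed channel count, sample rate, and 16-bit samples.
function buildFfmpegArgs(input, output, { sampleRate = 16000, channels = 1 } = {}) {
  return [
    '-y',                      // overwrite the output file if it exists
    '-i', input,               // source audio (e.g. an mp3)
    '-ac', String(channels),   // number of audio channels (1 = mono)
    '-ar', String(sampleRate), // output sample rate in Hz
    '-sample_fmt', 's16',      // 16-bit samples
    output,
  ];
}

// Hypothetical helper: runs ffmpeg and throws if the conversion fails.
function convertToFlac(input, output, options) {
  const result = spawnSync('ffmpeg', buildFfmpegArgs(input, output, options), {
    stdio: 'inherit',
  });
  if (result.error) throw result.error; // e.g. ffmpeg not installed
  if (result.status !== 0) {
    throw new Error(`ffmpeg exited with code ${result.status}`);
  }
}
```

Calling convertToFlac('test.mp3', 'test.flac') would then run the equivalent of ffmpeg -y -i test.mp3 -ac 1 -ar 16000 -sample_fmt s16 test.flac. Whatever sample rate you choose here has to be the one you send in the request.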

Here is my full Node.js code:

const Speech = require('@google-cloud/speech');
const projectId = 'EnterProjectIdGeneratedByGoogle';

const speechClient = Speech({
    projectId: projectId
});

// Path of the local audio file to transcribe
const fileName = '/home/user/Documents/test/test.flac';

// The audio file's encoding and sample rate; the sample rate must match
// the one stored in the FLAC header
const options = {
    encoding: 'FLAC',
    sampleRate: 44100
};

// Detect speech in the audio file and print the transcription
speechClient.recognize(fileName, options)
    .then((results) => {
        const transcription = results[0];
        console.log(`Transcription: ${transcription}`);
    })
    .catch((err) => {
        console.error(err);
    });

The sample rate can be 16000, 44100, or another valid value, as long as it matches the file, and the encoding can be FLAC or LINEAR16 (see the Cloud Speech docs).

Mahesh answered Sep 28 '22 07:09


My mistake: according to the docs (https://cloud.google.com/speech/docs/basics), the .flac file has to be 16-bit PCM.

Summary:

Encoding: FLAC
Channels: 1 @ 16-bit
Samplerate: 16000Hz

/!\ Be careful not to export a stereo (2-channel) file, which throws a different error because only one channel is accepted: see "Google speech API internal server error -83104".
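
To check a file against the summary above before uploading it, you could read the values straight out of the FLAC header. This is a rough sketch, not an official tool: parseFlacStreamInfo and findProblems are names I made up, and it assumes the file starts with the "fLaC" marker followed directly by the STREAMINFO block, as the FLAC format specifies (a file with a leading ID3 tag would be rejected).

```javascript
const fs = require('fs');

// Parses the STREAMINFO metadata block at the start of a FLAC stream.
// Layout: 4-byte "fLaC" marker, 4-byte block header, then STREAMINFO,
// where bytes 18-21 of the file hold a 20-bit sample rate, a 3-bit
// (channels - 1), and the first 5 bits of (bits per sample - 1).
function parseFlacStreamInfo(buf) {
  if (buf.length < 22 || buf.toString('latin1', 0, 4) !== 'fLaC') {
    throw new Error('Not a FLAC stream');
  }
  if ((buf[4] & 0x7f) !== 0) {
    throw new Error('First metadata block is not STREAMINFO');
  }
  return {
    sampleRate: (buf[18] << 12) | (buf[19] << 4) | (buf[20] >> 4),
    channels: ((buf[20] >> 1) & 0x07) + 1,
    bitsPerSample: (((buf[20] & 0x01) << 4) | (buf[21] >> 4)) + 1,
  };
}

// Compares the header against the requirements summarized in this thread:
// mono, 16-bit, and a sample rate equal to the one sent in the request.
function findProblems(info, requestSampleRate) {
  const problems = [];
  if (info.channels !== 1) {
    problems.push(`expected 1 channel, found ${info.channels}`);
  }
  if (info.bitsPerSample !== 16) {
    problems.push(`expected 16-bit samples, found ${info.bitsPerSample}`);
  }
  if (info.sampleRate !== requestSampleRate) {
    problems.push(`request says ${requestSampleRate} Hz, file is ${info.sampleRate} Hz`);
  }
  return problems;
}

// Hypothetical convenience wrapper for a file on disk.
function checkFlacFile(path, requestSampleRate) {
  const info = parseFlacStreamInfo(fs.readFileSync(path));
  return findProblems(info, requestSampleRate);
}
```

For example, checkFlacFile('test4.flac', 16000) returns an empty array when the file is mono, 16-bit, and 16000 Hz, and a list of mismatches otherwise.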

Damien Romito answered Sep 28 '22 07:09