
Is it not possible to use curl with the Google Cloud Speech API to recognize speech in 10 to 15 minute files?

I'm using the REST API with cURL because I need to do something quick and simple, and I'm on a box that I can't start dumping garbage on, i.e. some thick developer SDK.

I started out by base64-encoding FLAC files and calling speech.syncrecognize.

That eventually failed with:

{
  "error": {
    "code": 400,
    "message": "Request payload size exceeds the limit: 10485760.",
    "status": "INVALID_ARGUMENT"
  }
}
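The limit in this error applies to the request body, and base64 encoding inflates the audio by about a third before it is sent. Here is a minimal Python sketch of that arithmetic, assuming the 10485760-byte limit quoted in the error and ignoring the small JSON wrapper around the audio. The function names are illustrative and not part of any Google API.

```python
import base64

PAYLOAD_LIMIT = 10485760  # bytes, taken from the error message above


def base64_size(raw_bytes: int) -> int:
    """Length of the padded base64 encoding of raw_bytes bytes."""
    return 4 * ((raw_bytes + 2) // 3)


def fits_inline(raw_bytes: int, limit: int = PAYLOAD_LIMIT) -> bool:
    """True if the encoded audio alone stays within the payload limit."""
    return base64_size(raw_bytes) <= limit


# Sanity check of the formula against the standard library encoder.
assert base64_size(1000) == len(base64.b64encode(bytes(1000)))

# Largest raw file whose encoding alone still fits: 3 bytes per 4 characters.
max_raw = (PAYLOAD_LIMIT // 4) * 3
print(max_raw, fits_inline(max_raw), fits_inline(max_raw + 1))
```

Under these assumptions the largest raw file that can be sent inline is 7,864,320 bytes (7.5 MiB), so anything bigger has to go through Cloud Storage.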

So okay, you can't send 31,284,578 bytes in the request; have to use Cloud Storage. So, I upload the flac audio file and try again using the file now in Cloud Storage. That fails with:

{
  "error": {
    "code": 400,
    "message": "For audio inputs longer than 1 min, use the 'AsyncRecognize' method.",
    "status": "INVALID_ARGUMENT"
  }
}

Great, speech.syncrecognize doesn't like the content size; try again with speech.asyncrecognize. That fails with:

{
  "error": {
    "code": 400,
    "message": "For audio inputs longer than 1 min, please use LINEAR16 encoding.",
    "status": "INVALID_ARGUMENT"
  }
}

Okay, so speech.asyncrecognize can only do LPCM; upload the file in pcm_s16le format and try again. So finally, I get an operation handle:

{
  "name": "9174269756763138681"
}

Keep checking it, and eventually it's complete:

{
  "name": "9174269756763138681",
  "done": true,
  "response": {
    "@type": "type.googleapis.com/google.cloud.speech.v1beta1.AsyncRecognizeResponse"
  }
}

So wait, after all that, with the result now sitting on a queue, there is no REST method to request the result? Someone please tell me that I've missed the glaringly obvious thing staring me right in the face, and that Google didn't create a completely pointless, incomplete REST API.

tlum asked Oct 18 '22 04:10



1 Answer

So the answer to the question is: no, it is not impossible. You can use curl with the Google Cloud Speech API to recognize speech in 10 to 15 minute files, assuming you navigate and conform to a fairly tight set of constraints, at least in beta1.

What is not obvious from the documentation is that the result is returned by the operations.get method itself. That would have been obvious had any of my attempts returned something other than empty results.
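For anyone scripting this rather than re-running curl by hand, the polling step can be sketched in Python with only the standard library. This is a sketch under assumptions, not a reference client: the base URL, the `?key=` API-key authentication, and the `results[].alternatives[].transcript` response layout are my reading of the v1beta1 REST documentation, and the function names are made up for illustration.

```python
import json
import time
import urllib.request

# Assumed v1beta1 REST base URL; check the current documentation before use.
BASE_URL = "https://speech.googleapis.com/v1beta1"


def http_get_json(url: str) -> dict:
    """Fetch a URL and decode the JSON body (no retries, no auth headers)."""
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_for_operation(name, api_key, fetch=http_get_json, delay=10.0, max_polls=60):
    """Poll operations.get until the operation reports done, then return it."""
    url = f"{BASE_URL}/operations/{name}?key={api_key}"
    for _ in range(max_polls):
        op = fetch(url)
        if op.get("done"):
            return op
        time.sleep(delay)
    raise TimeoutError(f"operation {name} not done after {max_polls} polls")


def transcripts(op: dict) -> list:
    """Collect the top alternative of each result; empty if none came back."""
    results = op.get("response", {}).get("results", [])
    return [r["alternatives"][0]["transcript"] for r in results if r.get("alternatives")]
```

Applied to the completed operation shown in the question, which has a `response` with no `results`, `transcripts` returns an empty list. That is the "empty results" case, as opposed to a missing REST method.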

The source rate in my files is either 44,100 or 48,000 Hz, and I was setting sample_rate to the source native rate. However, contrary to the documentation which states:

Sample rate in Hertz of the audio data sent in all RecognitionAudio messages. Valid values are: 8000-48000. 16000 is optimal. For best results, set the sampling rate of the audio source to 16000 Hz. If that's not possible, use the native sample rate of the audio source (instead of re-sampling).

after re-sampling to 16,000 Hz I started to get results with operations.get.

I think it's worth noting that correlation does not imply causation. After re-sampling to 16,000 Hz the files become significantly smaller. Thus, I can't prove it's a sample rate issue, and not just the service choking on files over a certain size.
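To put numbers on how much smaller the files get, here is a small Python sketch of the size of raw LINEAR16 audio. It assumes mono, headerless 16-bit PCM and a 15 minute recording; the function name is illustrative.

```python
def linear16_bytes(sample_rate_hz: int, seconds: float, channels: int = 1) -> int:
    """Size of headerless 16-bit PCM audio: 2 bytes per sample per channel."""
    return int(sample_rate_hz * seconds * channels * 2)


# Compare the two native source rates with the re-sampled rate.
for rate in (48000, 44100, 16000):
    size = linear16_bytes(rate, 15 * 60)
    print(f"{rate} Hz: {size} bytes")
```

Under those assumptions a 15 minute file is 86,400,000 bytes at 48,000 Hz and 28,800,000 bytes at 16,000 Hz, a factor of three. A size threshold somewhere in between would explain the observed behaviour just as well as a sample rate problem would.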

It's also worth noting that the documentation refers to the sample rate field inconsistently. According to their respective detailed definitions, the gRPC API appears to expect sample_rate and the REST API appears to expect sampleRate, in which case the Quickstart may be giving an incorrect example for the REST API.
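If the REST API does expect camelCase names, which is my reading of the definitions and not something I have confirmed, the request body can be generated from gRPC-style names so the two never drift apart. A minimal Python sketch follows; the helper names and the bucket path are hypothetical.

```python
import json


def snake_to_camel(name: str) -> str:
    """Convert a gRPC-style field name such as sample_rate to sampleRate."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


def rest_request_body(gcs_uri: str, **config) -> str:
    """Build a recognize request body with camelCase config keys (assumed REST form)."""
    rest_config = {snake_to_camel(key): value for key, value in config.items()}
    return json.dumps({"config": rest_config, "audio": {"uri": gcs_uri}}, indent=2)


# Hypothetical bucket and object name, for illustration only.
print(rest_request_body("gs://example-bucket/audio.raw",
                        encoding="LINEAR16", sample_rate=16000))
```

The printed JSON can be saved to a file and passed to curl with `-d @file.json`.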

tlum answered Nov 02 '22 05:11
