What audio file types does Google Cloud Speech API recognize?

I'm trying to use Google's Cloud Speech API. The documentation and code examples are here:

https://cloud.google.com/speech/docs/basics
https://cloud.google.com/speech/docs/rest-tutorial

I can get the sample code to run just fine if I point it at the included file, audio.raw, but not with a short .wav file of my own.

I have no idea what format the audio sample file is:

$ file audio.raw 
audio.raw: data

With my .wav file that has maybe 10 seconds of audio I get an empty result.

I'm aware of this answer.

google cloud speech api returning empty result

My question has been asked before, but it never received an answer.

What types of audio are supported by Cloud Speech API?

I can't imagine that I would have to get the properties of the audio file exactly right for this to work. I assume a common use case (mine, for example) is that someone records a meeting, has no idea of the recording's parameters, and just wants a text file.
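For reference, one possible workaround (just a sketch, not from the official docs; it assumes sox is installed and uses made-up file names like meeting.wav) is to check the recording's parameters and convert it to one of the encodings the API lists, e.g. 16 kHz mono 16-bit FLAC or raw LINEAR16:

$ soxi meeting.wav    # print the sample rate, bit depth and channel count
$ sox meeting.wav --rate 16000 --bits 16 --channels 1 meeting.flac    # lossless FLAC, one of the documented encodings
$ sox meeting.wav --rate 16000 --bits 16 --channels 1 --encoding signed-integer --type raw meeting.raw    # headerless LINEAR16, like the bundled audio.raw

ffmpeg can do the same conversion if sox is not available.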

Asked by Sol, Oct 15 '16 14:10


1 Answer

EDIT May 2020: things seem to have improved and this answer is no longer accurate; see the current docs for details about supported formats (including WAV).


As of 2016, the WAV format does not seem to be supported. The following encodings are documented as supported:

  • LINEAR16 Uncompressed 16-bit signed little-endian samples. This is the only encoding that may be used by speech.asyncrecognize.
  • FLAC This is the recommended encoding for speech.syncrecognize and StreamingRecognize because it uses lossless compression; therefore recognition accuracy is not compromised by a lossy codec. Only 16-bit samples are supported. Not all fields in STREAMINFO are supported.
  • MULAW 8-bit samples that compand 14-bit audio samples using G.711 PCMU/mu-law.
  • AMR Adaptive Multi-Rate Narrowband codec. sampleRate must be 8000 Hz.
  • AMR_WB Adaptive Multi-Rate Wideband codec. sampleRate must be 16000 Hz.

https://cloud.google.com/speech/reference/rest/v1beta1/RecognitionConfig#AudioEncoding
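To make that concrete, here is a rough sketch of a request along the lines of the REST tutorial linked in the question (the API key, bucket path and file name are placeholders, and the field names follow the v1beta1 RecognitionConfig above, so treat it as an illustration rather than a verified snippet). Convert the recording to 16 kHz FLAC first, upload it to Cloud Storage, then:

request.json:
{
  "config": {
    "encoding": "FLAC",
    "sampleRate": 16000,
    "languageCode": "en-US"
  },
  "audio": {
    "uri": "gs://your-bucket/meeting.flac"
  }
}

$ curl -s -X POST \
    -H "Content-Type: application/json" \
    --data @request.json \
    "https://speech.googleapis.com/v1beta1/speech:syncrecognize?key=${API_KEY}"

If the encoding or sampleRate in the config does not match the actual file, you tend to get an empty result rather than an error, which seems to match the symptom described in the question.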

Answered by Marcin Orlowski, Nov 15 '22 15:11